What inference hardware leaks about your LLM
Anna Wimbauer, Jonas Möller, Erik Imgrund, Konrad Rieck
May 28, 2026
Small numerical deviations introduced by different GPUs, inference engines, and attention implementations accumulate during inference and alter LLM outputs in measurable ways. The authors built a fingerprinting method that queries a model and infers its hardware platform, inference engine, and attention backend from the responses, even at non-zero temperature, where sampling already randomizes the output. This matters for privacy and security: a party with nothing more than black-box query access to an LLM can learn details of the infrastructure serving it. Eliminating these traces would require harmonizing hardware and software across the entire stack, which the authors show is impractical.
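To make the source of such deviations concrete, here is a minimal sketch of one well-known mechanism: floating-point addition is not associative, so two kernels that reduce the same values in a different order can produce slightly different sums. This is an illustration only, not the authors' fingerprinting method; the function names, the three input values, and the two-token "logit" comparison are hypothetical and chosen purely for demonstration.

```python
def accumulate(values, reverse=False):
    """Sum floats one by one, optionally in reverse order.

    Stands in for two kernels that reduce the same data in a
    different order (an assumption made for illustration).
    """
    ordered = reversed(values) if reverse else values
    total = 0.0
    for v in ordered:
        total += v
    return total


def greedy_pick(logit_a, logit_b):
    """Toy greedy decoding over two hypothetical tokens."""
    return "A" if logit_a > logit_b else "B"


values = [0.1, 0.2, 0.3]

forward = accumulate(values)                 # (0.1 + 0.2) + 0.3
backward = accumulate(values, reverse=True)  # (0.3 + 0.2) + 0.1

# The two orders disagree in the last bit of the result.
print(forward == backward)

# A rival token whose logit sits right at the boundary turns that
# last-bit difference into a different greedy choice.
rival = 0.6
print(greedy_pick(forward, rival), greedy_pick(backward, rival))
```

In a real model the discrepancy is far smaller than the gap between most logits, so a single deviation rarely changes a token. The point of the sketch is only that the reduction order is visible in the result at all, and that near-ties can turn a rounding difference into an observable output difference.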