cs.LG

What inference hardware leaks about your LLM

Anna Wimbauer, Jonas Möller, Erik Imgrund, Konrad Rieck

May 28, 2026

Small numerical deviations introduced by different GPUs, inference engines, and attention implementations accumulate and alter LLM outputs in measurable ways. The authors built a fingerprinting method that queries a model and infers its hardware platform, inference engine, and attention backend, even at non-zero temperature, where sampling already makes outputs vary. This matters for privacy and security: anyone who can query a black-box LLM can uncover details of the infrastructure behind it. Eliminating these traces would require harmonizing hardware and software across the entire stack, which the authors show is fundamentally impractical.
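To make the idea of accumulating numerical deviations concrete, here is a minimal toy sketch. It assumes one well-known source of such deviations, namely that floating-point addition is not associative, so two backends that sum the same values in a different order can produce slightly different results. The function names (`sum_sequential`, `sum_pairwise`, `greedy_token`) and the numbers are hypothetical illustrations, not the paper's method or any real engine's kernels.

```python
def sum_sequential(values):
    """Accumulate left to right, as a simple serial loop would."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_pairwise(values):
    """Accumulate as a balanced tree, as a parallel reduction might."""
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return sum_pairwise(values[:mid]) + sum_pairwise(values[mid:])


def greedy_token(score_a, score_b):
    """Pick token 'A' only if its score is strictly larger (toy greedy decoding)."""
    return "A" if score_a > score_b else "B"


# The same mathematical sum, computed in two different orders.
terms = [0.1, 0.2, 0.3]
score_seq = sum_sequential(terms)
score_tree = sum_pairwise(terms)

# A hypothetical rival token whose score sits right at the boundary.
rival = 0.6

print(score_seq == score_tree)          # the two orders disagree in the last bit
print(greedy_token(score_seq, rival))   # choice under the sequential order
print(greedy_token(score_tree, rival))  # choice under the pairwise order
```

In this toy case a one-bit difference in a score is enough to change which token wins a near-tie. Real systems involve far larger reductions in lower precision, so the example only illustrates why such differences can surface in outputs, not how the fingerprinting method itself works.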
Published as "Fingerprinting Inference Systems of Large Language Models", arXiv:2605.29979