cs.AI

Can you extract hidden reasoning from language models through clever prompting?

Yu-An Lu, Ci-Yang Tsai, Yu-Lin Tsai, Raluca Ada Popa, Chia-Mu Yu

May 30, 2026

Language model companies hide internal reasoning traces to prevent unauthorized capability extraction. This work shows that users can still recover useful reasoning signals through Reasoning Exposure Prompting (REP), a technique that uses code-like formatted examples to elicit the hidden traces. Tests across multiple models and datasets confirm that REP substantially increases the similarity of elicited outputs to the models' internal reasoning while preserving distillable signals, suggesting that hiding traces at the interface level is insufficient protection.
Published as "Hidden Thoughts Are Not Secret: Reasoning Trace Exposure in LLMs" (arXiv:2606.00642)