Can language models think without writing their thoughts?

Large language models typically show their reasoning by generating intermediate steps token by token, which is expensive and couples thinking to output. This work introduces Reasoning in Memory (RiM), which replaces that autoregressive chain-of-thought with fixed special tokens that function as working memory and are processed in a single forward pass. Trained via curriculum learning on math and logic benchmarks, RiM matches or beats existing latent reasoning methods while reducing compute overhead.
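
To make the contrast concrete, here is a minimal toy sketch, not the RiM implementation. It only counts forward passes: an autoregressive chain-of-thought loop calls the model once per generated reasoning token, while a fixed block of memory tokens appended to the prompt is handled in one call. `ToyModel`, `chain_of_thought`, `memory_token_pass`, and `MEM_TOKEN_ID` are hypothetical names introduced for illustration; the source does not specify how the special tokens are defined or how many are used.

```python
from typing import List

# Hypothetical id for the special memory token (an assumption, not from the source).
MEM_TOKEN_ID = 0


class ToyModel:
    """Stand-in for a language model; it only counts forward passes."""

    def __init__(self) -> None:
        self.forward_calls = 0

    def forward(self, token_ids: List[int]) -> int:
        self.forward_calls += 1
        # Dummy "next token": a deterministic function of the input ids.
        return sum(token_ids) % 100


def chain_of_thought(model: ToyModel, prompt_ids: List[int], num_steps: int) -> int:
    """Generate num_steps reasoning tokens one at a time, then the answer."""
    ids = list(prompt_ids)
    for _ in range(num_steps):
        ids.append(model.forward(ids))  # one forward pass per reasoning token
    return model.forward(ids)  # final pass produces the answer token


def memory_token_pass(model: ToyModel, prompt_ids: List[int],
                      num_memory_tokens: int) -> int:
    """Append a fixed block of memory tokens and run a single forward pass."""
    ids = list(prompt_ids) + [MEM_TOKEN_ID] * num_memory_tokens
    return model.forward(ids)


if __name__ == "__main__":
    cot_model, mem_model = ToyModel(), ToyModel()
    chain_of_thought(cot_model, [1, 2, 3], num_steps=8)
    memory_token_pass(mem_model, [1, 2, 3], num_memory_tokens=8)
    print("chain-of-thought passes:", cot_model.forward_calls)
    print("memory-token passes:", mem_model.forward_calls)
```

The count of passes is only a rough proxy for cost: in practice key-value caching makes each autoregressive step cheaper than a full pass, but the steps remain sequential, which is the coupling the abstract describes.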