cs.AI

Can LLMs learn to think about their own thinking?

Sirui Chen, Lei Xu, Yuying Zhao, Yutian Chen, Yu Wang, Beier Zhu, Hanwang Zhang, Shengjie Zhao, Chaochao Lu

May 22, 2026

Existing methods reward language models either for final answers alone or through hand-crafted, task-specific quality rubrics; both approaches are limiting. This work introduces MaR (Metacognition as Reward), which treats reasoning itself as trainable through metacognition: models learn to identify task-relevant information and adjust their problem-solving strategy mid-process, rather than only chasing correct answers. Evaluated on 22 benchmarks, MaR lifts Qwen3.5-9B to a level competitive with much larger proprietary models and generalizes to unseen tasks.
Published as "Metacognition as Reward: Reinforcing LLM Reasoning via Knowledge and Regulation Signals" (arXiv:2605.23384)