Can LLMs learn to think about their own thinking?

Sirui Chen, Lei Xu, Yuying Zhao, Yutian Chen, Yu Wang, Beier Zhu, Hanwang Zhang, Shengjie Zhao, Chaochao Lu

Existing methods reward language models either for correct final answers or against hand-crafted, task-specific quality rubrics, and both approaches are limiting. This work introduces MaR, which treats reasoning itself as trainable through metacognition: models learn to identify task-relevant information and adjust their problem-solving strategy mid-process, rather than only chasing correct answers. Across 22 benchmarks, MaR lifts Qwen3.5-9B to a level competitive with much larger proprietary models and generalizes to unseen tasks.