cs.CL

Training language models 100× more efficiently with brain-inspired architecture

Guan Wang, Changling Liu, Chenyu Wang, Cai Zhou, Yuhao Sun, Yifei Wu, Shuai Zhen, Luca Scimeca, Yasin Abbasi Yadkori

May 20, 2026

Standard language model pretraining demands massive compute and internet-scale data, which locks foundational research behind expensive infrastructure. HRM-Text replaces the Transformer's flat attention with a hierarchical recurrent architecture that separates fast execution from slow strategic planning, mimicking how brains process information across timescales. Trained only on instruction-response pairs with a custom objective, a 1B-parameter model achieves competitive performance on MMLU, ARC-C, and GSM8K while using 100–900× fewer tokens and 96–432× less compute than standard 2–7B baselines. The results suggest that careful co-design of architecture and training objective can substantially narrow the efficiency gap between small-budget and large-scale pretraining.
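The split between fast execution and slow planning can be illustrated with a toy loop. The sketch below is not the HRM-Text architecture, which this summary does not specify. It only assumes the general idea of two coupled recurrent states, one updated at every step and one updated every few steps. The function name, the scalar states, the update rules, and all constants are hypothetical and chosen purely for illustration.

```python
import math


def two_timescale_recurrence(inputs, fast_steps_per_slow=4):
    """Toy two-timescale recurrence (illustrative only, not HRM-Text).

    The fast state is updated on every input; the slow state is updated
    once every `fast_steps_per_slow` inputs. All weights are made-up constants.
    """
    fast, slow = 0.0, 0.0
    slow_updates = 0
    for t, x in enumerate(inputs, start=1):
        # Fast module: reacts to the current input, conditioned on slow context.
        fast = math.tanh(0.5 * fast + 0.5 * slow + x)
        if t % fast_steps_per_slow == 0:
            # Slow module: folds in the latest fast state at a coarser timescale.
            slow = math.tanh(0.9 * slow + 0.1 * fast)
            slow_updates += 1
    return fast, slow, slow_updates


if __name__ == "__main__":
    fast, slow, n = two_timescale_recurrence([0.1] * 8, fast_steps_per_slow=4)
    print(f"fast={fast:.4f} slow={slow:.4f} slow_updates={n}")
```

With eight inputs and a period of four, the slow state is updated twice while the fast state is updated eight times, which is the only property this sketch is meant to show.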
Published as "HRM-Text: Efficient Pretraining Beyond Scaling" (arXiv:2605.20613)