Training language models 100× more efficiently with brain-inspired architecture
Guan Wang, Changling Liu, Chenyu Wang, Cai Zhou, Yuhao Sun, Yifei Wu, Shuai Zhen, Luca Scimeca, Yasin Abbasi Yadkori
May 20, 2026
Standard language model pretraining demands massive compute and internet-scale data, locking foundational research behind expensive infrastructure. HRM-Text replaces the Transformer's flat attention with a hierarchical recurrent architecture that separates fast execution from slow strategic planning, mimicking how brains process information across timescales. Trained only on instruction-response pairs with a custom objective, a 1B-parameter model achieves competitive performance on MMLU, ARC-C, and GSM8K using 100–900× fewer tokens and 96–432× less compute than standard 2–7B-parameter baselines. The work shows that thoughtful co-design of architecture and training objective can dramatically narrow the efficiency gap.
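To make the idea of two timescales concrete, here is a minimal toy sketch of a recurrence in which a fast state updates at every step and a slow state updates only once every few steps. It is not the HRM-Text implementation: the function names (`fast_step`, `slow_step`, `run`), the fixed mixing weights, the `period` parameter, and the scalar inputs are all assumptions chosen for illustration, and a real model would use learned layers over token embeddings.

```python
import math

def fast_step(low, high, x):
    # Hypothetical fast update: mixes the current input with the fast state
    # and the (slowly changing) plan state. Weights are arbitrary constants.
    return [math.tanh(0.5 * l + 0.3 * h + 0.2 * x) for l, h in zip(low, high)]

def slow_step(high, low):
    # Hypothetical slow update: folds the current fast state into the plan.
    return [math.tanh(0.7 * h + 0.3 * l) for h, l in zip(high, low)]

def run(inputs, dim=4, period=3):
    """Run the toy two-timescale recurrence over a sequence of scalar inputs."""
    low = [0.0] * dim    # fast "execution" state, updated every step
    high = [0.0] * dim   # slow "planning" state, updated every `period` steps
    slow_updates = 0
    for t, x in enumerate(inputs, start=1):
        low = fast_step(low, high, x)
        if t % period == 0:
            high = slow_step(high, low)
            slow_updates += 1
    return low, high, slow_updates

if __name__ == "__main__":
    low, high, n = run([1.0] * 6)
    print("slow updates:", n)
    print("fast state:", low)
    print("slow state:", high)
```

In this sketch the slow state stays fixed between its updates, so the fast state evolves under a stable "plan" for `period` steps at a time. That separation of update rates is the only property the example is meant to show.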