Teaching AI the rules of human language structure

Large language models learn less efficiently from data than humans do. Inspired by the Language Acquisition Device hypothesis, this work proposes pre-pretraining LLMs on MP-STRUCT, a formal language that encodes the core operations of human grammar: hierarchical composition, feature agreement, and long-distance dependencies. After just 500 steps of pre-pretraining on MP-STRUCT, models match or exceed prior formal-language baselines in token efficiency and develop a distinctly human trait: rejecting structurally implausible languages such as REVERSE. The analysis shows that effective pre-pretraining depends not on raw expressivity alone but on how clearly dependency relationships are marked, a finding that contradicts previous assumptions about what makes formal languages useful for bootstrapping natural language learning.