Teaching billion-parameter models new domains without forgetting old ones

Continually training LLMs on new data domains typically degrades performance on previously learned ones — a problem called catastrophic forgetting — and existing fixes require storing old data, knowing task boundaries, or expensive regularization that breaks at scale. TFGP introduces a transformer overlay that routes parameter updates so new domain learning writes only to orthogonal subspaces, leaving prior-domain representations intact. Across six domains (prose, Python, math, biomedical, Chinese, JavaScript) at 1B tokens per phase, the method achieves ≥99.59% gradient separation between domains and a 26.8% drop in held-out JavaScript perplexity from Python training alone — positive forward transfer with no replay. Tested at ~398M, ~739M, and ~9B parameter scales in both from-scratch and retrofit settings; code availability is not mentioned.