Expanding language models to new tongues without costly retraining

Hao Zhou, Tianhao Li, Zhijun Wang, Shuaijie She, Linjuan Wu, Hao-Ran Wei, Baosong Yang, Jiajun Chen, Shujian Huang

Extending large language models (LLMs) to new languages typically requires expensive continued pre-training (CPT) and alignment phases. This work resolves the core tension in parameter merging, where reducing conflicts with the original model weakens new language learning, by converting a dense model into a Mixture-of-Experts architecture with language-specific experts. The method then transfers alignment ability by merging a post-training parameter delta into the CPT-enhanced base, which avoids a full alignment phase. Experiments show performance gains on new languages while maintaining original capabilities, and the approach generalizes across different models and post-training deltas.
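The delta-merging step can be illustrated with a minimal sketch. The abstract does not give the exact merging rule, so the code below assumes the simplest form: the delta is the per-parameter difference between an aligned model and its base, and it is added to the CPT-enhanced base with a scaling factor. The function names, the `scale` parameter, the toy parameter names (`ffn.w`, `expert_new.w`), and the choice to leave newly added expert parameters untouched are all hypothetical, chosen only for illustration.

```python
def post_training_delta(base, aligned):
    """Per-parameter difference between an aligned model and its base model."""
    return {
        name: [a - b for a, b in zip(aligned[name], base[name])]
        for name in base
    }


def merge_delta(cpt_base, delta, scale=1.0):
    """Add a scaled post-training delta to a CPT-enhanced base model.

    Assumption: parameters that have no entry in the delta (for example,
    a newly added language expert) are copied through unchanged.
    """
    merged = {}
    for name, weights in cpt_base.items():
        if name in delta:
            merged[name] = [w + scale * d for w, d in zip(weights, delta[name])]
        else:
            merged[name] = list(weights)
    return merged


# Toy weights standing in for real model tensors.
base = {"ffn.w": [1.0, 2.0, 3.0]}
aligned = {"ffn.w": [1.5, 2.0, 2.5]}
cpt_base = {"ffn.w": [1.0, 2.0, 3.0], "expert_new.w": [0.25, 0.5, 0.75]}

delta = post_training_delta(base, aligned)
merged = merge_delta(cpt_base, delta)
print(merged)
```

In this toy setup the shared parameter picks up the alignment shift while the hypothetical new expert keeps its CPT weights. A real implementation would operate on framework tensors rather than Python lists.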