cs.CV

Do vision and language need different expert pathways?

Zijie Zhou, Dandan Zhu, Hangxiangpan Wang, Heng Zhang, Huishen Jiao, Yi Zhao

May 29, 2026

Vision-language models typically route text and images through symmetric expert pathways, but this ignores a fundamental asymmetry: text usually describes *parts* of an image rather than a parallel set of concepts. This paper proposes AsyMoE, which uses hyperbolic geometry to capture hierarchical relationships between modalities and constrains language experts to stay grounded in visual evidence rather than falling back on knowledge stored in their learned parameters. The reported result is a 1.5% gain over existing mixture-of-experts variants, with dramatic improvements on hallucination-prone benchmarks.
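
To make the geometric intuition concrete, here is a minimal sketch of why hyperbolic space suits whole-to-part hierarchies. It is not the paper's implementation: the choice of the Poincaré ball model, the function name `poincare_distance`, and the toy 2-D coordinates are all assumptions made for illustration.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit ball.

    Assumes the Poincare ball model of hyperbolic space (an illustrative
    choice; the summary above does not say which model the paper uses).
    """
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    arg = 1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v))
    return math.acosh(arg)


# Hypothetical toy embeddings: the whole image sits at the origin (the root
# of the hierarchy), and two text-described parts sit near the boundary.
image = (0.0, 0.0)
part_a = (0.9, 0.0)
part_b = (0.0, 0.9)

whole_to_part = poincare_distance(image, part_a)
part_to_part = poincare_distance(part_a, part_b)
```

In this toy layout each part stays comparatively close to the whole, while the two parts are pushed much farther from each other than their Euclidean separation would suggest. That tree-like behaviour is the property that makes hyperbolic embeddings a common choice for hierarchical data.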
Published as "Hyperbolic and Evidence-Prioritized Experts for Large Vision-Language Models", arXiv:2606.00275