Do vision and language need different expert pathways?
Zijie Zhou, Dandan Zhu, Hangxiangpan Wang, Heng Zhang, Huishen Jiao, Yi Zhao
May 29, 2026
Vision-language models typically route text and images through symmetric expert pathways, but this ignores a fundamental asymmetry: text usually describes *parts* of an image rather than concepts parallel to it. This paper proposes AsyMoE, which uses hyperbolic geometry to capture the hierarchical relationship between the two modalities and forces language experts to stay grounded in visual evidence rather than falling back on what their own parameters have memorized. The reported result is a 1.5% gain over existing mixture-of-experts variants, with substantially larger improvements on hallucination-prone benchmarks.
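The summary does not spell out AsyMoE's exact hyperbolic formulation. The sketch below is therefore a generic illustration, not the authors' code. It assumes the Poincaré ball model with curvature -1, one common choice of hyperbolic space. It shows why such a space suits part-whole hierarchies: distances grow rapidly toward the boundary of the ball, so there is room for many specific concepts there. The function name `poincare_distance` and the sample points are hypothetical.

```python
import math
from typing import Sequence


def poincare_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Geodesic distance between two points in the unit Poincaré ball (curvature -1).

    Illustrative helper (hypothetical name), not taken from the AsyMoE paper.
    """
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # d(u, v) = arcosh(1 + 2|u - v|^2 / ((1 - |u|^2)(1 - |v|^2)))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


# The same Euclidean gap of 0.1, measured near the center and near the boundary.
near_center = poincare_distance((0.1, 0.0), (0.2, 0.0))
near_boundary = poincare_distance((0.85, 0.0), (0.95, 0.0))
print(round(near_center, 3))    # 0.205
print(round(near_boundary, 3))  # 1.151
```

Near the origin, a Euclidean step of 0.1 costs about 0.2 in hyperbolic distance. Near the boundary, the same step costs more than 1.1. Hyperbolic embedding methods exploit this by placing general concepts (for example, a whole image) near the center and specific ones (its parts, or phrases describing them) toward the edge. Whether AsyMoE uses this exact model or curvature is not stated here.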