Do vision and language need different expert pathways?

Vision-language models typically route text and images through symmetric expert pathways, but this ignores a fundamental asymmetry: text tends to describe *parts* of an image rather than a concept parallel to it. This paper proposes AsyMoE, which uses hyperbolic geometry to capture the hierarchical relationships between modalities and pushes language experts to stay grounded in visual evidence rather than falling back on knowledge stored in their learned parameters. The reported result: a 1.5% gain over existing mixture-of-experts variants, with dramatic improvements on hallucination-prone benchmarks.
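
To make the appeal of hyperbolic geometry for hierarchies concrete, here is a minimal sketch of the distance function on the Poincaré ball, one common model of hyperbolic space. This is a generic illustration, not AsyMoE's implementation: the summary above does not say which hyperbolic model or curvature the paper uses, and the function name `poincare_distance` and the example points are assumptions chosen for the demo.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit Poincare ball.

    Assumes curvature -1. Illustrative only; not taken from the AsyMoE paper.
    """
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    arg = 1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v))
    return math.acosh(arg)


# Hypothetical layout: a "whole image" node at the origin and two "part" nodes
# near the boundary, in different directions.
root = (0.0, 0.0)
part_a = (0.9, 0.0)
part_b = (0.0, 0.9)

d_root_a = poincare_distance(root, part_a)
d_a_b = poincare_distance(part_a, part_b)
```

With these example points, the two parts are farther from each other, relative to their distance from the root, than the same points would be under Euclidean distance. That is the tree-like behavior that makes hyperbolic embeddings a natural fit for part-whole structure.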