Splitting physics simulations across expert networks prevents training interference

Co-training neural operators on physically incompatible PDE regimes, such as open-channel fluid flow and porous-media flow, causes gradient conflicts that degrade accuracy across all domains. Shodh-MoE addresses this with a sparsely activated latent transformer: a Top-1 semantic router sends each compressed physical representation to a single specialized expert subnetwork, while shared experts handle universal symmetries. During a 20,000-step pretraining run, routing telemetry revealed spontaneous domain separation, with fluid tokens routed exclusively to one expert and porous-media tokens to another, without any explicit supervision. Decoded physical MSEs reached 2.48×10⁻⁶ and 1.76×10⁻⁶ on the open-channel and porous-media domains, respectively. Strict mass conservation is enforced via a Helmholtz velocity parameterization.
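To make the routing mechanism concrete, the following is a minimal sketch of Top-1 routing combined with an always-active shared expert. It is not the Shodh-MoE implementation. The function name `top1_moe`, the two-dimensional latent tokens, the linear experts, and the hand-set router weights are all assumptions chosen for illustration. Scaling the routed output by the gate probability follows common Switch-Transformer-style practice and is also an assumption here.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def top1_moe(tokens, router_W, experts, shared):
    """Route each token to exactly one expert and add a shared expert.

    All experts are plain linear maps here (an illustrative assumption).
    Returns the per-token outputs and the chosen expert index per token.
    """
    outputs, choices = [], []
    for x in tokens:
        probs = softmax(matvec(router_W, x))       # router scores per expert
        k = max(range(len(probs)), key=probs.__getitem__)  # Top-1 choice
        routed = matvec(experts[k], x)             # only expert k is evaluated
        common = matvec(shared, x)                 # shared expert sees every token
        outputs.append([probs[k] * r + c for r, c in zip(routed, common)])
        choices.append(k)
    return outputs, choices


# Hypothetical toy data: two "domains" occupy different latent directions.
rng = random.Random(0)


def noise():
    return rng.uniform(-0.1, 0.1)


fluid = [[1.0 + noise(), noise()] for _ in range(4)]
porous = [[noise(), 1.0 + noise()] for _ in range(4)]

router_W = [[4.0, 0.0], [0.0, 4.0]]   # hand-set, not learned
experts = [
    [[2.0, 0.0], [0.0, 2.0]],         # expert 0
    [[0.5, 0.0], [0.0, 0.5]],         # expert 1
]
shared = [[1.0, 0.0], [0.0, 1.0]]     # shared expert (identity)

outputs, choices = top1_moe(fluid + porous, router_W, experts, shared)
print(choices)
```

In this toy setup the separation is built into the router weights by hand, so the first four tokens go to expert 0 and the last four to expert 1. In the pretraining run described above, the same kind of per-domain split is reported to emerge from training alone. Because each token activates only one routed expert, gradients from one domain do not update the other domain's expert, which is the interference-avoidance property the abstract relies on.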