cs.CL

Routing queries to the right AI model without retraining

Aashna Garg, Siddharth Singha Roy, Jinu Jang, Federico Brancasi, Shengyu Fu

May 16, 2026

Production LLM systems often maintain pools of models with vastly different costs and capabilities. HyDRA uses a ModernBERT encoder to score each query on four dimensions (reasoning, code generation, debugging, and tool use), then selects the cheapest model that meets those predicted needs. Unlike prior routers, which must be retrained whenever a model is added or removed, HyDRA's predictor is decoupled from the model catalog, so catalog changes need only a configuration update. On SWE-Bench Verified, it cuts cost by 54.1% relative to always using Claude Sonnet while matching its quality, and its savings are 6× better than those of the prior binary router. The system is deployed in GitHub Copilot's VS Code Chat, and its routing is language-invariant across CJK, European, and other scripts.
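The selection step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the `ModelEntry` type, the `select_model` function, the model names, the costs, the capability scores, and the fallback rule are all hypothetical, and the ModernBERT predictor is replaced by hand-written need scores.

```python
from dataclasses import dataclass

# The four dimensions named in the summary.
DIMENSIONS = ("reasoning", "code_generation", "debugging", "tool_use")


@dataclass(frozen=True)
class ModelEntry:
    """One catalog row. All fields are illustrative assumptions."""
    name: str
    cost: float        # hypothetical relative cost per request
    capability: dict   # dimension -> hypothetical score in [0, 1]


def uniform(score):
    """Helper: the same score on every dimension."""
    return {d: score for d in DIMENSIONS}


def select_model(needs, catalog):
    """Return the cheapest model whose capability meets every predicted need.

    Assumed fallback: if no model qualifies, use the most expensive one.
    """
    eligible = [
        m for m in catalog
        if all(m.capability[d] >= needs[d] for d in DIMENSIONS)
    ]
    if eligible:
        return min(eligible, key=lambda m: m.cost)
    return max(catalog, key=lambda m: m.cost)


# Hypothetical catalog; in this sketch it is plain configuration data.
CATALOG = [
    ModelEntry("small", 1.0, uniform(0.4)),
    ModelEntry("medium", 4.0, uniform(0.7)),
    ModelEntry("large", 15.0, uniform(0.95)),
]

# Stand-ins for predictor output (the real system scores queries with an encoder).
easy_query_needs = uniform(0.3)
hard_query_needs = {**uniform(0.3), "reasoning": 0.9}

easy_choice = select_model(easy_query_needs, CATALOG).name
hard_choice = select_model(hard_query_needs, CATALOG).name

# Adding a model is a catalog edit only; the need scores are untouched.
EXTENDED = CATALOG + [ModelEntry("medium-lite", 2.0, uniform(0.7))]
mid_choice_before = select_model(uniform(0.6), CATALOG).name
mid_choice_after = select_model(uniform(0.6), EXTENDED).name
```

Because the need scores describe the query rather than any particular model, the same scores can be matched against an extended catalog without touching the predictor, which is the decoupling the summary describes.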
Published as "HyDRA: Hybrid Dynamic Routing Architecture for Heterogeneous LLM Pools", arXiv:2605.17106