Routing queries to the right AI model without retraining

Production LLM systems often maintain pools of models with vastly different costs and capabilities. HyDRA uses a ModernBERT encoder to score each query on four dimensions (reasoning, code generation, debugging, and tool use), then selects the cheapest model that meets those predicted needs. Prior routers must be retrained whenever a model is added or removed. HyDRA's predictor is decoupled from the model catalog, so a catalog change requires only a configuration update. On SWE-Bench Verified, it cuts cost by 54.1% relative to always using Claude Sonnet while matching its quality, and it delivers 6× the savings of the prior binary router. The system is deployed in GitHub Copilot's VS Code Chat, and its routing is language-invariant across CJK, European, and other scripts.
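The selection step described above can be sketched roughly as follows. This is a minimal illustration, not HyDRA's actual implementation: the names `ModelSpec`, `select_model`, and `CATALOG`, the [0, 1] score scale, the per-dimension threshold comparison, the cost figures, and the fallback to the strongest model are all assumptions made for the example. The encoder's predicted scores are taken as a given input.

```python
from __future__ import annotations

from dataclasses import dataclass

# The four dimensions named in the text; the identifiers are illustrative.
DIMENSIONS = ("reasoning", "code_generation", "debugging", "tool_use")


@dataclass(frozen=True)
class ModelSpec:
    """One catalog entry (hypothetical schema)."""
    name: str
    cost: float                   # assumed relative cost per request
    capability: dict[str, float]  # assumed per-dimension capability in [0, 1]


def select_model(needs: dict[str, float], catalog: list[ModelSpec]) -> ModelSpec:
    """Return the cheapest model whose capability covers every predicted need."""
    eligible = [
        m for m in catalog
        if all(m.capability[d] >= needs[d] for d in DIMENSIONS)
    ]
    if not eligible:
        # Assumed fallback: no model covers the needs, so use the strongest one.
        return max(catalog, key=lambda m: sum(m.capability[d] for d in DIMENSIONS))
    return min(eligible, key=lambda m: m.cost)


# Hypothetical catalog. It is plain data, so adding or removing a model
# changes only this list, never the predictor that produces `needs`.
CATALOG = [
    ModelSpec("small", 1.0, dict.fromkeys(DIMENSIONS, 0.4)),
    ModelSpec("medium", 4.0, dict.fromkeys(DIMENSIONS, 0.7)),
    ModelSpec("large", 15.0, dict.fromkeys(DIMENSIONS, 0.95)),
]

# Scores an encoder might predict for one query (made-up values).
needs = {"reasoning": 0.6, "code_generation": 0.5, "debugging": 0.3, "tool_use": 0.2}
chosen = select_model(needs, CATALOG)
print(chosen.name)
```

Because the predicted needs depend only on the query, the catalog can be edited without touching the predictor, which is the decoupling the paragraph describes.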