A simulator for predicting large-scale LLM training and inference performance

Deploying large-scale LLM training and inference requires navigating a complex space of parallelism strategies, system optimizations, and hardware choices. Charon is a modular simulator that predicts performance across these configurations with high accuracy: its prediction error is under 5.35% overall and 3.74% for training on large GPU clusters. In a practical inference case, the simulator identified a configuration that outperformed an engineering-tuned baseline, which demonstrates its usefulness to practitioners optimizing real systems.