Can language models automate quantitative trading strategy testing?
Zhensheng Wang, Wenmian Yang, Qingtai Wu, Lequan Ma, Yiquan Zhang, Weijia Jia
May 18, 2026
Quantitative backtesting, the validation of trading strategies against historical data, is technically complex, and that complexity limits adoption. This work introduces BacktestBench, the first large-scale benchmark for automated backtesting, covering metrics calculation, ticker selection, strategy selection, and parameter tuning. The authors propose AutoBacktest, a multi-agent system that decomposes backtesting into semantic extraction, SQL retrieval, and Python code generation. Experiments with 23 mainstream LLMs show that grounded verification and standardized indicator representations are critical for end-to-end performance. The benchmark and baseline are designed for both researchers advancing LLM reasoning and quantitative finance practitioners seeking to lower barriers to strategy testing.
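To make the metrics-calculation task more concrete, here is a minimal sketch of the kind of Python a backtesting pipeline might need to produce. It is not taken from the paper: the buy-and-hold strategy, the price series, and all function names are assumptions chosen purely for illustration.

```python
from typing import List


def buy_and_hold_equity(prices: List[float], cash: float = 1000.0) -> List[float]:
    """Equity curve for a hypothetical strategy: buy at the first price, then hold."""
    shares = cash / prices[0]
    return [shares * p for p in prices]


def total_return(equity: List[float]) -> float:
    """Fractional gain from the first to the last equity value."""
    return equity[-1] / equity[0] - 1.0


def max_drawdown(equity: List[float]) -> float:
    """Largest peak-to-trough decline, expressed as a positive fraction."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)                    # running high-water mark
        worst = max(worst, (peak - value) / peak)  # decline from that mark
    return worst


# Made-up historical prices, for illustration only.
prices = [100.0, 110.0, 99.0, 120.0]
equity = buy_and_hold_equity(prices)

print(f"total return: {total_return(equity):.2%}")
print(f"max drawdown: {max_drawdown(equity):.2%}")
```

Even for a toy case like this, the result depends on agreeing on exact metric definitions (for example, whether drawdown is measured from the running peak), which may be one reason standardized indicator representations matter for end-to-end correctness.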