cs.CL

Can language models automate quantitative trading strategy testing?

Zhensheng Wang, Wenmian Yang, Qingtai Wu, Lequan Ma, Yiquan Zhang, Weijia Jia

May 18, 2026

Quantitative backtesting, the validation of trading strategies against historical data, is technically complex, and that complexity limits its adoption. This work introduces BacktestBench, the first large-scale benchmark for automated backtesting, covering metrics calculation, ticker selection, strategy selection, and parameter tuning. The authors also propose AutoBacktest, a multi-agent system that decomposes backtesting into semantic extraction, SQL retrieval, and Python code generation. Experiments with 23 mainstream LLMs show that grounded verification and standardized indicator representations are critical for end-to-end performance. The benchmark and baseline are intended for both researchers advancing LLM reasoning and quantitative finance practitioners seeking to lower the barriers to strategy testing.
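To make the three-stage decomposition concrete, here is a minimal sketch of how semantic extraction, SQL retrieval, and metric computation might chain together. It is not the authors' AutoBacktest implementation: the function names, the `prices` table schema, the made-up price data, and the stubbed extraction step (a fixed dictionary standing in for an LLM call) are all assumptions chosen for illustration.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory price table with made-up closing prices."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE prices (ticker TEXT, date TEXT, close REAL)")
    rows = [
        ("AAA", "2024-01-01", 100.0),
        ("AAA", "2024-01-02", 110.0),
        ("AAA", "2024-01-03", 99.0),
        ("AAA", "2024-01-04", 105.0),
        ("AAA", "2024-01-05", 120.0),
        ("BBB", "2024-01-01", 50.0),  # a second ticker, to show the filter works
    ]
    conn.executemany("INSERT INTO prices VALUES (?, ?, ?)", rows)
    return conn


def extract_request(question: str) -> dict:
    """Stage 1 (semantic extraction), stubbed.

    A real system would ask an LLM to turn the question into structured
    fields; this hypothetical stand-in ignores the text and returns a
    fixed result so the sketch runs offline.
    """
    return {"ticker": "AAA", "start": "2024-01-01", "end": "2024-01-05"}


def fetch_closes(conn: sqlite3.Connection, req: dict) -> list:
    """Stage 2 (SQL retrieval): pull closing prices for the requested window."""
    cursor = conn.execute(
        "SELECT close FROM prices "
        "WHERE ticker = ? AND date BETWEEN ? AND ? ORDER BY date",
        (req["ticker"], req["start"], req["end"]),
    )
    return [row[0] for row in cursor.fetchall()]


def total_return(closes: list) -> float:
    """Buy-and-hold return over the window."""
    return closes[-1] / closes[0] - 1.0


def max_drawdown(closes: list) -> float:
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak = closes[0]
    worst = 0.0
    for price in closes:
        peak = max(peak, price)
        worst = max(worst, (peak - price) / peak)
    return worst


def run_pipeline(conn: sqlite3.Connection, question: str) -> dict:
    """Chain the stages; stage 3 here is hand-written rather than generated code."""
    req = extract_request(question)
    closes = fetch_closes(conn, req)
    if not closes:
        # Simple guard against an empty retrieval; not the paper's verification step.
        raise ValueError("no price rows matched the extracted request")
    return {
        "total_return": total_return(closes),
        "max_drawdown": max_drawdown(closes),
    }


if __name__ == "__main__":
    demo = build_demo_db()
    print(run_pipeline(demo, "How did AAA perform in the first week of January 2024?"))
```

Keeping each stage as a separate function means the extracted request and the retrieved rows can be inspected before any metric is computed, which is one plausible reason to decompose the task rather than ask a model for the final number directly.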
Published as "BacktestBench: Benchmarking Large Language Models for Automated Quantitative Strategy Backtesting" (arXiv:2605.17937).