cs.CL

Speeding up AI that searches the web multiple times

Mehrdad Saberi, Keivan Rezaei, Soheil Feizi

May 21, 2026

Language models that answer complex questions often need to retrieve information several times, and each lookup forces the model to wait for the tool to respond. SpecHop instead keeps multiple speculative retrieval threads running in parallel: it asynchronously verifies predicted results against the actual tool outputs and rolls back branches whose predictions turn out to be wrong. The method preserves accuracy while cutting latency by up to 40%, closely matching theoretical predictions of the optimal speedup.
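The speculate, verify, and roll back loop described above can be sketched in Python. This is a minimal toy under stated assumptions, not the authors' implementation: the knowledge base `KB`, the `retrieve` tool, the `predict` draft function, and the `depth` limit are all hypothetical, and each hop's retrieved answer is assumed to be the next hop's query. A cheap predictor guesses a hop's tool output, the next hop's real tool call is launched on that guess right away, and when the real output arrives every branch built on a wrong guess is cancelled.

```python
import asyncio
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical toy chain: each retrieved answer is the next hop's query.
KB = {"q0": "a1", "a1": "a2", "a2": "a3", "a3": "answer"}

async def retrieve(query: str, delay: float = 0.01) -> str:
    """Slow, authoritative tool call (simulated latency); the ground truth."""
    await asyncio.sleep(delay)
    return KB.get(query, "<none>")

def is_final(result: str) -> bool:
    """In this toy, a result is final when it cannot serve as a further query."""
    return result not in KB

@dataclass
class Branch:
    query: str            # query issued by this branch (verified only at the head)
    task: asyncio.Task    # in-flight real tool call for that query

async def sequential_multihop(question: str) -> list[tuple[str, str]]:
    """Baseline: wait for every tool call before starting the next hop."""
    trace, query = [], question
    while True:
        actual = await retrieve(query)
        trace.append((query, actual))
        if is_final(actual):
            return trace
        query = actual

async def speculative_multihop(
    question: str,
    predict: Callable[[str], Optional[str]],  # hypothetical cheap guess of a tool output
    depth: int = 3,                           # max branches in flight at once
) -> tuple[list[tuple[str, str]], int]:
    trace: list[tuple[str, str]] = []
    rollbacks = 0
    branches = [Branch(question, asyncio.create_task(retrieve(question)))]
    while True:
        # Extend the speculative chain: guess the newest branch's output and
        # immediately launch the next hop's real tool call on that guess.
        while len(branches) < depth:
            guess = predict(branches[-1].query)
            if guess is None or is_final(guess):  # don't speculate past an end
                break
            branches.append(Branch(guess, asyncio.create_task(retrieve(guess))))
        head = branches.pop(0)
        actual = await head.task  # verify against the real tool output
        trace.append((head.query, actual))
        if is_final(actual):
            for b in branches:
                b.task.cancel()
            return trace, rollbacks
        if branches and branches[0].query == actual:
            continue  # guess confirmed: the next branch's call is already running
        # Misprediction (or nothing speculated): roll back every dependent
        # branch and restart from the verified result.
        if branches:
            rollbacks += 1
        for b in branches:
            b.task.cancel()
        branches = [Branch(actual, asyncio.create_task(retrieve(actual)))]

if __name__ == "__main__":
    seq = asyncio.run(sequential_multihop("q0"))
    spec, rb = asyncio.run(speculative_multihop(
        "q0", lambda q: "a3" if q == "a1" else KB.get(q)))  # one wrong guess
    print(spec == seq, rb)
```

Running the file prints `True 1`: the predictor's one wrong guess triggers a single rollback, yet the final trace matches the sequential baseline, which mirrors the accuracy-preservation property in miniature. A rollback cancels all downstream branches because each one depended on the mispredicted output. In this toy with a perfect predictor and `depth=3`, the four tool calls finish in roughly two tool latencies instead of four; this illustrates the mechanism only and says nothing about the paper's measured speedups.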
Published as SpecHop: Continuous Speculation for Accelerating Multi-Hop Retrieval Agents (arXiv:2605.21965).