Speeding up AI that searches the web multiple times
Mehrdad Saberi, Keivan Rezaei, Soheil Feizi
May 21, 2026
Language models that answer complex questions often need to retrieve information several times, and each lookup forces the model to wait. SpecHop maintains several speculative retrieval threads in parallel, asynchronously verifying predicted results against the actual tool outputs and rolling back branches whose predictions were wrong. The method preserves accuracy while cutting latency by up to 40%, closely matching theoretical predictions of the optimal speedup.
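The speculate, verify, and roll back loop described above can be illustrated with a minimal sketch. This is not SpecHop's actual algorithm, since the summary does not say how predictions are made or how many threads are kept. Everything here is hypothetical: a toy knowledge base `kb` where each answer is the next hop's query, a slow `retrieve` tool simulated with `asyncio.sleep`, a `guess` function standing in for the predictor, and a `Stats` counter. The sketch speculates the whole remaining chain, launches every real lookup at once, and verifies hop by hop, discarding any branch built on a wrong guess.

```python
import asyncio
from dataclasses import dataclass
from typing import Callable

LATENCY = 0.01  # hypothetical: simulated seconds per real tool call


async def retrieve(query: str, kb: dict[str, str]) -> str:
    """Hypothetical slow retrieval tool: look up `query` in a toy knowledge base."""
    await asyncio.sleep(LATENCY)
    return kb.get(query, "unknown")


async def sequential_chain(start: str, hops: int, kb: dict[str, str]) -> str:
    """Baseline: each hop waits for the previous hop's real result."""
    answer = start
    for _ in range(hops):
        answer = await retrieve(answer, kb)
    return answer


@dataclass
class Stats:
    rounds: int = 0     # batches of concurrent lookups (about one LATENCY each here)
    rollbacks: int = 0  # speculative branches discarded after a mismatch


async def speculative_chain(
    start: str,
    hops: int,
    kb: dict[str, str],
    guess: Callable[[str], str],  # hypothetical predictor of a tool's output
    stats: Stats,
) -> str:
    answer, done = start, 0
    while done < hops:
        stats.rounds += 1
        # Speculate the remaining queries, starting from the last verified answer.
        queries = [answer]
        for _ in range(hops - done - 1):
            queries.append(guess(queries[-1]))
        # Launch a real lookup for every speculated query concurrently.
        tasks = [asyncio.create_task(retrieve(q, kb)) for q in queries]
        # Verify in order: hop k's real output must equal the query guessed for hop k+1.
        for k, task in enumerate(tasks):
            answer = await task
            done += 1
            if k + 1 < len(queries) and queries[k + 1] != answer:
                stats.rollbacks += 1
                for later in tasks[k + 1:]:
                    later.cancel()  # roll back lookups built on the wrong guess
                break
    return answer
```

With `kb = {"A": "B", "B": "C", "C": "D", "D": "E"}` and a 4-hop chain from `"A"`, a perfect predictor (`lambda q: kb.get(q, "?")`) finishes in 1 round with no rollbacks, while a predictor that is always wrong (`lambda q: "?"`) needs 4 rounds and 3 rollbacks, the same number of sequential waits as the baseline. Both return `"E"`, as `sequential_chain` does: because every guess is checked against the real tool output before it is used, speculation can only change latency, never the final answer.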