Running functions in parallel while LLMs decode
Guangyu Feng, Huanzhi Mao, Prabal Dutta, Joseph E. Gonzalez
May 14, 2026
Modern LLM agents call external functions to solve tasks, but under synchronous execution the model cannot generate new tokens until each function returns, which creates a latency bottleneck. AsyncFC decouples decoding from execution at the runtime layer, so the model keeps generating while functions run in parallel. The approach requires no changes to model weights, training, or existing function code; it wraps the standard function-calling interface. Experiments on function-calling benchmarks show that AsyncFC reduces end-to-end completion time while maintaining accuracy, and reveal that LLMs can naturally reason about symbolic representations of pending results without explicit training.
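To make the decoupling idea concrete, here is a minimal Python sketch of the general pattern, not the authors' implementation. The model is replaced by a scripted token list, the tool `get_weather`, the `Call` record, the `TOOLS` table, the `$r0`-style placeholder symbols, and all delays are hypothetical. The blocking decoder waits on every call before producing the next token. The overlapped decoder launches each call as an `asyncio` task, emits a placeholder symbol standing in for the pending result, keeps decoding, and substitutes real results only at the end.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Call:
    """A function-call request emitted by the (scripted) model. Hypothetical."""
    name: str
    arg: str


async def get_weather(city: str) -> str:
    """Hypothetical tool with simulated I/O latency."""
    await asyncio.sleep(0.3)
    return f"sunny in {city}"


TOOLS = {"get_weather": get_weather}

# Scripted stand-in for model output: plain tokens interleaved with calls.
SCRIPT = ["Paris:", Call("get_weather", "Paris"),
          "Tokyo:", Call("get_weather", "Tokyo"), "done"]


async def decode_blocking(script, token_delay=0.05):
    """Baseline: decoding stalls until each call returns."""
    out = []
    for step in script:
        await asyncio.sleep(token_delay)  # simulated per-token decode cost
        if isinstance(step, Call):
            out.append(await TOOLS[step.name](step.arg))  # no tokens while waiting
        else:
            out.append(step)
    return out


async def decode_overlapped(script, token_delay=0.05):
    """Dispatch calls as background tasks, emit a placeholder, keep decoding."""
    context, pending = [], {}
    for step in script:
        await asyncio.sleep(token_delay)
        if isinstance(step, Call):
            symbol = f"$r{len(pending)}"  # hypothetical placeholder syntax
            pending[symbol] = asyncio.create_task(TOOLS[step.name](step.arg))
            context.append(symbol)  # decoding continues past the call
        else:
            context.append(step)
    # Only now wait for calls still in flight, then substitute their results.
    results = {sym: await task for sym, task in pending.items()}
    return [results.get(tok, tok) for tok in context]


async def main():
    loop = asyncio.get_running_loop()
    t0 = loop.time()
    blocking = await decode_blocking(SCRIPT)
    t1 = loop.time()
    overlapped = await decode_overlapped(SCRIPT)
    t2 = loop.time()
    return blocking, overlapped, t1 - t0, t2 - t1


if __name__ == "__main__":
    blocking, overlapped, tb, to = asyncio.run(main())
    print(overlapped)
    print(f"blocking {tb:.2f}s, overlapped {to:.2f}s")
```

Both decoders produce the same final sequence, but with these simulated delays the overlapped run should finish noticeably sooner (roughly 0.5 s versus 0.85 s), because the two 0.3 s calls run concurrently with the remaining token steps and with each other. In a real system the model could also refer to a placeholder such as `$r0` in later text or calls before it resolves; the sketch only shows the simplest case of resolving placeholders at the end.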