cs.CL

Running functions in parallel while LLMs decode

Guangyu Feng, Huanzhi Mao, Prabal Dutta, Joseph E. Gonzalez

May 14, 2026

Modern LLM agents call external functions to solve tasks, but synchronous execution blocks the model from generating new tokens until each function returns, so function latency adds directly to end-to-end completion time. AsyncFC decouples decoding from execution at the runtime layer: the model continues generating while functions run in parallel. The approach requires no changes to model weights, training, or existing function code, because it wraps the standard function-calling interface. Experiments on function-calling benchmarks show that AsyncFC reduces end-to-end completion time while maintaining accuracy. They also reveal that LLMs can reason about symbolic representations of pending results without explicit training.
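The decoupling described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class `AsyncFunctionRuntime`, the `$future_N` placeholder format, and the `slow_lookup` function are all hypothetical names chosen for illustration, and a thread pool stands in for whatever execution backend a real runtime would use. The sketch only shows the general idea of returning a symbolic handle immediately while the unmodified function runs in the background.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor


class AsyncFunctionRuntime:
    """Hypothetical runtime that hands out symbolic placeholders for pending results."""

    def __init__(self, max_workers: int = 4) -> None:
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._pending: dict[str, Future] = {}

    def call(self, fn, *args, **kwargs) -> str:
        """Start fn in the background and return a placeholder immediately."""
        handle = f"$future_{len(self._pending)}"  # assumed placeholder format
        self._pending[handle] = self._pool.submit(fn, *args, **kwargs)
        return handle

    def resolve(self, handle: str):
        """Block until the result behind a placeholder is available."""
        return self._pending[handle].result()

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)


def slow_lookup(city: str) -> str:
    """Stand-in for an unmodified external function with noticeable latency."""
    time.sleep(0.2)
    return f"weather({city})=sunny"


def run_demo():
    runtime = AsyncFunctionRuntime()
    start = time.perf_counter()
    # Both calls return placeholders at once; the functions run in parallel.
    handles = [runtime.call(slow_lookup, city) for city in ("Paris", "Tokyo")]
    # A decoder could keep generating here, referring to results symbolically.
    draft = f"Compare {handles[0]} with {handles[1]}"
    # Results are only awaited at the point where concrete values are needed.
    results = [runtime.resolve(h) for h in handles]
    elapsed = time.perf_counter() - start
    runtime.shutdown()
    return draft, results, elapsed


if __name__ == "__main__":
    draft, results, elapsed = run_demo()
    print(draft)
    print(results)
    print(f"elapsed: {elapsed:.2f}s")
```

Because the two lookups overlap, the elapsed time in this sketch should be close to the latency of one call and not the sum of both. The wrapped function itself is untouched, which mirrors the claim that existing function code needs no modification.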
Published as "Concurrency without Model Changes: Future-based Asynchronous Function Calling for LLMs", arXiv:2605.15077