cs.RO

How to make diffusion models plan farther ahead without exploding compute costs

Byoungwoo Park, Utkarsh A. Mishra, Jaemoo Choi, Juho Lee, Yongxin Chen

May 30, 2026

Diffusion models excel at generating short sequences, but extending them to long-horizon tasks breaks coherence: neighboring plans stay locally consistent yet together form implausible global trajectories. CoFi splits generation into two stages. It first builds a coarse structural scaffold that captures the task-level arrangement, then refines the details while preserving that scaffold. Across robotic manipulation, panoramic image generation, and long video generation, it improves both global structure and sample quality while cutting denoiser calls by a factor of 2–8.
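The two-stage idea can be illustrated with a toy, stdlib-only Python sketch. It is not CoFi's implementation. The window averaging used to compose short segments, the repetition upsampling, the deterministic DDIM-style update, reusing one denoiser for both stages, and every name here (`compose_step`, `sample`, `coarse_to_fine`, the denoiser signature) are assumptions made for illustration. The sketch shows one way a coarse pass over a shortened horizon, followed by refinement that starts from a partially re-noised scaffold, can reduce the number of denoiser calls.

```python
import random
from typing import Callable, List

# Hypothetical denoiser signature (an assumption, not the paper's API):
# (noisy window, noise level) -> estimate of the clean window.
Denoiser = Callable[[List[float], float], List[float]]


def compose_step(x: List[float], denoise: Denoiser, sigma: float,
                 seg: int, stride: int) -> List[float]:
    """Denoise overlapping windows of x and average estimates where they overlap."""
    if len(x) < seg or not 0 < stride <= seg:
        raise ValueError("need len(x) >= seg and 0 < stride <= seg")
    starts = list(range(0, len(x) - seg + 1, stride))
    if starts[-1] != len(x) - seg:  # make sure the tail is covered
        starts.append(len(x) - seg)
    total = [0.0] * len(x)
    count = [0] * len(x)
    for s in starts:
        for i, v in enumerate(denoise(x[s:s + seg], sigma)):
            total[s + i] += v
            count[s + i] += 1
    return [t / c for t, c in zip(total, count)]


def sample(x: List[float], denoise: Denoiser, sigmas: List[float],
           seg: int, stride: int) -> List[float]:
    """Deterministic DDIM-style sampling over a decreasing noise schedule."""
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        x0 = compose_step(x, denoise, sigma, seg, stride)
        # x_next = x0 + sigma_next * eps, where eps = (x - x0) / sigma.
        x = [a + (sigma_next / sigma) * (b - a) for a, b in zip(x0, x)]
    return x


def coarse_to_fine(horizon: int, factor: int, denoise: Denoiser,
                   sigmas: List[float], t_refine: int, seg: int, stride: int,
                   rng: random.Random) -> List[float]:
    """Stage 1: sample a short coarse scaffold. Stage 2: refine it at full length."""
    if horizon % factor:
        raise ValueError("horizon must be divisible by factor")
    # Coarse stage: horizon shortened by `factor`, so each window spans more time.
    coarse = [rng.gauss(0.0, sigmas[0]) for _ in range(horizon // factor)]
    coarse = sample(coarse, denoise, sigmas, seg, stride)
    # Upsample by repetition (a crude stand-in for interpolation).
    scaffold = [v for v in coarse for _ in range(factor)]
    # Re-noise only to an intermediate level so the scaffold's layout survives,
    # then run just the remaining steps of the schedule at full resolution.
    sig = sigmas[t_refine]
    x = [v + rng.gauss(0.0, sig) for v in scaffold]
    return sample(x, denoise, sigmas[t_refine:], seg, stride)
```

For example, with a 64-step horizon, 8-step windows at stride 4 (15 windows), a 4× coarser scaffold (3 windows), 10 sampling steps, and refinement over only the last 3, this toy makes 3·10 + 15·3 = 75 denoiser calls, versus 15·10 = 150 when composing at full resolution for all 10 steps. These counts follow from the toy settings alone, not from the paper's experiments.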
Published as "Coarse-to-Fine Compositional Diffusion for Long-Horizon Planning" (arXiv:2606.00837).