cs.CV

Remembering virtual worlds without breaking real-time speed

Jung Yi, Minjae Kim, Paul Hyunbin Cho, Wooseok Jang, Sangdoo Yun, Seungryong Kim

May 21, 2026

Autoregressive video diffusion models can generate interactive worlds in real time, but they face a hard tradeoff: retaining the full memory of past scenes lowers the frame rate, while fast inference forgets the world. WorldKV addresses this by storing discarded memory chunks on the GPU or CPU, retrieving them selectively based on camera position and action, and pruning redundant tokens within each chunk. On two benchmarks, it matches full-memory consistency at twice the speed, with no fine-tuning required.
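
The store, retrieve, and prune steps can be pictured with a toy sketch. This is an illustration only, not the paper's implementation: the `Chunk` and `ChunkStore` names, the distance-minus-action-bonus score, and the cosine-similarity pruning threshold are all assumptions made for this example, since the summary does not specify how WorldKV scores or prunes.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Chunk:
    camera_pos: tuple   # (x, y, z) camera position when the chunk was generated
    action: str         # user action associated with the chunk
    tokens: list        # one feature vector (list of floats) per cached token


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def prune_tokens(tokens, threshold=0.99):
    """Drop tokens that are near-duplicates of an already kept token (assumed rule)."""
    kept = []
    for tok in tokens:
        if all(cosine(tok, k) < threshold for k in kept):
            kept.append(tok)
    return kept


@dataclass
class ChunkStore:
    """Hypothetical holding area for chunks evicted from the live cache."""
    chunks: list = field(default_factory=list)

    def evict(self, chunk, threshold=0.99):
        # Compress the chunk before storing it instead of discarding it.
        pruned = prune_tokens(chunk.tokens, threshold)
        self.chunks.append(Chunk(chunk.camera_pos, chunk.action, pruned))

    def retrieve(self, camera_pos, action, k=2, action_bonus=1.0):
        # Lower score is better: nearby camera positions win, and a matching
        # action subtracts a fixed bonus (an assumed scoring rule).
        def score(c):
            d = math.dist(c.camera_pos, camera_pos)
            return d - (action_bonus if c.action == action else 0.0)

        return sorted(self.chunks, key=score)[:k]


store = ChunkStore()
store.evict(Chunk((0.0, 0.0, 0.0), "forward", [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]))
store.evict(Chunk((10.0, 0.0, 0.0), "left", [[1.0, 1.0]]))
store.evict(Chunk((1.0, 0.0, 0.0), "left", [[0.0, 1.0]]))
hits = store.retrieve((0.5, 0.0, 0.0), "left", k=1)
```

Only the `k` best-scoring chunks return to the working memory, so the cost per frame depends on `k` and the pruned chunk size rather than on the full history.
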
Published as "WorldKV: Efficient World Memory with World Retrieval and Compression" (arXiv:2605.22718).