How to add new objects to video game worlds without retraining
Kiymet Akdemir, Pinar Yanardag
June 1, 2026
Current video-game-style world models generate unseen regions using only their base priors, so users cannot control what appears beyond the initial frame. SPAWN is a training-free method that inserts user-specified concepts (characters, buildings, props) by briefly swapping out the model's foundational memory anchor during generation and then restoring it, which lets the concept propagate naturally through the rest of the video. It accepts image or text input and maintains consistent lighting, scale, perspective, and temporal coherence, making controllable scene composition possible in existing models without retraining.
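The swap-then-restore idea can be illustrated with a deliberately simplified toy. This is not SPAWN's implementation: the functions `anchor_schedule` and `toy_rollout` are hypothetical, anchors are modeled as plain sets of concept labels, and "memory" is assumed to be a sliding window of recent frames whose contents carry forward. It only shows why a short swap can be enough for a concept to persist after the original anchor returns.

```python
def anchor_schedule(num_steps, swap_start, swap_len, base_anchor, concept_anchor):
    """Hypothetical: pick the anchor used at each generation step.

    The concept anchor replaces the base anchor for `swap_len` steps
    starting at `swap_start`; the base anchor is restored afterwards.
    """
    return [
        concept_anchor if swap_start <= t < swap_start + swap_len else base_anchor
        for t in range(num_steps)
    ]


def toy_rollout(schedule, context_len=4):
    """Toy stand-in for a world model (an assumption, not a real model).

    Each frame is a set of concept labels: whatever the current anchor
    contributes, plus everything visible in the last `context_len` frames.
    """
    frames = []
    for anchor in schedule:
        # Concepts present in recent context carry over into the new frame.
        carried = set().union(*frames[-context_len:])
        frames.append(set(anchor) | carried)
    return frames


base = frozenset({"terrain"})
concept = frozenset({"tower"})
schedule = anchor_schedule(6, swap_start=2, swap_len=1,
                           base_anchor=base, concept_anchor=concept)
frames = toy_rollout(schedule)
for t, (anchor, frame) in enumerate(zip(schedule, frames)):
    print(t, sorted(anchor), sorted(frame))
```

In this toy, "tower" first appears at step 2 while the concept anchor is active, and it remains in every later frame even though the base anchor is back in place from step 3 onward, because it is already part of the recent context.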