cs.CV

How to add new objects to video game worlds without retraining

Kiymet Akdemir, Pinar Yanardag

June 1, 2026

Current video-game-like world models generate unseen regions using only their base priors, leaving users unable to control what appears beyond the initial frame. SPAWN, a training-free method, inserts user-specified concepts (characters, buildings, props) by briefly swapping the model's foundational memory anchor during generation and then restoring it, which lets the concept propagate naturally through the rest of the video. SPAWN accepts image or text input and maintains consistent lighting, scale, perspective, and temporal coherence, making controllable scene composition possible in existing models without retraining.
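The swap-then-restore idea can be illustrated with a toy control-flow sketch. This is not the paper's implementation: `generate_with_anchor_swap`, `toy_step`, the string anchors, and the set-based "frames" are all hypothetical stand-ins, and the swap window (`swap_start`, `swap_len`) is an assumed parameterisation. The sketch only shows why a brief swap can have a lasting effect when each new frame is conditioned on the previous one.

```python
from typing import Callable, FrozenSet, List, Tuple

Frame = FrozenSet[str]  # toy "frame": the set of things visible in it


def toy_step(anchor: str, prev_frame: Frame) -> Frame:
    # Hypothetical stand-in for one generation step: the new frame keeps
    # what the previous frame showed and adds what the anchor contributes.
    return prev_frame | {anchor}


def generate_with_anchor_swap(
    step_fn: Callable[[str, Frame], Frame],
    base_anchor: str,
    concept_anchor: str,
    first_frame: Frame,
    num_steps: int,
    swap_start: int,
    swap_len: int,
) -> Tuple[List[Frame], List[str]]:
    """Run a rollout, using concept_anchor only inside the swap window."""
    frames: List[Frame] = [first_frame]
    anchors_used: List[str] = []
    for t in range(num_steps):
        in_window = swap_start <= t < swap_start + swap_len
        # Assumption: the anchor is swapped for a few steps, then restored.
        anchor = concept_anchor if in_window else base_anchor
        anchors_used.append(anchor)
        frames.append(step_fn(anchor, frames[-1]))
    return frames, anchors_used


frames, anchors_used = generate_with_anchor_swap(
    toy_step, "scene", "dragon", frozenset({"scene"}),
    num_steps=5, swap_start=2, swap_len=1,
)
```

In this toy rollout the concept anchor is used for a single step, yet every later frame still contains the concept, because each step is conditioned on the frame before it. How a real world model carries the concept forward depends on its memory and attention design, which this sketch does not model.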
Published as "From Zero to Hero: Training-Free Custom Concept Spawning in World Models", arXiv:2606.02575