Can robots imagine failure modes to improve safety?

Junwon Seo, Sushant Veer, Ran Tian, Wenhao Ding, Apoorva Sharma, Karen Leung, Edward Schmerling, Marco Pavone, Andrea Bajcsy

Video world models can simulate the future outcomes of robot actions, but they typically predict nominal outcomes and miss the rare failures that matter most for safety. StressDream optimizes the noise in diffusion-based world models to steer predictions toward specified high-impact scenarios, such as task failures, while keeping the imagined futures realistic. A vision-language model guides generation toward the target event, and a plausibility constraint prevents implausible rollouts. The method enables safer policy evaluation on autonomous driving and robotic manipulation tasks.
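The idea of optimizing noise under a plausibility constraint can be illustrated with a toy sketch. The summary above does not specify the objective, the optimizer, or the form of the constraint, so everything below is an assumption made for illustration: a linear map `W` stands in for the diffusion world model, a linear function `w` stands in for the vision-language model's score of the target event, and plausibility is approximated by keeping the noise vector at the norm typical of a standard Gaussian sample. The names `optimize_noise`, `score`, and `score_grad` are hypothetical and are not taken from the paper.

```python
import numpy as np

def optimize_noise(z0, score_grad, steps=50, lr=0.1):
    """Projected gradient ascent on the initial noise (illustrative only).

    Assumed plausibility constraint: the noise stays on the sphere of
    radius sqrt(d), the typical norm of a d-dimensional standard Gaussian.
    """
    radius = np.sqrt(z0.size)
    z = z0 * radius / np.linalg.norm(z0)       # start on the sphere
    for _ in range(steps):
        z = z + lr * score_grad(z)             # move toward the target event
        z = z * radius / np.linalg.norm(z)     # project back onto the sphere
    return z

rng = np.random.default_rng(0)
d = 16
W = rng.standard_normal((4, d))  # toy stand-in for the world model: x = W z
w = rng.standard_normal(4)       # toy stand-in for the VLM score: s(x) = w . x

def score(z):
    """Score of the predicted outcome for noise z (higher = closer to target)."""
    return float(w @ (W @ z))

def score_grad(z):
    """Gradient of the score with respect to z (constant for this linear toy)."""
    return W.T @ w

z0 = rng.standard_normal(d)
z_init = z0 * np.sqrt(d) / np.linalg.norm(z0)
z_opt = optimize_noise(z0, score_grad)

print("score before:", score(z_init))
print("score after: ", score(z_opt))
print("noise norm:  ", np.linalg.norm(z_opt))
```

In this toy setting the score rises while the noise norm stays fixed, which mirrors the stated trade-off between reaching the target event and remaining plausible. A real system would replace `W` with a diffusion sampler and `score_grad` with gradients, or a gradient-free estimate, obtained from the vision-language model.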