Can pretrained models answer 'what if' questions about images?

Causal generative modeling lets AI systems reason about counterfactuals, that is, answer "what if" questions. Most existing methods bake causal constraints into training, so each new scenario requires retraining. FM-CGM instead uses pretrained foundation models off the shelf: a reasoning model for causal inference and a text-to-image diffusion model for generation. A new attention mechanism, Causal Semantic Guidance, ensures that an edit to one concept cascades to the concepts that depend on it while leaving unrelated regions unchanged. Tested on visual reasoning tasks, FM-CGM identifies plausible causal structures and generates faithful counterfactual images without any retraining.
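To make the "cascade" idea concrete, here is a minimal sketch of the graph bookkeeping it implies: given a causal graph and an intervened concept, collect the concept and everything downstream of it (candidates for editing), and treat everything else as something to preserve. This is not FM-CGM's attention mechanism and not the authors' code. The function name affected_concepts, the example graph, and the concept names are all hypothetical, chosen only for illustration.

```python
from collections import deque


def affected_concepts(graph, intervened):
    """Return the intervened concept and all of its descendants, in BFS order.

    `graph` maps each concept to the list of concepts it directly causes.
    """
    seen = [intervened]
    queue = deque([intervened])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:
                seen.append(child)
                queue.append(child)
    return seen


# Hypothetical causal graph (assumption, not from the source):
# edges point from cause to effect.
graph = {
    "weather": ["ground_wetness", "sky_color"],
    "ground_wetness": ["reflections"],
    "sky_color": [],
    "reflections": [],
    "car_color": [],
}

# "What if the weather were different?" touches weather and its descendants.
to_edit = affected_concepts(graph, "weather")

# Concepts with no causal path from the intervention should stay unchanged.
to_preserve = [c for c in graph if c not in to_edit]

print("edit:", to_edit)
print("preserve:", to_preserve)
```

In a full system, a list like to_edit would presumably tell the generator which concepts to change, and to_preserve which to hold fixed; how FM-CGM actually maps concepts to image regions is not described here.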