Detecting what changed in 3D scenes without retraining

3D change detection from multi-temporal images requires precise alignment of reconstructions captured at different epochs. The task is complicated by scale ambiguity, depth noise, and a circular dependency: the scene changes to be detected themselves corrupt the registration. VGGT-CD addresses this by decoupling cross-temporal registration from dynamic-change interference in a two-stage pipeline. In the coarse stage, joint inference over keyframes establishes a shared metric space and a Sim(3) prior; in the fine stage, purification isolates the static background and refines the alignment via closed-form centroid optimization. On 11 scenes from the World Across Time dataset, the method reduces Absolute Trajectory Error by 44% outdoors and 59% indoors and completes registration 6× faster, with no task-specific training required.
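
The summary does not spell out the closed-form centroid step, so the following is only a minimal sketch under stated assumptions: the scale and rotation come from the coarse Sim(3) prior and are held fixed, the static background points are available as paired correspondences, and only the translation is refined. Under those assumptions the least-squares translation is the destination centroid minus the scaled, rotated source centroid. The function names (`centroid`, `apply_sim3`, `refine_translation`) are hypothetical and are not taken from VGGT-CD.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def centroid(points: Sequence[Vec3]) -> Vec3:
    """Arithmetic mean of a non-empty set of 3D points."""
    n = len(points)
    if n == 0:
        raise ValueError("need at least one point")
    return (
        sum(p[0] for p in points) / n,
        sum(p[1] for p in points) / n,
        sum(p[2] for p in points) / n,
    )


def apply_sim3(scale: float, rot: Mat3, trans: Vec3, p: Vec3) -> Vec3:
    """Map p to scale * rot @ p + trans."""
    x = scale * sum(rot[0][j] * p[j] for j in range(3)) + trans[0]
    y = scale * sum(rot[1][j] * p[j] for j in range(3)) + trans[1]
    z = scale * sum(rot[2][j] * p[j] for j in range(3)) + trans[2]
    return (x, y, z)


def refine_translation(
    scale: float,
    rot: Mat3,
    src_static: Sequence[Vec3],
    dst_static: Sequence[Vec3],
) -> Vec3:
    """Closed-form translation with scale and rotation held fixed.

    Assumption (not from the source): src_static[i] corresponds to
    dst_static[i], and both contain only static background points.
    Minimizing sum ||scale * rot @ p_i + t - q_i||^2 over t gives
    t = mean(q) - scale * rot @ mean(p).
    """
    if len(src_static) != len(dst_static):
        raise ValueError("point sets must be paired")
    c_src = centroid(src_static)
    c_dst = centroid(dst_static)
    moved = apply_sim3(scale, rot, (0.0, 0.0, 0.0), c_src)
    return (c_dst[0] - moved[0], c_dst[1] - moved[1], c_dst[2] - moved[2])


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    src = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    # Synthetic target: src scaled by 2 and shifted by (1, 2, 3).
    dst = [apply_sim3(2.0, identity, (1.0, 2.0, 3.0), p) for p in src]
    print(refine_translation(2.0, identity, src, dst))
```

Restricting the centroids to static points is what keeps the estimate from being pulled toward objects that moved between epochs; if changed regions leaked into either set, the recovered translation would be biased by them.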