Can reinforcement learning fix video generation's camera control problem?

Video generation models struggle to follow precise camera movements and to maintain physical scale when given new instructions. Geo-Align tackles this with reinforcement learning: a geometry-aware reward measures the 3D camera trajectory directly from the generated frames and penalizes its rotation and translation errors. Trained on unpaired real and synthetic videos, it outperforms supervised baselines on camera controllability and visual quality.
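To make the idea of a reward that penalizes rotation and translation errors concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Geo-Align's actual reward: the function names, the geodesic-angle rotation metric, the Euclidean translation metric, the per-frame averaging, and the weights `w_rot` and `w_trans` are all hypothetical choices. The camera poses are assumed to be already estimated from the generated frames by some external pose estimator.

```python
import numpy as np

def rotation_error(R_est, R_ref):
    """Geodesic angle in radians between two 3x3 rotation matrices."""
    R_rel = R_est.T @ R_ref
    cos_angle = (np.trace(R_rel) - 1.0) / 2.0
    # Clip to guard against values slightly outside [-1, 1] from rounding.
    return float(np.arccos(np.clip(cos_angle, -1.0, 1.0)))

def translation_error(t_est, t_ref):
    """Euclidean distance between two camera positions."""
    return float(np.linalg.norm(np.asarray(t_est) - np.asarray(t_ref)))

def trajectory_reward(est_poses, ref_poses, w_rot=1.0, w_trans=1.0):
    """Negative weighted sum of mean rotation and translation errors.

    est_poses, ref_poses: equal-length lists of (R, t) pairs, one per frame.
    The weights and the averaging scheme are assumptions for illustration.
    """
    if len(est_poses) != len(ref_poses):
        raise ValueError("trajectories must have the same number of frames")
    rot = np.mean([rotation_error(Re, Rr)
                   for (Re, _), (Rr, _) in zip(est_poses, ref_poses)])
    trans = np.mean([translation_error(te, tr)
                     for (_, te), (_, tr) in zip(est_poses, ref_poses)])
    return -(w_rot * rot + w_trans * trans)
```

A trajectory that matches the target exactly scores zero, and the reward grows more negative as the estimated camera path drifts in orientation or position. Because the translation term compares positions in absolute units, a sketch like this is only sensitive to physical scale if the pose estimates themselves are metric.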