cs.CV

Can reinforcement learning fix video generation's camera control problem?

Zizun Li, Haoyu Guo, Runzhe Teng, Chunhua Shen, Tong He

May 22, 2026

Video generation models struggle to follow precise camera movements and to maintain physical scale when given new instructions. Geo-Align addresses this with reinforcement learning and a geometry-aware reward that measures 3D camera trajectories directly from the generated frames and penalizes rotation and translation errors. Trained on unpaired real and synthetic videos, it outperforms supervised baselines on both camera controllability and visual quality.
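To make the idea of a reward that penalizes rotation and translation errors concrete, here is a minimal sketch in Python. It is not the paper's formulation, which this summary does not spell out. It assumes per-frame camera poses have already been estimated from the generated frames by some external pose estimator, and it scores them against the requested trajectory. The function names (`rotation_error_deg`, `translation_error`, `camera_reward`), the weights `w_rot` and `w_trans`, and the simple averaging are all illustrative assumptions.

```python
import math

def rotation_error_deg(R_pred, R_ref):
    """Geodesic angle in degrees between two 3x3 rotation matrices."""
    # trace(R_pred^T @ R_ref) equals the sum of elementwise products.
    trace = sum(R_pred[i][j] * R_ref[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))

def translation_error(t_pred, t_ref):
    """Euclidean distance between camera positions, in the poses' own units."""
    return math.dist(t_pred, t_ref)

def camera_reward(pred_poses, ref_poses, w_rot=1.0, w_trans=1.0):
    """Hypothetical reward: negative weighted mean of per-frame pose errors.

    Each pose is a (R, t) pair: R is a 3x3 nested list, t a 3-element sequence.
    The weights are assumptions; they balance degrees against distance units.
    """
    if not pred_poses or len(pred_poses) != len(ref_poses):
        raise ValueError("pose lists must be non-empty and of equal length")
    n = len(pred_poses)
    rot = sum(rotation_error_deg(Rp, Rr)
              for (Rp, _), (Rr, _) in zip(pred_poses, ref_poses)) / n
    trans = sum(translation_error(tp, tr)
                for (_, tp), (_, tr) in zip(pred_poses, ref_poses)) / n
    return -(w_rot * rot + w_trans * trans)

def rot_z(deg):
    """Helper for the example: rotation about the z-axis by `deg` degrees."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

# Example: one frame whose camera is off by 10 degrees and 0.5 units of depth.
predicted = [(rot_z(10.0), (0.0, 0.0, 1.0))]
requested = [(rot_z(0.0), (0.0, 0.0, 1.5))]
reward = camera_reward(predicted, requested)
```

A perfectly followed trajectory scores zero under this sketch, and the score falls as the estimated trajectory drifts from the requested one. Measuring translation error in absolute units only makes sense if the pose estimates are metric, which is why physical scale matters for a reward of this kind.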
Published as "Geo-Align: Video Generation Alignment via Metric Geometry Reward", arXiv:2605.23903