cs.CV

Can AI find the perfect camera angle just from a description?

Jiarui Guo, Haojia Wei, Yiming Zhang, Yifei Liu, Yuning Gong, Hongjie Zhang, Xue Yang, Zhihang Zhong

May 22, 2026

Virtual photography requires an agent to understand a 3D scene, interpret a language description of the desired shot, and find the right camera angle, a task that combines spatial reasoning with aesthetic judgment. PhotoFlow uses three coordinated modules: a Director that proposes candidate camera positions, a Reviewer that scores them using visual critique and rules, and a Reflector that learns from failures. On a new benchmark of 47 scenes and 141 photography missions, PhotoFlow outperforms one-shot prediction, reflection-based search, and random sampling in both output quality and success rate under a fixed rendering budget.
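To make the propose, score, and reflect loop concrete, here is a minimal toy sketch of a budgeted search. It is not the paper's implementation. Every name in it (`CameraPose`, `director`, `reviewer`, `reflector`, `run_mission`) is hypothetical, the Reviewer is replaced by a simple distance-to-target score plus one rule check, and the Reflector is reduced to a list of failed poses that the Director avoids.

```python
import random
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CameraPose:
    azimuth_deg: float    # rotation around the scene, 0..360
    elevation_deg: float  # angle above the ground plane
    distance: float       # distance from the subject


@dataclass
class Memory:
    # Poses that scored poorly; stands in for "learning from failures".
    failures: list = field(default_factory=list)


def angular_gap(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def director(rng, memory, n, min_gap_deg=15.0, max_tries=50):
    """Propose n poses by random sampling, steering away from past failures."""
    poses = []
    for _ in range(n):
        pose = None
        for _ in range(max_tries):
            pose = CameraPose(rng.uniform(0, 360), rng.uniform(-10, 80),
                              rng.uniform(1, 10))
            far_from_failures = all(
                angular_gap(pose.azimuth_deg, f.azimuth_deg) >= min_gap_deg
                or abs(pose.elevation_deg - f.elevation_deg) >= min_gap_deg
                for f in memory.failures
            )
            if far_from_failures:
                break
        # If every try landed near a failure, the last sample is used anyway.
        poses.append(pose)
    return poses


def reviewer(pose, target):
    """Toy score in [0, 1]: one rule check plus closeness to a target pose."""
    if pose.elevation_deg < 0:  # rule: camera must not sit below the ground
        return 0.0
    az = angular_gap(pose.azimuth_deg, target.azimuth_deg) / 180.0
    el = abs(pose.elevation_deg - target.elevation_deg) / 90.0
    di = abs(pose.distance - target.distance) / 9.0
    return max(0.0, 1.0 - (az + el + di) / 3.0)


def reflector(memory, pose, score, fail_below=0.5):
    """Record clearly bad poses so later proposals avoid them."""
    if score < fail_below:
        memory.failures.append(pose)


def run_mission(target, budget=20, per_round=4, threshold=0.9, seed=0):
    """Search until the score threshold is met or the render budget is spent."""
    rng = random.Random(seed)
    memory = Memory()
    best_pose, best_score, renders = None, -1.0, 0
    while renders < budget:
        for pose in director(rng, memory, min(per_round, budget - renders)):
            renders += 1  # each scored candidate costs one render
            score = reviewer(pose, target)
            if score > best_score:
                best_pose, best_score = pose, score
            if score >= threshold:
                return best_pose, best_score, renders
            reflector(memory, pose, score)
    return best_pose, best_score, renders
```

Calling `run_mission(CameraPose(45.0, 30.0, 4.0), budget=12)` returns the best pose found, its score, and the number of renders used, which never exceeds the budget. In a real system the target pose is unknown and the score would come from rendering the view and critiquing it. The sketch only shows how a fixed rendering budget bounds the loop.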
Published as "PhotoFlow: Agentic 3D Virtual Photography Missions", arXiv:2605.23771