Can one photo become a photorealistic 3D person in under a second?
Hezhen Hu, Wangbo Zhao, Lanqing Guo, Hanwen Jiang, Jonathan C. Liu, Zhiwen Fan, Kai Wang, Zhangyang Wang, Georgios Pavlakos
June 1, 2026
HumanNOVA reconstructs photorealistic 3D human avatars from a single image without test-time optimization. The key insight is to scale synthetic training data by animating rigged assets in realistic poses and re-rendering multi-camera captures. The feed-forward model encodes the image and a rough body mesh (SMPL) into tokens, fuses them via cross-attention, and outputs a 3D triplane representation in a single forward pass, fast enough for real-time use. It outperforms prior work on standard benchmarks while handling diverse lighting and poses.
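To make the token-fusion step concrete, here is a minimal NumPy sketch of single-head cross-attention that turns image and mesh tokens into a triplane-shaped feature tensor. It is an illustration under stated assumptions, not the paper's architecture: the summary does not say which tokens act as queries, so the learnable `plane_queries`, the concatenation of image and mesh tokens into one context, the token counts, the feature width, and the plane resolution are all hypothetical choices made for this example.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention.

    queries: (n_q, d) tokens that ask for information
    context: (n_c, d) tokens that supply information
    Returns the attended features (n_q, d) and the weights (n_q, n_c).
    """
    q = queries @ w_q
    k = context @ w_k
    v = context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


def fuse_to_triplane(image_tokens, mesh_tokens, plane_queries, params, res):
    """Fuse image and mesh tokens into a (3, res, res, channels) triplane.

    Assumption: one query token per triplane cell attends over the
    concatenated image and mesh tokens. The paper may wire this differently.
    """
    context = np.concatenate([image_tokens, mesh_tokens], axis=0)
    fused, weights = cross_attention(plane_queries, context, *params)
    channels = fused.shape[-1]
    # Three axis-aligned feature planes (XY, XZ, YZ), each res x res.
    return fused.reshape(3, res, res, channels), weights


# Toy sizes, chosen only so the example runs quickly.
rng = np.random.default_rng(0)
d, res = 16, 4
image_tokens = rng.normal(size=(32, d))           # stand-in for image encoder output
mesh_tokens = rng.normal(size=(24, d))            # stand-in for encoded SMPL mesh
plane_queries = rng.normal(size=(3 * res * res, d))
params = tuple(rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))

triplane, attn = fuse_to_triplane(
    image_tokens, mesh_tokens, plane_queries, params, res
)
print(triplane.shape)  # (3, 4, 4, 16)
```

Because every step is a fixed sequence of matrix products, the output comes from one forward pass with no per-subject optimization loop, which is the property the summary credits for the method's speed.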