cs.CV

Can one photo become a photorealistic 3D person in under a second?

Hezhen Hu, Wangbo Zhao, Lanqing Guo, Hanwen Jiang, Jonathan C. Liu, Zhiwen Fan, Kai Wang, Zhangyang Wang, Georgios Pavlakos

June 1, 2026

HumanNOVA reconstructs photorealistic 3D human avatars from a single image without test-time optimization. The key insight is to scale up synthetic training data by animating rigged assets in realistic poses and re-rendering multi-camera captures. The feed-forward model encodes the input image and a coarse SMPL body mesh into tokens, fuses them via cross-attention, and outputs a 3D triplane representation, all fast enough for real-time use. It outperforms prior work on standard benchmarks while handling diverse lighting and poses.
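To make the two building blocks named above more concrete, here is a minimal, dependency-free sketch of generic cross-attention and a generic triplane lookup. It is not the authors' implementation. Everything in it is an assumption for illustration: the function names, the choice of image tokens as queries and mesh tokens as keys and values, the absence of learned projections, the nearest-neighbor sampling (real systems typically interpolate), and the summing of the three plane features.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention.

    Illustrative only: no learned projections. Each query token
    (here: an image token) attends over all key/value tokens
    (here: body-mesh tokens) and returns a weighted mix of values.
    """
    d = len(keys[0])
    fused = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        fused.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return fused


def sample_triplane(planes, point):
    """Look up a feature for a 3D point from three axis-aligned planes.

    `planes` maps "xy", "xz", "yz" to res x res grids of feature vectors.
    `point` has coordinates in [0, 1]. Nearest-neighbor sampling and
    summation are simplifying assumptions of this sketch.
    """
    x, y, z = point
    res = len(planes["xy"])

    def idx(u):
        return min(res - 1, max(0, int(u * res)))

    ix, iy, iz = idx(x), idx(y), idx(z)
    feats = [planes["xy"][ix][iy], planes["xz"][ix][iz], planes["yz"][iy][iz]]
    return [sum(f[c] for f in feats) for c in range(len(feats[0]))]


# Hypothetical toy inputs: two image tokens and two mesh tokens of width 2.
image_tokens = [[1.0, 0.0], [0.0, 1.0]]
mesh_tokens = [[1.0, 0.0], [0.0, 1.0]]
fused_tokens = cross_attention(image_tokens, mesh_tokens, mesh_tokens)

# Hypothetical 2x2 triplane with one feature channel per cell.
toy_planes = {
    "xy": [[[1.0], [1.0]], [[1.0], [1.0]]],
    "xz": [[[2.0], [2.0]], [[2.0], [2.0]]],
    "yz": [[[3.0], [3.0]], [[3.0], [3.0]]],
}
point_feature = sample_triplane(toy_planes, (0.25, 0.75, 0.5))
```

The appeal of a triplane is that storage grows with the square of the resolution rather than the cube, while any 3D point can still be assigned a feature by projecting it onto the three planes.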
Published as "HumanNOVA: Photorealistic, Universal and Rapid 3D Human Avatar Modeling from a Single Image", arXiv:2606.02573