Can one photo become a photorealistic 3D person in under a second?

Hezhen Hu, Wangbo Zhao, Lanqing Guo, Hanwen Jiang, Jonathan C. Liu, Zhiwen Fan, Kai Wang, Zhangyang Wang, Georgios Pavlakos

HumanNOVA reconstructs a photorealistic 3D human avatar from a single image without test-time optimization. The key insight is to scale up synthetic training data by animating rigged assets in realistic poses and re-rendering them as multi-camera captures. The feed-forward model encodes the input image and a coarse SMPL body mesh into tokens, fuses them with cross-attention, and outputs a triplane 3D representation in a single forward pass, which makes it fast enough for real-time use. It outperforms prior work on standard benchmarks and handles diverse lighting and poses.
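To make the image-plus-mesh-to-triplane flow concrete, here is a minimal NumPy sketch. It is not the authors' implementation, and it rests on several assumptions that the summary does not specify. Attention is single-head and uses random weights in place of trained ones. Image tokens attend to SMPL tokens for fusion, and a set of learned query tokens then reads a triplane out of the fused tokens. Features are sampled from the three planes and summed, in the EG3D style. All names, token counts and sizes (`cross_attention`, `tokens_to_triplane`, `query_triplane`, `d`, `res`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention: (Nq, d) x (Nc, d) -> (Nq, d)."""
    q, k, v = queries @ w_q, context @ w_k, context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def init_weights(d):
    # Hypothetical random projections standing in for learned parameters.
    return [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3)]

def tokens_to_triplane(tokens, res):
    """Reshape (3*res*res, C) tokens into three (C, res, res) feature planes."""
    n, c = tokens.shape
    assert n == 3 * res * res, "expected one token per plane cell"
    return tokens.reshape(3, res, res, c).transpose(0, 3, 1, 2)

def bilinear_sample(plane, u, v):
    """Bilinearly sample a (C, H, W) plane at normalized coords u, v in [-1, 1]."""
    _, h, w = plane.shape
    x = np.clip((u + 1) / 2 * (w - 1), 0, w - 1)
    y = np.clip((v + 1) / 2 * (h - 1), 0, h - 1)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * plane[:, y0, x0] + ax * plane[:, y0, x1]
    bottom = (1 - ax) * plane[:, y1, x0] + ax * plane[:, y1, x1]
    return (1 - ay) * top + ay * bottom

def query_triplane(planes, point):
    """Project a 3D point onto the xy, xz, yz planes and sum the sampled features."""
    x, y, z = point
    return (bilinear_sample(planes[0], x, y)
            + bilinear_sample(planes[1], x, z)
            + bilinear_sample(planes[2], y, z))

# Toy sizes (hypothetical): d-dim tokens, res x res planes.
d, res = 16, 4
image_tokens = rng.standard_normal((64, d))   # stand-in for patch embeddings of the photo
smpl_tokens = rng.standard_normal((128, d))   # stand-in for embedded SMPL vertices

# Fusion (assumed direction): image tokens attend to body-mesh tokens, residual update.
fused = image_tokens + cross_attention(image_tokens, smpl_tokens, *init_weights(d))

# Learned triplane queries (random here) read out one token per plane cell.
triplane_queries = rng.standard_normal((3 * res * res, d))
planes = tokens_to_triplane(cross_attention(triplane_queries, fused, *init_weights(d)), res)

feat = query_triplane(planes, (0.1, -0.3, 0.5))
print(planes.shape, feat.shape)  # (3, 16, 4, 4) (16,)
```

A triplane stores three 2D feature grids instead of a dense 3D volume, so memory grows with `res**2` rather than `res**3`. Every query point costs only three bilinear lookups. In a full model, a small decoder would map the sampled feature to color and density for rendering. That decoder is omitted here.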