cs.CV

Fast 3D models from images and text, ready in seconds

Jiahao Li

May 18, 2026

Creating 3D assets traditionally requires manual modeling or expensive scanning. This work addresses both automatic generation and reconstruction. Instant3D combines multi-view diffusion with feed-forward sparse-view reconstruction to synthesize high-quality 3D assets from text or images in 5–20 seconds. FastMap reimplements structure-from-motion with first-order optimization and fused GPU kernels, achieving a 10× speedup over state-of-the-art methods while preserving pose accuracy and novel view synthesis quality. Applications include rapid prototyping for games, VR, and robotics, as well as efficient 3D data collection for training foundation models.
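
To give a feel for what "first-order optimization" means in a structure-from-motion setting, here is a toy sketch, not FastMap's actual algorithm or code. It assumes a heavily simplified problem: 2D camera positions are recovered from noise-free pairwise offsets by plain gradient descent on a squared-error loss, with camera 0 pinned at the origin to fix the global translation. The function name `recover_positions`, the step size, and the example graph are all hypothetical choices made for illustration.

```python
# Toy illustration only: NOT FastMap's method. Plain gradient descent
# (a first-order optimizer) on a simplified position-recovery problem.

def recover_positions(num_cams, edges, steps=500, lr=0.1):
    """Estimate 2D camera positions from pairwise offsets.

    edges: list of (i, j, (dx, dy)), where (dx, dy) is the measured
    offset p_j - p_i. Camera 0 is held at the origin.
    """
    pos = [[0.0, 0.0] for _ in range(num_cams)]
    for _ in range(steps):
        # Accumulate the gradient of sum ||(p_j - p_i) - d_ij||^2.
        grad = [[0.0, 0.0] for _ in range(num_cams)]
        for i, j, d in edges:
            for k in range(2):
                r = (pos[j][k] - pos[i][k]) - d[k]  # residual on axis k
                grad[j][k] += 2.0 * r
                grad[i][k] -= 2.0 * r
        # Gradient step; camera 0 is skipped so it stays at the origin.
        for c in range(1, num_cams):
            for k in range(2):
                pos[c][k] -= lr * grad[c][k]
    return pos


# Hypothetical ground truth, used only to synthesize the measurements.
true_pos = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0), (-1.0, 1.5)]
pairs = [(0, 1), (1, 2), (2, 3), (0, 2), (1, 3)]
edges = [
    (i, j, (true_pos[j][0] - true_pos[i][0], true_pos[j][1] - true_pos[i][1]))
    for i, j in pairs
]

estimate = recover_positions(len(true_pos), edges)
max_error = max(
    abs(estimate[c][k] - true_pos[c][k])
    for c in range(len(true_pos))
    for k in range(2)
)
```

Each iteration uses only gradients, with no Hessian or linear solve, so the per-step work is a simple pass over the edges. That kind of regular, per-edge arithmetic is what tends to map well to GPU kernels, although a real pipeline must also handle rotations, intrinsics, noise, and outliers, none of which this sketch attempts.
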
Published as "Efficient 3D Content Reconstruction and Generation", arXiv:2605.18052.