cs.CV

How to make talking faces without training anything new

Hao Wu, Xiangyang Luo, Hao Wang, Jiawei Zhang, Yi Zhang, Jinwei Wang

May 28, 2026

Talking face generation normally requires task-specific training on large datasets. This work skips that step entirely. It repurposes pretrained Stable Diffusion and IP-Adapter and adds three lightweight, parameter-free modules that handle lip synchronization, identity consistency, and temporal smoothing. The authors report that the method outperforms existing approaches in both accuracy and visual quality while leaving all pretrained weights untouched.
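The summary above does not say how IP-Adapter injects a reference image into Stable Diffusion. For background, the sketch below illustrates the decoupled cross-attention described in the original IP-Adapter work: the same queries attend separately to text tokens and to image tokens, and the image branch is added with an inference-time weight (`scale` here). This is a minimal sketch under stated assumptions. It uses plain Python lists, assumes keys and values are already projected, and the function names are hypothetical. It does not reproduce the three modules proposed in this paper.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


def softmax_rows(m: Matrix) -> Matrix:
    """Numerically stable softmax applied to each row independently."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(q[0])
    scores = [[s / math.sqrt(d) for s in row] for row in matmul(q, transpose(k))]
    return matmul(softmax_rows(scores), v)


def decoupled_cross_attention(
    q: Matrix,
    k_text: Matrix,
    v_text: Matrix,
    k_image: Matrix,
    v_image: Matrix,
    scale: float = 1.0,
) -> Matrix:
    """Text-branch attention plus a scaled image branch sharing the same queries.

    Hypothetical helper: keys/values are assumed to be pre-projected, as if the
    frozen text and image projection weights had already been applied.
    """
    text_out = attention(q, k_text, v_text)
    image_out = attention(q, k_image, v_image)
    return [
        [t + scale * i for t, i in zip(t_row, i_row)]
        for t_row, i_row in zip(text_out, image_out)
    ]


# Toy example: all-zero keys give uniform attention, so each branch
# returns the mean of its value rows.
q = [[1.0, 0.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
v_text = [[1.0, 2.0], [3.0, 4.0]]     # mean row: [2.0, 3.0]
v_image = [[10.0, 10.0], [20.0, 20.0]]  # mean row: [15.0, 15.0]

out = decoupled_cross_attention(q, zeros, v_text, zeros, v_image, scale=0.5)
print(out)  # [[9.5, 10.5]]
```

Because only `scale` changes between runs, the strength of the image conditioning can be tuned at inference time without updating any weights, which is the property a fine-tuning-free pipeline relies on.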
Published as "IP-Adapter Is All You Need: Towards Fine-Tuning-Free Diffusion-Based Talking Face Generation" (arXiv:2605.30230).