cs.CV

Tracking clothing wrinkles and body movements from video alone

Zhanbo Huang, Xiaoming Liu, Yu Kong

May 21, 2026

Existing human pose models capture skeletal movement but miss clothing deformation, while generic scene-flow methods fail on articulated bodies. H-Flow predicts dense, pixel-level motion of both skeletal pose and surface deformation from monocular video. It does not rely on expensive ground-truth labels. Instead, it trains with physics-inspired losses that encode geometric, structural, and biomechanical constraints. The team also released DynAct4D, a synthetic benchmark with dense flow annotations. H-Flow outperforms both scene-flow and parametric baselines and generalizes to unconstrained video without retraining.
Published as H-Flow: Self-supervised Human Scene Flow via Physics-inspired Joint Multi-modal Learning arXiv:2605.22629
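The three constraint families can be made concrete with a toy sketch. The summary does not spell out H-Flow's actual loss formulation, so this is not it. Every function name, weight, and loss form below is a hypothetical assumption chosen for illustration: a Chamfer-style geometric term, a bone-length structural term, and a joint-angle hinge biomechanical term. The sketch also assumes that 3D point estimates exist for two consecutive frames and that a skeleton topology is known. It shows how such terms can score a predicted 3D flow without any ground-truth flow labels.

```python
import numpy as np

# Hypothetical sketch: NOT the H-Flow losses, just one plausible form for each constraint family.

def chamfer_loss(src, flow, tgt):
    """Geometric term (assumed form): symmetric nearest-neighbour distance
    between source points advected by the predicted flow and target-frame points."""
    moved = src + flow                                                # (N, 3)
    d = np.linalg.norm(moved[:, None, :] - tgt[None, :, :], axis=-1)  # (N, M) pairwise distances
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def bone_length_loss(joints, joint_flow, bones):
    """Structural term (assumed form): each bone should keep its length after the joints move."""
    moved = joints + joint_flow
    i, j = np.array(bones).T                                          # parent/child joint indices
    before = np.linalg.norm(joints[i] - joints[j], axis=-1)
    after = np.linalg.norm(moved[i] - moved[j], axis=-1)
    return np.mean((after - before) ** 2)

def joint_angle_loss(joints, joint_flow, triplets, lo, hi):
    """Biomechanical term (assumed form): hinge penalty on joint angles outside [lo, hi] radians."""
    moved = joints + joint_flow
    a, b, c = np.array(triplets).T                                    # angle measured at joint b
    u = moved[a] - moved[b]
    v = moved[c] - moved[b]
    cos = np.sum(u * v, axis=-1) / (np.linalg.norm(u, axis=-1) * np.linalg.norm(v, axis=-1) + 1e-8)
    ang = np.arccos(np.clip(cos, -1.0, 1.0))
    return np.mean(np.maximum(lo - ang, 0.0) + np.maximum(ang - hi, 0.0))

def total_loss(src, flow, tgt, joints, joint_flow, bones, triplets,
               lo=0.1, hi=np.pi, w=(1.0, 1.0, 0.1)):
    """Weighted sum of the three terms; the weights w are arbitrary placeholders."""
    return (w[0] * chamfer_loss(src, flow, tgt)
            + w[1] * bone_length_loss(joints, joint_flow, bones)
            + w[2] * joint_angle_loss(joints, joint_flow, triplets, lo, hi))

# Usage: a rigid translation that matches the next frame should score (near) zero.
rng = np.random.default_rng(0)
src = rng.normal(size=(50, 3))
shift = np.array([0.05, 0.0, 0.0])
tgt = src + shift
joints = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0]])  # a 3-joint chain, e.g. shoulder-elbow-wrist
bones = [(0, 1), (1, 2)]
triplets = [(0, 1, 2)]
rigid = total_loss(src, np.tile(shift, (50, 1)), tgt, joints, np.tile(shift, (3, 1)), bones, triplets)
print(f"rigid-motion loss: {rigid:.2e}")
```

Because none of these terms needs labelled flow, a flow predictor could in principle be trained against them directly. Photometric or silhouette terms would be natural additions for monocular input, but they are likewise assumptions rather than details from the paper.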