Can physics models learn from real videos without perfect labels?

Most physics simulators learned from video require perfect state information, such as complete point clouds or tracked particles, which real videos do not provide. This work instead trains a particle-based dynamics model directly on unlabeled real-world videos using Gaussian splatting and rendering supervision. The model predicts how each particle moves and rotates, the predicted particles are rendered into images, and the model learns by comparing those renderings with the actual video frames. Training directly on real footage sidesteps the sim-to-real gap that has limited prior methods. The approach is demonstrated on a new dataset of 500 videos with diverse object interactions.
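To make "rendering supervision" concrete, here is a toy 2D sketch. It is not the authors' method. Particles are anisotropic 2D Gaussians, additively splatted onto a grayscale grid. A hypothetical `step` function translates and rotates every particle, and the `rendering_loss` compares the rendered prediction with an observed frame. The motion parameters are recovered without any particle labels, using only pixels. Several simplifications are assumptions made for brevity: the 2D setting, the shared motion for all particles, additive splatting without alpha compositing, and a grid search standing in for gradient descent on a neural dynamics model.

```python
import numpy as np

def rotation_2d(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def render(means, angles, scales, colors, height=32, width=32):
    """Splat anisotropic 2D Gaussians onto a grayscale image (additive, no alpha compositing)."""
    ys, xs = np.mgrid[0:height, 0:width]
    pix = np.stack([xs, ys], axis=-1).astype(float)  # (H, W, 2) pixel centers as (x, y)
    image = np.zeros((height, width))
    for mu, theta, s, c in zip(means, angles, scales, colors):
        R = rotation_2d(theta)
        cov = R @ np.diag(np.square(s)) @ R.T  # rotation changes the Gaussian's orientation
        inv = np.linalg.inv(cov)
        d = pix - mu
        mahal = np.einsum("hwi,ij,hwj->hw", d, inv, d)  # squared Mahalanobis distance per pixel
        image += c * np.exp(-0.5 * mahal)
    return image

def step(means, angles, velocity, angular_velocity):
    """Hypothetical dynamics step: translate and rotate every particle."""
    return means + velocity, angles + angular_velocity

def rendering_loss(pred_image, frame):
    """Photometric MSE between the rendered prediction and the observed frame."""
    return float(np.mean((pred_image - frame) ** 2))

def fit_motion(means, angles, scales, colors, next_frame, velocities, omegas):
    """Pick the motion whose rendering best matches the frame (grid search stands in for SGD)."""
    best, best_loss = None, np.inf
    for v in velocities:
        for w in omegas:
            m, a = step(means, angles, np.asarray(v, dtype=float), w)
            loss = rendering_loss(render(m, a, scales, colors), next_frame)
            if loss < best_loss:
                best, best_loss = (tuple(v), w), loss
    return best, best_loss

means = np.array([[10.0, 12.0], [16.0, 16.0], [20.0, 10.0]])
angles = np.array([0.0, 0.5, 1.0])
scales = np.array([[3.0, 1.0], [2.5, 1.2], [1.0, 3.0]])  # anisotropic, so rotation is visible
colors = np.array([1.0, 0.8, 0.6])

# Ground-truth motion is used only to synthesize the "observed" next frame.
true_v, true_w = np.array([2.0, -1.0]), 0.3
next_frame = render(*step(means, angles, true_v, true_w), scales, colors)

velocities = [(vx, vy) for vx in range(-3, 4) for vy in range(-3, 4)]
omegas = [0.0, 0.15, 0.3, 0.45]
(best_v, best_w), loss = fit_motion(means, angles, scales, colors, next_frame, velocities, omegas)
print(best_v, best_w, loss)
```

The search recovers the true translation and rotation purely from the pixel comparison. In the real system, this role would instead fall to a learned model updated by gradients through a differentiable Gaussian-splatting renderer.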