cs.AI

Finding the surprising moments in long videos without training

Dahye Kim, Bhuvan Sachdeva, Karan Uppal, Naman Gupta, Vineeth N. Balasubramanian, Deepti Ghadiyaram

May 21, 2026

Most frames in long videos are redundant. Swift Sampling identifies temporally surprising moments, where visual features unexpectedly diverge from their predicted trajectory, using a Taylor expansion in the visual latent space. The method adds only 0.02× computational overhead and outperforms uniform sampling across video QA benchmarks and 10 downstream tasks, with gains of up to +12.5 points on long videos with limited frame budgets.
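To make the idea concrete, here is a minimal sketch of Taylor-based surprise scoring. It is not the authors' implementation. It assumes per-frame features are already available as a `(T, D)` array, approximates derivatives with backward finite differences at unit time step, uses a second-order expansion, and measures surprise as the Euclidean distance between the predicted and actual feature. The function names `taylor_surprise` and `select_surprising_frames` are hypothetical.

```python
import numpy as np


def taylor_surprise(features: np.ndarray) -> np.ndarray:
    """Score each frame by how far its feature lies from a Taylor extrapolation.

    Assumption (not from the paper): second-order expansion with backward
    finite differences over the three preceding frames, unit time step.
    The first three frames have no full history and receive a score of 0.
    """
    features = np.asarray(features, dtype=float)
    num_frames = features.shape[0]
    scores = np.zeros(num_frames)
    if num_frames < 4:
        return scores

    prev1 = features[2:-1]  # f[t-1]
    prev2 = features[1:-2]  # f[t-2]
    prev3 = features[:-3]   # f[t-3]

    first_deriv = prev1 - prev2                 # approximates f'(t-1)
    second_deriv = prev1 - 2.0 * prev2 + prev3  # approximates f''(t-1)

    # f(t) ~ f(t-1) + f'(t-1) + 0.5 * f''(t-1)
    predicted = prev1 + first_deriv + 0.5 * second_deriv

    scores[3:] = np.linalg.norm(features[3:] - predicted, axis=1)
    return scores


def select_surprising_frames(features: np.ndarray, budget: int) -> list[int]:
    """Return the indices of the `budget` highest-scoring frames, in temporal order."""
    scores = taylor_surprise(features)
    top = np.argsort(scores)[-budget:]
    return sorted(int(i) for i in top)


if __name__ == "__main__":
    # Synthetic features: a smooth linear drift with an abrupt jump at frame 10.
    feats = np.arange(20)[:, None] * np.ones(4)
    feats[10:] += 5.0
    print(select_surprising_frames(feats, budget=3))
```

In this sketch a frame on a smooth trajectory scores near zero, while an abrupt change raises the score of the frame where it occurs and of the few frames whose extrapolation history spans the change. Scoring is training-free and costs only a handful of array operations per frame, in the spirit of the low overhead reported above.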
Published as "Swift Sampling: Selecting Temporal Surprises via Taylor Series", arXiv:2605.22678.