Finding the surprising moments in long videos without training
Dahye Kim, Bhuvan Sachdeva, Karan Uppal, Naman Gupta, Vineeth N. Balasubramanian, Deepti Ghadiyaram
May 21, 2026
Most frames in long videos are redundant. Swift Sampling uses a Taylor expansion in the visual latent space to identify temporally surprising moments, the points where visual features diverge unexpectedly from their predicted trajectory. The method adds only 0.02× computational overhead and outperforms uniform sampling across video QA benchmarks and 10 downstream tasks, with gains of up to +12.5 points on long videos under limited frame budgets.
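To make the idea of "diverging from a predicted trajectory" concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the second-order expansion, the backward finite differences, the Euclidean error, and the top-k selection rule are all assumptions chosen for illustration, and `surprise_scores` and `select_frames` are hypothetical names. The per-frame feature vectors are assumed to come from some pretrained visual encoder, which the summary does not specify.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def surprise_scores(features: Sequence[Vector]) -> List[float]:
    """Score each frame by how far its feature lies from a Taylor extrapolation.

    Assumption: a second-order expansion with derivatives estimated by
    backward finite differences over the three preceding frames.
    The first three frames have no full history and keep a score of 0.
    """
    scores = [0.0] * len(features)
    for t in range(3, len(features)):
        prev1, prev2, prev3 = features[t - 1], features[t - 2], features[t - 3]
        squared_error = 0.0
        for actual, a, b, c in zip(features[t], prev1, prev2, prev3):
            velocity = a - b                # first-difference estimate
            acceleration = a - 2 * b + c    # second-difference estimate
            predicted = a + velocity + 0.5 * acceleration
            squared_error += (actual - predicted) ** 2
        scores[t] = math.sqrt(squared_error)
    return scores


def select_frames(features: Sequence[Vector], budget: int) -> List[int]:
    """Return the indices of the `budget` most surprising frames, in time order."""
    scores = surprise_scores(features)
    ranked = sorted(range(len(scores)), key=lambda t: scores[t], reverse=True)
    return sorted(ranked[:budget])
```

On a toy sequence whose features move linearly and then jump, the highest scores in this sketch fall on the jump and on the two frames right after it, because the extrapolation needs a few steps of clean history to recover. A practical sampler would likely need to handle that clustering, for example by enforcing a minimum gap between selected frames.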