Recognizing actions from any camera angle without retraining

Action recognition systems fail when camera angles or body orientations differ from those in the training data, a major problem for real-world deployment. This work combines motion cues from multiple viewpoints with text descriptions to train models that recognize both seen and unseen actions across camera angles. The approach pairs an orientation-aware motion encoder with adaptive text prompts that adjust to different body positions at test time. It improves performance across four major benchmarks (NTU-RGB+D, BABEL, NW-UCLA, and surveillance datasets) and outperforms recent zero-shot methods. Code and models are released.
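
The summary above does not spell out how a motion clip is matched to an unseen action label. A common pattern in zero-shot recognition, assumed here purely for illustration, is to embed the motion and one text prompt per action in a shared space and pick the label with the highest cosine similarity. The sketch below shows only that matching step: the function names (`l2_normalize`, `cosine_scores`, `zero_shot_classify`) are hypothetical, and the toy vectors stand in for the outputs of a motion encoder and a text encoder, neither of which is implemented. It is not the released code.

```python
import math
from typing import Dict, List, Sequence


def l2_normalize(vector: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def cosine_scores(
    motion_embedding: Sequence[float],
    prompt_embeddings: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """Return the cosine similarity between one motion embedding and each label's prompt."""
    motion = l2_normalize(motion_embedding)
    scores: Dict[str, float] = {}
    for label, embedding in prompt_embeddings.items():
        if len(embedding) != len(motion):
            raise ValueError(f"dimension mismatch for label {label!r}")
        prompt = l2_normalize(embedding)
        scores[label] = sum(m * p for m, p in zip(motion, prompt))
    return scores


def zero_shot_classify(
    motion_embedding: Sequence[float],
    prompt_embeddings: Dict[str, Sequence[float]],
) -> str:
    """Pick the label whose prompt embedding is most similar to the motion embedding."""
    scores = cosine_scores(motion_embedding, prompt_embeddings)
    return max(scores, key=scores.get)


# Toy stand-ins (assumption): in practice these would come from a text encoder,
# one prompt per action label, including labels never seen during training.
PROMPTS = {
    "wave": [1.0, 0.0, 0.0],
    "sit down": [0.0, 1.0, 0.0],
    "kick": [0.0, 0.0, 1.0],
}

# Toy stand-in (assumption) for the output of a motion encoder on one clip.
predicted = zero_shot_classify([0.1, 0.2, 0.9], PROMPTS)
print(predicted)
```

Because labels enter only through their prompt embeddings, adding an unseen action in this sketch means adding one more entry to `PROMPTS`, with no retraining of the matching step.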