Narrating surgery in real time

Existing surgical AI systems typically analyze video only after the fact. SurgOnAir instead processes frames live and generates narration tokens continuously, so its commentary keeps pace with the procedure as it unfolds. The team built a new 11k-video dataset annotated at three levels (action, step, phase) and trained a vision-language model to produce hierarchy-aware commentary and to flag workflow transitions as they occur. Narration and transition detection are handled by a single model running in real time.
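
To make the three-level hierarchy concrete, here is a minimal Python sketch of how transitions could be flagged over a stream of per-frame labels. It is an illustration only, not SurgOnAir's method: the model itself generates text tokens from video, whereas this sketch assumes the labels are already available. The names `Labels` and `transitions` and the example label values are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Iterator, Optional

# Annotation levels ordered from coarsest to finest (assumed ordering).
LEVELS = ("phase", "step", "action")


@dataclass(frozen=True)
class Labels:
    """Hypothetical per-frame annotation at the three levels."""
    phase: str
    step: str
    action: str


def transitions(stream: Iterable[Labels]) -> Iterator[tuple[int, str, str, str]]:
    """Yield (frame_index, level, old, new) whenever a label changes.

    Only the coarsest changed level is reported for each frame, since a
    phase change usually implies a new step and action as well.
    """
    prev: Optional[Labels] = None
    for i, cur in enumerate(stream):
        if prev is not None:
            for level in LEVELS:
                old, new = getattr(prev, level), getattr(cur, level)
                if old != new:
                    yield (i, level, old, new)
                    break  # finer levels are subsumed by this change
        prev = cur


# Made-up example labels, consumed one frame at a time.
frames = [
    Labels("dissection", "expose", "grasp"),
    Labels("dissection", "expose", "cut"),
    Labels("dissection", "clip", "clip"),
    Labels("closure", "suture", "grasp"),
]
events = list(transitions(frames))
```

Because `transitions` is a generator that keeps only the previous frame's labels, it can run on a live stream without buffering the whole video, which is the property the streaming setting requires.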