cs.CV

Narrating surgery in real time as it happens

Jingyi He, Yue Zhou, Long Bai, Kun Yuan, Nassir Navab, Yuan Bi

May 20, 2026

Most existing surgical AI systems analyze video offline, after the events have already happened. SurgOnAir instead processes frames as they arrive and generates narration tokens continuously, so its commentary keeps pace with the evolving surgical scene. The team built a new dataset of 11k videos annotated at three levels (action, step, and phase) and trained a vision-language model on it. The model produces hierarchy-aware commentary and flags workflow transitions as they occur. As a result, real-time surgical narration and transition flagging fit in a single unified model.
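The summary does not describe SurgOnAir's internals. The following is therefore only a hypothetical sketch of one piece of the idea, not the authors' method. It consumes per-frame labels at the three annotation levels (phase, step, action) one frame at a time and flags a workflow transition as soon as a label changes at any level. The names `FrameLabels`, `Transition`, and `stream_transitions`, and the example labels, are invented for illustration. In the real system the labels would come from the vision-language model rather than a hand-written list.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional

# Hypothetical per-frame labels at the three annotation levels (assumed structure).
@dataclass(frozen=True)
class FrameLabels:
    phase: str
    step: str
    action: str

# Coarsest to finest, so a phase change is reported before its step/action changes.
LEVELS = ("phase", "step", "action")

@dataclass(frozen=True)
class Transition:
    frame: int   # index of the frame where the new label first appears
    level: str   # which hierarchy level changed
    old: str
    new: str

def stream_transitions(frames: Iterable[FrameLabels]) -> Iterator[Transition]:
    """Yield a Transition whenever a label changes at any level.

    Frames are consumed one at a time with no look-ahead, so each event is
    emitted as soon as the changed frame arrives, not after the whole video.
    """
    prev: Optional[FrameLabels] = None
    for idx, cur in enumerate(frames):
        if prev is not None:
            for level in LEVELS:
                old, new = getattr(prev, level), getattr(cur, level)
                if old != new:
                    yield Transition(idx, level, old, new)
        prev = cur

# Hypothetical example stream (labels are illustrative only).
frames = [
    FrameLabels("preparation", "port placement", "insert trocar"),
    FrameLabels("preparation", "port placement", "insert trocar"),
    FrameLabels("preparation", "port placement", "insert camera"),
    FrameLabels("dissection", "tissue exposure", "grasp tissue"),
]

if __name__ == "__main__":
    for t in stream_transitions(frames):
        print(f"frame {t.frame}: [{t.level}] {t.old} -> {t.new}")
```

Because a frame can trigger events at several levels at once, a phase boundary in this sketch also reports the step and action changes that accompany it. A hierarchy-aware narrator could use that ordering to announce the coarse transition first.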
Published as "SurgOnAir: Hierarchy-Aware Real-Time Surgical Video Commentary", arXiv:2605.21132.