Narrating surgery in real time as it happens
Jingyi He, Yue Zhou, Long Bai, Kun Yuan, Nassir Navab, Yuan Bi
May 20, 2026
Existing surgical AI systems analyze video only after the fact, so their output lags behind what is happening in the operating room. SurgOnAir instead processes frames live and generates narration tokens continuously, so its commentary responds to the surgical scene as it evolves. The team built a new dataset of 11k videos annotated at three levels (action, step, and phase) and trained a vision-language model on it to produce hierarchy-aware commentary and to flag workflow transitions as they occur. Live narration and transition detection for real-time surgical guidance now fit in a single unified model.
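The summary does not say how workflow transitions are detected. As a hypothetical illustration only, assume a model that emits one label per annotation level (phase, step, action) for each incoming frame. The sketch below then flags a transition the moment any level's label changes, processing frames online instead of after the video ends. `FrameLabels`, `Transition`, `stream_transitions`, and the example label strings are invented for this sketch and are not SurgOnAir's method or API.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional

# Hypothetical annotation levels, ordered from coarse to fine.
LEVELS = ("phase", "step", "action")


@dataclass(frozen=True)
class FrameLabels:
    """Hypothetical per-frame prediction at each annotation level."""
    t: float  # timestamp in seconds
    phase: str
    step: str
    action: str


@dataclass(frozen=True)
class Transition:
    """A label change at one level, reported at the frame where it occurs."""
    t: float
    level: str
    old: str
    new: str


def stream_transitions(frames: Iterable[FrameLabels]) -> Iterator[Transition]:
    """Yield a Transition as soon as any level's label changes.

    Only the previous frame is kept in memory, so events are emitted
    while frames are still arriving rather than after the video ends.
    """
    prev: Optional[FrameLabels] = None
    for frame in frames:
        if prev is not None:
            # Check coarse levels first so a phase change is reported
            # before the step and action changes on the same frame.
            for level in LEVELS:
                old, new = getattr(prev, level), getattr(frame, level)
                if old != new:
                    yield Transition(frame.t, level, old, new)
        prev = frame


if __name__ == "__main__":
    # Illustrative labels only; not taken from the SurgOnAir dataset.
    demo = [
        FrameLabels(0.0, "preparation", "dissection", "grasp"),
        FrameLabels(0.5, "preparation", "dissection", "cut"),
        FrameLabels(1.0, "clipping", "clip_artery", "clip"),
    ]
    for event in stream_transitions(demo):
        print(f"[t={event.t:.1f}s] {event.level}: {event.old} -> {event.new}")
```

Comparing each frame only with its predecessor costs constant work per frame, which is what lets events surface as the video streams in. A more hierarchy-aware variant might report only the coarsest level that changed on a given frame, since a phase change usually implies new steps and actions as well.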