What should robots think about while acting?
Yueh-Hua Wu, Tatsuya Matsushima, Kei Ota
May 29, 2026
Vision-language-action models inherit text-based reasoning from language models, but text reasoning operates at the task level, while robots must make decisions in under a second. This work proposes reasoning as continuous latent vectors, a shareable internal medium verified through a self-checking training objective. The approach improves robustness by 40% on real-robot manipulation tasks and generalizes across robot instances, suggesting that effective robot reasoning depends less on language tokens than on aligned internal structure.