cs.RO

What should robots think about while acting?

Yueh-Hua Wu, Tatsuya Matsushima, Kei Ota

May 29, 2026

Vision-language-action models inherit text-based reasoning from language models, but text reasoning operates at the task level, while robots need sub-second decisions. This work proposes reasoning in continuous latent vectors, a shareable internal medium verified through a self-checking training objective. The approach improves robustness by 40% on real-robot manipulation tasks and generalizes across robot instances, suggesting that effective robot reasoning depends less on language tokens than on aligned internal structure.
Published as "Continuous Reasoning for Vision-Language-Action" (arXiv:2606.00229)