What should robots think about while acting?

Vision-language-action models inherit text-based reasoning from language models, but text reasoning operates at the task level, while robots must make decisions in under a second. This work proposes representing reasoning as continuous latent vectors, a shareable internal medium that is verified through a self-checking training objective. The approach improves robustness by 40% on real-robot manipulation tasks and generalizes across robot instances, suggesting that effective robot reasoning depends less on language tokens than on aligned internal structure.