cs.AI

Generative policies that learn multimodal actions in one step

Zeyuan Wang, Da Li, Yulin Chen, Yuehu Gong, Yanming Guo, Ye Shi, Liang Bai, Tianyuan Yu, Yanwei Fu

May 20, 2026

Standard RL policies face a tradeoff: Gaussian policies are fast to sample but struggle with multimodal action distributions, while generative policies capture complex behaviors but require iterative sampling. This work proposes Stochastic MeanFlow Policies (SMFP), which apply a learned transformation to Gaussian noise to generate expressive, multimodal actions in a single step while keeping the policy entropy tractable. Trained via mirror descent with entropy regularization, SMFP outperforms both conventional and generative baselines across seven MuJoCo benchmarks while maintaining fast inference.
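To make the contrast concrete, here is a minimal toy sketch, not the paper's method. It assumes a fixed `tanh` map with a hand-picked `gain` as a stand-in for the learned transformation. All function names are hypothetical. It only illustrates how a single nonlinear pass over Gaussian noise can produce a two-mode action distribution, whereas a plain Gaussian policy stays unimodal.

```python
import math
import random


def gaussian_policy_sample(mean, std, rng):
    """Conventional unimodal policy: one draw from N(mean, std^2)."""
    return rng.gauss(mean, std)


def one_step_generative_sample(gain, rng):
    """Toy one-step generative policy (assumption, not SMFP itself).

    Draw Gaussian noise once and push it through a fixed nonlinear map.
    tanh with a large gain stands in for a learned network: it squashes
    most of the noise mass toward -1 or +1, giving two action modes.
    """
    z = rng.gauss(0.0, 1.0)
    return math.tanh(gain * z)


def fraction_near_modes(samples, modes=(-1.0, 1.0), tol=0.2):
    """Share of samples lying within `tol` of any of the given modes."""
    hits = sum(1 for a in samples if any(abs(a - m) <= tol for m in modes))
    return hits / len(samples)


rng = random.Random(0)  # fixed seed so the comparison is repeatable
gen_actions = [one_step_generative_sample(5.0, rng) for _ in range(10000)]
gauss_actions = [gaussian_policy_sample(0.0, 0.5, rng) for _ in range(10000)]

print("generative, near +/-1:", fraction_near_modes(gen_actions))
print("gaussian,   near +/-1:", fraction_near_modes(gauss_actions))
```

Both samplers cost one noise draw per action. The difference is only in the map applied to that noise, which is the property the summary attributes to one-step generative policies.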
Published as: Stochastic MeanFlow Policies: One-Step Generative Control with Entropic Mirror Descent (arXiv:2605.21282)