cs.LG

Learning interpretable robot policies without losing performance in discretization

Chengpeng Hu, Yingqian Zhang, Hendrik Baier

May 18, 2026

Programmatic reinforcement learning represents policies as human-readable programs, but gradient-based training methods rely on continuous approximations of the program structure, and these lose performance when converted back to discrete code. DiPRL addresses this by adding architecture entropy regularization during training, which pushes the continuous approximation toward a discrete program as training proceeds, so no post-hoc fine-tuning is required. The approach keeps the efficiency of gradient-based optimization while preserving policy expressivity across discrete and continuous RL tasks, and it produces interpretable policies that are competitive with standard deep RL methods.
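To make the idea of architecture entropy regularization concrete, the following is a minimal sketch, not the paper's implementation. It assumes each program node holds a vector of logits over candidate constructs, that the relaxed program mixes constructs by a softmax over those logits, and that the regularizer is the summed Shannon entropy of those softmax distributions scaled by a coefficient. The names `architecture_entropy`, `regularized_loss`, `discretize`, and `entropy_coef` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def architecture_entropy(logits):
    """Shannon entropy (in nats) of the softmax distribution over constructs."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def regularized_loss(policy_loss, node_logits, entropy_coef):
    """Policy loss plus an entropy penalty summed over all program nodes.

    Assumption: lower entropy means each node's choice is closer to one-hot,
    so minimizing this term nudges the relaxed program toward a discrete one.
    """
    penalty = sum(architecture_entropy(logits) for logits in node_logits)
    return policy_loss + entropy_coef * penalty


def discretize(node_logits):
    """Pick the highest-scoring construct at each node (argmax)."""
    return [max(range(len(logits)), key=logits.__getitem__) for logits in node_logits]


# Two hypothetical program nodes: one with three candidate constructs, one with two.
soft = [[0.1, 0.0, -0.1], [0.0, 0.0]]    # nearly uniform: high entropy
sharp = [[8.0, 0.0, -8.0], [0.0, 9.0]]   # nearly one-hot: low entropy

print(regularized_loss(1.0, soft, entropy_coef=0.1))
print(regularized_loss(1.0, sharp, entropy_coef=0.1))
print(discretize(sharp))
```

Under these assumptions, the nearly one-hot logits incur a smaller penalty than the nearly uniform ones, so taking the argmax at the end of training changes the policy very little. This is the intuition behind avoiding a separate fine-tuning step after discretization.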
Published as "DiPRL: Learning Discrete Programmatic Policies via Architecture Entropy Regularization", arXiv:2605.18508