Learning interpretable robot policies without losing performance in discretization
Chengpeng Hu, Yingqian Zhang, Hendrik Baier
May 18, 2026
Programmatic reinforcement learning represents policies as human-readable programs. Gradient-based training methods, however, rely on continuous approximations of the program structure, and performance drops when the trained approximation is converted back to discrete code. DiPRL addresses this by adding architecture entropy regularization during training, which pushes the policy toward a discrete program as training proceeds, so no post-hoc fine-tuning is required. The approach keeps the efficiency of gradient-based optimization while preserving policy expressivity across discrete and continuous RL tasks, and it produces interpretable policies that are competitive with standard deep RL methods.
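To make the idea of an entropy penalty on architecture choices concrete, here is a minimal sketch in plain Python. It is not DiPRL's actual formulation, which this summary does not specify. It assumes a single program node whose candidate operations are weighted by a softmax over learnable logits, and every name in it (`arch_logits`, `regularized_loss`, `weight`, `sharpen`) is hypothetical. The sketch shows only the general mechanism: minimizing the entropy of the softmax weights drives them toward a one-hot choice, so taking the argmax afterwards changes the policy very little.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def regularized_loss(task_loss, arch_logits, weight):
    """Task loss plus an entropy penalty on the architecture weights.

    Assumption: the penalty is a simple weighted sum; the paper's exact
    form and schedule are not given in this summary.
    """
    return task_loss + weight * entropy(softmax(arch_logits))


def entropy_grad(arch_logits):
    """Analytic gradient of the entropy with respect to the logits.

    For p = softmax(z) and H = -sum(p * log p):
        dH/dz_i = -p_i * (log p_i + H)
    """
    probs = softmax(arch_logits)
    h = entropy(probs)
    return [-p * (math.log(p) + h) for p in probs]


def sharpen(arch_logits, lr=0.5, steps=200):
    """Gradient descent on the entropy term alone (illustration only)."""
    z = list(arch_logits)
    for _ in range(steps):
        g = entropy_grad(z)
        z = [zi - lr * gi for zi, gi in zip(z, g)]
    return z


def discretize(arch_logits):
    """Pick the index of the highest-weighted candidate operation."""
    return max(range(len(arch_logits)), key=lambda i: arch_logits[i])


# Hypothetical starting logits for three candidate operations.
initial_logits = [0.2, 0.0, -0.1]
final_logits = sharpen(initial_logits)

entropy_before = entropy(softmax(initial_logits))
entropy_after = entropy(softmax(final_logits))
chosen_op = discretize(final_logits)
```

In a real training loop the entropy term would be combined with the RL objective, as in `regularized_loss`, rather than minimized on its own. The isolated descent here only illustrates why the regularizer favors near-discrete architectures.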