A learning system controls plasma shape despite sensor failures

D. Sorokin, M. Stokolesov, A. Granovskiy, I. Prokofyev, E. Adishchev, M. Nurgaliev, E. Khayrutdinov, G. Subbotin, R. Clark, D. Orlov

Tokamak plasma shape control typically relies on equilibrium reconstruction followed by fixed linear controllers, a pipeline that breaks down when sensors fail. This work trains a reinforcement learning agent in high-fidelity simulation on 120 experimental DIII-D plasma shapes, exposing it to random shape targets that are updated every 0.25 s to cover diverse transitions. The agent achieves a mean shape error of 2.01 cm on held-out configurations and tolerates 30% random sensor dropout without switching logic or backup controllers. An asymmetric actor-critic architecture uses privileged equilibrium information to improve estimates under partial observability, while an auxiliary reconstruction head enables interpretability. The policy transfers zero-shot to experimental DIII-D shots and to an independent simulator, directly commanding coil actuators on dynamic shape maneuvers.
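
To make two of the training ingredients above concrete, the following is a minimal sketch, not the authors' implementation, of random sensor dropout and periodic target resampling. The function names, the choice to zero out dropped channels and append a validity mask, and the number of control steps per 0.25 s target window are all assumptions made for illustration; the abstract does not state how dropped sensors are presented to the policy or what control period is used.

```python
import numpy as np


def apply_sensor_dropout(obs, drop_prob, rng):
    """Drop each sensor channel independently with probability drop_prob.

    Assumption: dropped channels are zeroed and a binary validity mask is
    appended so the policy can tell "reads zero" from "missing".
    """
    obs = np.asarray(obs, dtype=np.float64)
    valid = rng.random(obs.shape) >= drop_prob  # True where the sensor survives
    masked = np.where(valid, obs, 0.0)
    return np.concatenate([masked, valid.astype(np.float64)]), valid


def target_schedule(n_steps, steps_per_target, n_shapes, rng):
    """Pick a new random target-shape index every steps_per_target steps.

    steps_per_target is hypothetical: it stands for however many control
    steps fit into one 0.25 s target window.
    """
    n_segments = -(-n_steps // steps_per_target)  # ceiling division
    picks = rng.integers(0, n_shapes, size=n_segments)
    return np.repeat(picks, steps_per_target)[:n_steps]


rng = np.random.default_rng(0)
policy_input, valid = apply_sensor_dropout(np.ones(1000), 0.3, rng)
schedule = target_schedule(n_steps=10, steps_per_target=4, n_shapes=120, rng=rng)
```

Here `policy_input` has twice the length of the raw observation (masked readings followed by the mask), and `schedule` holds one target index per control step, constant within each window. Sampling the dropout mask per channel and per call is one possible reading of "30% random sensor dropout"; a per-episode mask would be an equally plausible variant.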