Forgetting on command: unlearning for parallel text generation

Masked diffusion language models such as LLaDA generate text by denoising all positions in parallel rather than one token at a time, but there has been no established way to make them forget specific information. This paper proposes Masked Diffusion Unlearning (MDU), which pushes the model back toward its unconditional state while keeping other knowledge intact. In the reported experiments, MDU matches or outperforms existing unlearning methods on standard benchmarks. The authors have released their code.
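
To make "denoising all positions in parallel" concrete, here is a toy sketch of confidence-based parallel unmasking. It is not LLaDA's implementation and it has nothing to do with MDU's unlearning objective, which this summary does not spell out. The names `toy_predict` and `parallel_unmask` are hypothetical, and the "denoiser" is a random stand-in used only to show the control flow: every masked position gets a guess at each step, and only the most confident guesses are committed.

```python
import random

MASK = "<mask>"


def toy_predict(tokens, vocab, rng):
    """Hypothetical stand-in for a trained denoiser (assumption, not a real model).

    Returns {position: (guessed_token, confidence)} for every masked position.
    A real model would score all positions in a single forward pass.
    """
    return {
        i: (rng.choice(vocab), rng.random())
        for i, tok in enumerate(tokens)
        if tok == MASK
    }


def parallel_unmask(length, vocab, steps, seed=0):
    """Start fully masked, then commit the most confident guesses step by step."""
    rng = random.Random(seed)
    tokens = [MASK] * length
    for step in range(steps):
        preds = toy_predict(tokens, vocab, rng)
        if not preds:
            break  # nothing left to unmask
        # Spread the remaining masked positions evenly over the remaining steps
        # (ceiling division, so the final step always clears every mask).
        remaining_steps = steps - step
        k = -(-len(preds) // remaining_steps)
        most_confident = sorted(preds, key=lambda i: preds[i][1], reverse=True)[:k]
        for i in most_confident:
            tokens[i] = preds[i][0]
    return tokens


if __name__ == "__main__":
    print(parallel_unmask(length=8, vocab=["the", "cat", "sat", "down"], steps=4))
```

With `length=8` and `steps=4`, two positions are filled per step, in whatever order the confidences dictate rather than left to right. That order-free filling is the property that separates this family of models from autoregressive ones.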