cs.AI

Can you train RNNs without unrolling them through time?

Akarsh Kumar, Phillip Isola

June 4, 2026

Training RNNs on long sequences hits a wall: backpropagation through time (BPTT) is sequential, and its gradients vanish or explode. This work proposes Supervised Memory Training (SMT), which avoids unrolling altogether by treating RNN training as supervised learning on one-step memory updates. A Transformer encoder learns what information to preserve across timesteps, then an RNN learns how to update that memory. Every timestep can be trained in parallel, with stable gradients. On language modeling and pixel prediction, SMT outperforms standard BPTT, suggesting RNNs could scale to capture long-range dependencies without the sequential training bottleneck.
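To make the one-step idea concrete, here is a minimal toy sketch in plain Python. It is not the paper's implementation: `teacher_memory` is a hypothetical stand-in for the Transformer encoder (a fixed exponential average of the prefix rather than a learned model), the "RNN" is a scalar linear cell, and all function names are illustrative assumptions. It only shows the point that once memory targets exist, each timestep becomes an independent regression example, so no gradient flows through time.

```python
import random

def teacher_memory(prefix, decay=0.9):
    # Hypothetical stand-in for the encoder: maps a prefix x_1..x_t directly
    # to a memory target, with no recurrence (an assumption for this toy).
    t = len(prefix)
    return sum((1 - decay) * decay ** (t - 1 - s) * x for s, x in enumerate(prefix))

def one_step_pairs(xs):
    # Build independent (m_prev, x, m_next) examples; the empty prefix gives m_0 = 0.
    targets = [teacher_memory(xs[:t]) for t in range(len(xs) + 1)]
    return [(targets[t], xs[t], targets[t + 1]) for t in range(len(xs))]

def fit_cell(pairs, lr=1.0, steps=2000):
    # Toy linear cell m_next = a * m_prev + b * x, fitted by full-batch
    # gradient descent on squared error. Each example is a single step,
    # so the cell is never unrolled during training.
    a, b = 0.0, 0.0
    n = len(pairs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for m_prev, x, m_next in pairs:
            err = a * m_prev + b * x - m_next
            grad_a += 2 * err * m_prev / n
            grad_b += 2 * err * x / n
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b

def rollout(a, b, xs):
    # At inference time the learned cell runs sequentially, like any RNN.
    m = 0.0
    for x in xs:
        m = a * m + b * x
    return m

random.seed(0)
xs = [random.uniform(-1, 1) for _ in range(200)]
a, b = fit_cell(one_step_pairs(xs))
final_gap = abs(rollout(a, b, xs) - teacher_memory(xs))
print(a, b, final_gap)
```

In this toy the teacher's targets happen to follow an exact linear recurrence, so the cell can match them. With a learned encoder and a nonlinear RNN the fit would be approximate, and how the encoder is trained so that its memories are learnable is the subject of the paper itself.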
Published as "Pretraining Recurrent Networks without Recurrence", arXiv:2606.06479