Teaching image models to draw bigger pictures than they learned
Javad Rajabi, Kimia Shaban, Koorosh Roohi, David B. Lindell, Babak Taati
May 21, 2026
Diffusion transformers produce noticeably worse images when asked to generate beyond their training resolution, a hard ceiling that limits their practical use. Rather than applying one uniform attention-scaling factor to every image, SEGA adjusts that scaling dynamically according to the spatial frequencies present in each image. Tests on Flux and other modern models show consistent gains in both structure and detail at higher resolutions, with no retraining required.
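The summary does not give SEGA's formula, so the sketch below is not the authors' method. It only illustrates the contrast drawn above. A uniform rule picks one attention-logit multiplier from token counts alone. A frequency-aware rule also inspects each image's spectrum. Everything specific here is an assumption made for illustration: the log-ratio uniform factor, the `high_freq_fraction` energy proxy with its 0.125 cutoff, the linear interpolation between the two, the direction of that interpolation, and all function names.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def uniform_scale(n_train_tokens, n_test_tokens):
    # Assumed uniform rule: one logit multiplier for every image,
    # depending only on how many more tokens attention must spread over.
    return np.log(n_test_tokens) / np.log(n_train_tokens)


def high_freq_fraction(latent, cutoff=0.125):
    # Hypothetical spectral proxy: share of 2-D FFT energy whose radial
    # frequency (cycles per sample, Nyquist = 0.5) exceeds `cutoff`.
    spec = np.abs(np.fft.fft2(latent)) ** 2
    h, w = latent.shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fy ** 2 + fx ** 2)
    total = spec.sum()
    if total == 0:
        return 0.0
    return float(spec[radius > cutoff].sum() / total)


def adaptive_scale(latent, n_train_tokens, n_test_tokens):
    # Hypothetical frequency-aware rule: interpolate between no extra
    # scaling (1.0) and the full uniform factor by the high-frequency share.
    base = uniform_scale(n_train_tokens, n_test_tokens)
    return 1.0 + (base - 1.0) * high_freq_fraction(latent)


def scaled_attention(q, k, v, scale):
    # Scaled dot-product attention with an extra multiplier on the logits.
    d = q.shape[-1]
    logits = scale * (q @ k.T) / np.sqrt(d)
    return softmax(logits, axis=-1) @ v


# Usage: a smooth image versus broadband noise at 4x the training token count.
rng = np.random.default_rng(0)
smooth = np.outer(np.hanning(128), np.hanning(128))
noisy = rng.standard_normal((128, 128))
n_train, n_test = 64 * 64, 128 * 128
print("uniform:", uniform_scale(n_train, n_test))
print("smooth :", adaptive_scale(smooth, n_train, n_test))
print("noisy  :", adaptive_scale(noisy, n_train, n_test))

q, k, v = (rng.standard_normal((16, 8)) for _ in range(3))
out = scaled_attention(q, k, v, adaptive_scale(noisy, n_train, n_test))
print(out.shape)
```

Under this assumed rule, a smooth image falls back to nearly ordinary attention (multiplier close to 1.0), while a detail-rich image receives close to the full uniform boost. The summary does not say which way SEGA's adjustment actually runs, or whether it is applied per image, per head, or per layer.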