cs.CV

Teaching image models to draw bigger pictures than they learned

Javad Rajabi, Kimia Shaban, Koorosh Roohi, David B. Lindell, Babak Taati

May 21, 2026

Diffusion transformers produce degraded images when asked to generate at resolutions beyond those seen in training, which limits their practical use. SEGA adapts attention scaling dynamically based on the spatial frequencies present in each image, rather than applying one uniform scaling factor. In tests on Flux and other modern models, the authors report consistent gains in both structure and detail at higher resolutions, with no retraining required.
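To make the contrast between uniform and frequency-aware scaling concrete, here is a minimal sketch of the general idea. It is not the SEGA algorithm from the paper. Everything in it is an assumption made for illustration: the function names, the 0.25 cycles-per-pixel cutoff, the log-ratio formula used as the uniform baseline, and the linear rule that blends toward that baseline according to high-frequency energy.

```python
import cmath
import math


def power_spectrum(img):
    """Naive 2D DFT power spectrum of a small square grid (list of lists)."""
    n = len(img)
    spec = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            acc = 0j
            for y in range(n):
                for x in range(n):
                    angle = -2.0 * math.pi * (u * y + v * x) / n
                    acc += img[y][x] * cmath.exp(1j * angle)
            spec[u][v] = abs(acc) ** 2
    return spec


def high_freq_ratio(img, cutoff=0.25):
    """Share of non-DC spectral energy above `cutoff` (cycles per pixel)."""
    n = len(img)
    spec = power_spectrum(img)
    high = 0.0
    total = 0.0
    for u in range(n):
        for v in range(n):
            if u == 0 and v == 0:
                continue  # skip the DC term (mean brightness)
            # Fold indices so that n-1 maps to the same frequency as 1.
            fu = min(u, n - u) / n
            fv = min(v, n - v) / n
            total += spec[u][v]
            if math.hypot(fu, fv) > cutoff:
                high += spec[u][v]
    return high / total if total > 0 else 0.0


def uniform_scale(train_tokens, target_tokens):
    """Assumed uniform baseline: one factor for every image."""
    return math.sqrt(math.log(target_tokens) / math.log(train_tokens))


def adaptive_scale(img, train_tokens, target_tokens, cutoff=0.25):
    """Hypothetical rule: blend from 1.0 toward the uniform factor
    in proportion to the image's high-frequency energy share."""
    base = uniform_scale(train_tokens, target_tokens)
    weight = high_freq_ratio(img, cutoff)
    return 1.0 + (base - 1.0) * weight


n = 8
smooth = [[math.cos(2 * math.pi * x / n) for x in range(n)] for y in range(n)]
checker = [[(-1.0) ** (x + y) for x in range(n)] for y in range(n)]

print(adaptive_scale(smooth, 1024, 4096))   # close to 1.0: little fine detail
print(adaptive_scale(checker, 1024, 4096))  # close to the uniform factor
```

With a uniform rule both test patterns would receive the same factor. Under the illustrative rule above, the smooth gradient and the checkerboard receive different factors because their spectra differ, which is the kind of per-image adaptation the summary describes. The direction and shape of the mapping used by the actual method are specified in the paper, not here.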
Published as "SEGA: Spectral-Energy Guided Attention for Resolution Extrapolation in Diffusion Transformers", arXiv:2605.22668.