Teaching image models to draw bigger pictures than they learned

Diffusion transformers produce worse images when asked to generate at resolutions beyond those seen in training, which limits their practical use. SEGA adjusts attention scaling per image according to the spatial frequencies present, rather than applying one uniform scaling factor. Tests on Flux and other modern models show consistent gains in both structure and detail at higher resolutions, with no retraining required.
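
To make the contrast between uniform and content-aware scaling concrete, here is a minimal NumPy sketch. It is not SEGA's actual formulation, which this summary does not spell out. Every function name is hypothetical, the square-root-of-log-ratio baseline is just one heuristic chosen as a stand-in, and the blending rule in `adaptive_scale` is an assumption made purely for illustration.

```python
import numpy as np


def uniform_scale(n_train: int, n_test: int) -> float:
    """Resolution-only factor: the same for every image at a given size.

    Assumption: this sqrt-of-log-ratio form is a stand-in baseline,
    not something taken from the source.
    """
    return float(np.sqrt(np.log(n_test) / np.log(n_train)))


def high_freq_ratio(image: np.ndarray, cutoff: float = 0.25) -> float:
    """Fraction of non-DC spectral energy above `cutoff` cycles per pixel."""
    power = np.abs(np.fft.fft2(image)) ** 2
    fy = np.fft.fftfreq(image.shape[0])[:, None]
    fx = np.fft.fftfreq(image.shape[1])[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    power[0, 0] = 0.0  # drop the DC term (mean brightness)
    total = power.sum()
    if total == 0.0:
        return 0.0
    return float(power[radius > cutoff].sum() / total)


def adaptive_scale(image: np.ndarray, n_train: int, n_test: int) -> float:
    """Hypothetical rule: blend between 1.0 and the uniform factor
    according to how much high-frequency content the image has."""
    base = uniform_scale(n_train, n_test)
    return 1.0 + (base - 1.0) * high_freq_ratio(image)


def scaled_attention(q, k, v, scale: float) -> np.ndarray:
    """Single-head softmax attention with an extra multiplier on the logits."""
    logits = scale * (q @ k.T) / np.sqrt(q.shape[-1])
    logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


# Two 64x64 test images: a slow horizontal sine wave and a checkerboard.
x = np.arange(64)
smooth = np.tile(np.sin(2 * np.pi * x / 32), (64, 1))
checker = (-1.0) ** (x[:, None] + x[None, :])

# Token counts here are illustrative (e.g. 1024 training tokens, 4096 at test).
s_smooth = adaptive_scale(smooth, 1024, 4096)
s_checker = adaptive_scale(checker, 1024, 4096)
```

In this toy setup the smooth image keeps a factor close to 1.0, while the checkerboard receives the full uniform factor. A uniform rule would hand both images the same value, which is the behaviour the per-image approach is meant to avoid.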