Teaching AI which objects go on top when drawing overlapping scenes
Ziye Li, Henghui Ding
May 20, 2026
Existing image generation models fail when bounding boxes overlap because they do not know which object should appear in front. OcclusionFormer addresses this by explicitly modeling Z-order (layering priority) with a Diffusion Transformer that separates instances and composites them like stacked layers. The authors built SA-Z, a dataset with pixel-level occlusion annotations, and added a queried alignment loss to lock each object in place. The result is clean, physically plausible overlaps instead of blurred textures.
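To make the "stacked layers" idea concrete, here is a minimal sketch of Z-ordered compositing on a label grid. It is not the paper's method: OcclusionFormer works inside a Diffusion Transformer, while this toy simply paints boxes back to front. The function name `composite_by_z_order`, the `(label, box, z)` tuple layout, and the convention that a larger `z` means closer to the viewer are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical box format: (x0, y0, x1, y1), half-open on the right and bottom.
Box = Tuple[int, int, int, int]
Instance = Tuple[str, Box, int]  # (label, box, z); larger z = closer to viewer (assumed)


def composite_by_z_order(
    width: int, height: int, instances: List[Instance]
) -> List[List[Optional[str]]]:
    """Paint instances back to front so higher-z boxes overwrite lower-z ones."""
    canvas: List[List[Optional[str]]] = [[None] * width for _ in range(height)]
    # Sort ascending by z: the farthest instance is painted first.
    for label, (x0, y0, x1, y1), _z in sorted(instances, key=lambda inst: inst[2]):
        # Clip the box to the canvas before filling it.
        for y in range(max(y0, 0), min(y1, height)):
            for x in range(max(x0, 0), min(x1, width)):
                canvas[y][x] = label
    return canvas


# Two overlapping boxes; "cat" has the higher z, so it wins the shared pixel.
instances = [("cat", (0, 0, 3, 3), 1), ("dog", (2, 2, 5, 5), 0)]
canvas = composite_by_z_order(5, 5, instances)
for row in canvas:
    print(" ".join((cell or ".").ljust(3) for cell in row))
```

Without the `z` field the two boxes would be ambiguous at their shared pixel, which is the failure mode the summary describes for models that receive bounding boxes alone.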