How to let AI agents sketch before painting?
Junyan Ye, Jun He, Zilong Huang, Dongzhi Jiang, Xuan Yang, Rui Chen, Weijia Li
May 28, 2026
Current image-generation agents are trapped in a loop of rewriting prompts to refine outputs, with no way to directly manipulate the result. GenClaw introduces a three-stage workflow: the agent first reasons about the concept, then writes code (SVG, HTML, Three.js) to create a structural sketch, and finally uses image generation to add textures and realism. Code acts as a controllable bridge between language reasoning and pixels, giving the agent something closer to a real artist's toolkit: conceptualize, sketch, paint.
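To make the three stages concrete, here is a minimal Python sketch of the idea. It is not GenClaw's implementation: the names `Shape`, `plan_concept`, `sketch_svg`, `move_shape`, and `paint_request` are all hypothetical, the planning stage is a hard-coded stand-in for language-model reasoning, and the final stage only builds a request payload rather than calling any real image model.

```python
from dataclasses import dataclass, replace
from xml.etree import ElementTree as ET


@dataclass(frozen=True)
class Shape:
    """One planned scene element (hypothetical structure, not from the paper)."""
    kind: str   # "rect" or "circle"
    x: float
    y: float
    size: float
    fill: str


def plan_concept(prompt: str) -> list[Shape]:
    """Stage 1 stand-in: a real agent would reason about the prompt with a
    language model. A fixed plan is returned so the example runs offline."""
    return [
        Shape("rect", 0, 140, 200, "#4a7c3a"),    # ground
        Shape("circle", 150, 50, 25, "#f5c542"),  # sun
    ]


def sketch_svg(shapes: list[Shape], width: int = 200, height: int = 200) -> str:
    """Stage 2: turn the plan into SVG code, the structural sketch."""
    svg = ET.Element("svg", xmlns="http://www.w3.org/2000/svg",
                     width=str(width), height=str(height))
    for s in shapes:
        if s.kind == "rect":
            ET.SubElement(svg, "rect", x=str(s.x), y=str(s.y),
                          width=str(s.size), height=str(s.size), fill=s.fill)
        elif s.kind == "circle":
            ET.SubElement(svg, "circle", cx=str(s.x), cy=str(s.y),
                          r=str(s.size), fill=s.fill)
        else:
            raise ValueError(f"unsupported shape kind: {s.kind}")
    return ET.tostring(svg, encoding="unicode")


def move_shape(shapes: list[Shape], index: int, dx: float, dy: float) -> list[Shape]:
    """Direct manipulation: shift one element without rewriting the prompt.
    Returns a new list; the original plan is left unchanged."""
    moved = list(shapes)
    s = moved[index]
    moved[index] = replace(s, x=s.x + dx, y=s.y + dy)
    return moved


def paint_request(svg: str, prompt: str) -> dict:
    """Stage 3 stand-in: package the sketch and prompt for a structure-conditioned
    image generator. No model is called here (assumption: such a model exists)."""
    return {"prompt": prompt, "structure_svg": svg}


if __name__ == "__main__":
    prompt = "a sunny meadow"
    shapes = plan_concept(prompt)
    shapes = move_shape(shapes, index=1, dx=-100, dy=0)  # move the sun to the left
    request = paint_request(sketch_svg(shapes), prompt)
    print(request["structure_svg"])
```

The point of the intermediate code is visible in `move_shape`: repositioning the sun is a deterministic edit to the sketch, where a prompt-only loop would have to rephrase the request and hope the generator complies.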