cs.CV

How to let AI agents sketch before painting?

Junyan Ye, Jun He, Zilong Huang, Dongzhi Jiang, Xuan Yang, Rui Chen, Weijia Li

May 28, 2026

Current image-generation agents are trapped in a loop of rewriting prompts to refine outputs, with no way to manipulate the result directly. GenClaw introduces a three-stage workflow: the agent reasons about the concept, then writes code (SVG, HTML, Three.js) to create a structural sketch, then uses image generation to add textures and realism. Code acts as a controllable bridge between language reasoning and pixels, giving the agent something closer to a real artist's toolkit: conceptualize, sketch, paint.
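To make the three stages concrete, here is a minimal Python sketch of a conceptualize, sketch, paint pipeline. It is an illustration under stated assumptions, not GenClaw's implementation: the `Shape` schema and the function names `conceptualize`, `sketch_svg`, and `paint` are hypothetical, the reasoning stage is hard-coded rather than driven by a language model, and the painting stage only packages a request instead of calling an image generator.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass
class Shape:
    """One element of the structural plan (hypothetical schema)."""
    kind: str   # "rect" or "circle"
    x: int      # left edge for a rect, centre x for a circle
    y: int      # top edge for a rect, centre y for a circle
    size: int   # side length for a rect, radius for a circle
    label: str


def conceptualize(request: str) -> list[Shape]:
    """Stage 1 stand-in. A real agent would reason about the request with a
    language model; the plan is hard-coded here so the sketch runs offline."""
    return [
        Shape("rect", 40, 120, 100, "house"),
        Shape("circle", 200, 50, 30, "sun"),
    ]


def sketch_svg(plan: list[Shape], width: int = 256, height: int = 256) -> str:
    """Stage 2: emit SVG code so that layout is explicit and editable."""
    svg = ET.Element(
        "svg",
        {"xmlns": "http://www.w3.org/2000/svg",
         "width": str(width), "height": str(height)},
    )
    for s in plan:
        if s.kind == "rect":
            attrs = {"x": str(s.x), "y": str(s.y),
                     "width": str(s.size), "height": str(s.size)}
        else:
            attrs = {"cx": str(s.x), "cy": str(s.y), "r": str(s.size)}
        el = ET.SubElement(svg, s.kind, attrs)
        el.set("fill", "none")
        el.set("stroke", "black")
        el.set("data-label", s.label)
    return ET.tostring(svg, encoding="unicode")


def paint(svg_code: str, style_prompt: str) -> dict:
    """Stage 3 placeholder. A real system would hand the sketch and a prompt
    to an image generator; this only packages the request (assumption)."""
    return {"structure": svg_code, "prompt": style_prompt}
```

With this structure, a correction becomes a direct edit rather than a prompt rewrite: setting `plan[1].x = 60` and calling `sketch_svg(plan)` again moves the sun to the left while everything else in the layout stays fixed.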
Published as "GenClaw: Code-Driven Agentic Image Generation" (arXiv:2605.30248)