Teaching AI to edit text in photos without breaking everything else

Yiheng Lin, Siyu Jiao, Xiaohan Lan, Wei Zhou, Qi She, Fei Yu, Heyun Chen, Zhengwei Wang, Jinghuan Chen, Moran Li, Yingchen Yu, Zijian Feng, Yao Zhao, Yunchao Wei, Yujie Zhong

Scene text editing, meaning changing the words in a photo while keeping the image realistic, is hard because a model must render the new text correctly and also preserve the background and every non-target area. TextSculptor provides a 3.2M-sample dataset built from automated synthesis and real compositing, plus TextSculpt-Bench, a standardized benchmark covering text addition, replacement, removal, and mixed edits. The framework includes OCR-based evaluation to measure both text accuracy and visual quality. The reported results indicate that open-source models can now match proprietary systems.
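To make the evaluation idea concrete, here is a minimal sketch of how an OCR-based check and a background-preservation check could be scored. It is not the TextSculpt-Bench implementation. The function names, the case-insensitive comparison, the edit-distance similarity, and the mean-absolute-difference background score are all assumptions chosen for illustration. The OCR step itself is left out, so the sketch starts from a string that some OCR engine has already read from the edited region.

```python
from typing import Dict, Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a rolling one-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def text_accuracy(ocr_text: str, target_text: str) -> Dict[str, object]:
    """Compare OCR output with the requested text (hypothetical metric).

    Assumption: comparison ignores case and surrounding whitespace.
    """
    recognized = ocr_text.strip().lower()
    expected = target_text.strip().lower()
    longest = max(len(recognized), len(expected))
    if longest == 0:
        similarity = 1.0  # both empty, e.g. a successful text removal
    else:
        similarity = 1.0 - edit_distance(recognized, expected) / longest
    return {"exact_match": recognized == expected, "similarity": similarity}


def background_change(original: Sequence[Sequence[float]],
                      edited: Sequence[Sequence[float]],
                      edit_mask: Sequence[Sequence[int]]) -> float:
    """Mean absolute pixel difference outside the edited region.

    Assumption: images are same-sized 2D grayscale grids, and edit_mask is
    truthy where the edit was allowed to change pixels. Lower is better.
    """
    total = 0.0
    count = 0
    for row_o, row_e, row_m in zip(original, edited, edit_mask):
        for pixel_o, pixel_e, masked in zip(row_o, row_e, row_m):
            if not masked:
                total += abs(pixel_o - pixel_e)
                count += 1
    return total / count if count else 0.0
```

For example, `text_accuracy("CLOSD", "CLOSED")` reports no exact match and a similarity of 5/6, because one insertion fixes a six-character target. Scoring the background only outside the mask keeps the two concerns separate: a model can render the text perfectly and still be penalized for disturbing pixels it was not asked to touch.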