Why optimized prompts fail to transfer between tasks
Shuzhi Gong, Hechuan Wen
May 26, 2026
Automated prompt optimization methods like DSPy and TextGrad improve LLM performance on individual benchmarks, but those improvements rarely transfer to new tasks or models. By analyzing thousands of optimized prompts across frameworks and LLMs, researchers traced the problem to the kinds of edits the optimizers make: certain edits (like adding complexity or meta-instructions) systematically harm math and multi-hop reasoning, while step-by-step prompts help logical tasks. These are not random failures; they are predictable mismatches between edit types and task characteristics.