← Back to Computation and Language
cs.CL

When models copy patterns instead of following orders

Carolina Camassa, Derek Shiller

May 19, 2026

LLMs face a fundamental conflict: they're trained to follow instructions, but they're also pattern-completion machines. Researchers tested this by giving 13 models an instruction to behave one way, then showing them 50 turns of examples demonstrating the opposite. Instruction-following collapsed to 1–99% success depending on the model, with no correlation to standard benchmarks. Output diversity mattered most—single-token responses crumbled fast, multi-token ones held firm. Models also misread their own behavior, confidently predicting resistance they didn't actually have.
Published as Do as I Say, Not as I Do: Instruction-Induction Conflict in LLMs arXiv:2605.20382
Read the original paper →