← Back to Computation and Language cs.CL
When models copy patterns instead of following orders
Carolina Camassa, Derek Shiller
May 19, 2026
LLMs face a fundamental conflict: they're trained to follow instructions, but they're also pattern-completion machines. Researchers tested this by giving 13 models an instruction to behave one way, then showing them 50 turns of examples demonstrating the opposite. Instruction-following collapsed to 1–99% success depending on the model, with no correlation to standard benchmarks. Output diversity mattered most—single-token responses crumbled fast, multi-token ones held firm. Models also misread their own behavior, confidently predicting resistance they didn't actually have.
Read the original paper →