When models copy patterns instead of following orders

LLMs face a fundamental conflict: they're trained to follow instructions, but they're also pattern-completion machines. Researchers tested this by giving 13 models an instruction to behave one way, then showing them 50 turns of examples demonstrating the opposite. Instruction-following collapsed to 1–99% success depending on the model, with no correlation to standard benchmarks. Output diversity mattered most—single-token responses crumbled fast, multi-token ones held firm. Models also misread their own behavior, confidently predicting resistance they didn't actually have.