How well do AI models and humans agree on right and wrong?

Yevhen Kostiuk, Kenneth Enevoldsen, Peter Bjerregaard Vahlstrup, Márton Kardos, Kristoffer Nielbo

Existing tests of social norm alignment rely on artificial multiple-choice questions. This work instead proposes matching free-form responses to social dilemmas: what an LLM says is compared against human reference answers, and the human references are compared against what the LLM says. The team built a dataset of 3,000 non-trivial Danish social scenarios with human-judged reference solutions, then measured agreement among LLMs, among humans, and between LLMs and humans. Model rankings were consistent, but alignment was uneven: models agreed with humans more on concrete issues, such as neighbor disputes, than on abstract moral questions. The code and dataset are available.
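The abstract does not spell out how two free-form answers are judged to match, so the following is only a minimal sketch of bidirectional matching under stated assumptions. It assumes a match predicate between two answers, scores the fraction of model answers backed by some human reference (precision-like) and the fraction of human references covered by some model answer (recall-like), then combines them as F1. The names `jaccard_matcher`, `coverage`, and `bidirectional_agreement`, the word-overlap judge, and its threshold are hypothetical illustrations, not the authors' method.

```python
from typing import Callable, Dict, Sequence

# Hypothetical judge: returns True if two free-form answers express the same solution.
Matcher = Callable[[str, str], bool]


def jaccard_matcher(threshold: float = 0.5) -> Matcher:
    """Toy stand-in for a real judge: word-set Jaccard overlap above a threshold."""
    def match(a: str, b: str) -> bool:
        wa, wb = set(a.lower().split()), set(b.lower().split())
        if not wa or not wb:
            return False
        return len(wa & wb) / len(wa | wb) >= threshold
    return match


def coverage(sources: Sequence[str], targets: Sequence[str], match: Matcher) -> float:
    """Fraction of `sources` that match at least one answer in `targets`."""
    if not sources:
        return 0.0
    hits = sum(any(match(s, t) for t in targets) for s in sources)
    return hits / len(sources)


def bidirectional_agreement(
    candidate: Sequence[str], reference: Sequence[str], match: Matcher
) -> Dict[str, float]:
    """Match candidate -> reference and reference -> candidate, then combine as F1."""
    precision = coverage(candidate, reference, match)  # model answers backed by humans
    recall = coverage(reference, candidate, match)     # human answers the model also gave
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Usage on a made-up neighbor-dispute scenario (illustrative data only).
model_answers = ["talk to the neighbor politely", "call the police immediately"]
human_answers = ["talk to the neighbor politely first", "ignore the noise"]
scores = bidirectional_agreement(model_answers, human_answers, jaccard_matcher(0.5))
print(scores)
```

Here one of the two model answers matches a human reference and one of the two references matches a model answer, so precision, recall, and F1 are all 0.5. Scoring both directions separates a model that says extra things humans would not endorse (low precision) from one that omits solutions humans commonly give (low recall). A realistic setup would swap the toy word-overlap matcher for a stronger judge, such as a human annotator or an LLM.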