Can AI moderate communities with their own rulebooks?

Zoher Kachwala, Bao Tran Truong, Rasika Muralidharan, Haewoon Kwak, Jisun An, Filippo Menczer

Social media platforms are moving toward community-governed moderation, where each group sets its own rules and norms. This paper introduces PluRule, a multilingual benchmark spanning 1,989 Reddit communities, 2,885 unique rules, and 9 languages. The task is framed as a multiple-choice problem: given a comment and its context, identify which rule is violated. Testing state-of-the-art vision-language models reveals a fundamental gap: GPT-5.2 only marginally outperforms trivial baselines, and larger models or additional context provide only slight improvements. Universal rules such as civility and self-promotion are easier to detect than community-specific norms. The benchmark and code are publicly available.
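
To make the multiple-choice framing concrete, the following is a minimal sketch of how one item and a trivial baseline might be represented and scored. It is not the PluRule code or data format: the `ModerationItem` fields, the `majority_baseline` helper, and the toy items are all hypothetical, and the choice of a majority-label baseline is an assumption, since the abstract does not say which trivial baselines were used.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ModerationItem:
    """One hypothetical multiple-choice item (field names are assumptions)."""
    community: str
    comment: str
    context: str
    rules: tuple[str, ...]  # candidate rules shown as answer options
    violated: int           # index into `rules` of the violated rule


def accuracy(predictions, items):
    """Fraction of items whose predicted option index matches the label."""
    if not items or len(predictions) != len(items):
        raise ValueError("predictions and items must be non-empty and aligned")
    correct = sum(p == item.violated for p, item in zip(predictions, items))
    return correct / len(items)


def majority_baseline(items):
    """Assumed trivial baseline: always pick the most frequent label index."""
    top_index, _ = Counter(item.violated for item in items).most_common(1)[0]
    return [top_index] * len(items)


# Made-up toy items for illustration only; they are not drawn from the benchmark.
TOY_ITEMS = [
    ModerationItem("r/example_a", "Buy my course!", "thread on study tips",
                   ("No self-promotion", "Be civil"), 0),
    ModerationItem("r/example_b", "Check out my channel", "weekly discussion",
                   ("No self-promotion", "Stay on topic"), 0),
    ModerationItem("r/example_c", "You are an idiot", "reply to a question",
                   ("No self-promotion", "Be civil"), 1),
]

baseline_predictions = majority_baseline(TOY_ITEMS)
baseline_accuracy = accuracy(baseline_predictions, TOY_ITEMS)
```

Under this framing, a model's predictions would be scored with the same `accuracy` function, so a model that only marginally beats the baseline would show an accuracy close to `baseline_accuracy`.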