cs.CL

Can AI moderate communities with their own rulebooks?

Zoher Kachwala, Bao Tran Truong, Rasika Muralidharan, Haewoon Kwak, Jisun An, Filippo Menczer

May 16, 2026

Social media platforms are moving toward community-governed moderation, where each group sets its own rules and norms. This paper introduces PluRule, a multilingual benchmark spanning 1,989 Reddit communities, 2,885 unique rules, and 9 languages. The task is framed as a multiple-choice problem: given a comment and its context, identify which rule is violated. Testing state-of-the-art vision-language models reveals a fundamental gap: GPT-5.2 only marginally outperforms trivial baselines, and neither larger models nor additional context yields more than slight improvements. Violations of universal rules, such as those on civility and self-promotion, are easier to detect than violations of community-specific norms. The benchmark and code are publicly available.
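To make the multiple-choice framing concrete, here is a minimal Python sketch of how one item and a trivial baseline might look. Everything in it is an illustrative assumption rather than the PluRule data format or the paper's evaluation code: the `ModerationItem` schema, the toy comments, the houseplant rule, and the choice of random and majority predictors as examples of "trivial baselines" are all hypothetical.

```python
import random
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ModerationItem:
    """One multiple-choice instance (hypothetical schema, not the PluRule format)."""
    comment: str
    context: str
    rules: List[str]   # candidate rules of the community
    violated: int      # index into `rules` of the rule the comment violates


def accuracy(items: List[ModerationItem],
             predict: Callable[[ModerationItem], int]) -> float:
    """Fraction of items where predict(item) equals the violated rule's index."""
    return sum(predict(it) == it.violated for it in items) / len(items)


def random_baseline(seed: int = 0) -> Callable[[ModerationItem], int]:
    """Pick a rule uniformly at random (assumed example of a trivial baseline)."""
    rng = random.Random(seed)
    return lambda item: rng.randrange(len(item.rules))


def majority_baseline(train: List[ModerationItem]) -> Callable[[ModerationItem], int]:
    """Always pick the option index that is most often correct in `train`."""
    top = Counter(it.violated for it in train).most_common(1)[0][0]
    # Clamp in case an item has fewer options than the majority index.
    return lambda item: min(top, len(item.rules) - 1)


# Invented toy data: two universal rules plus one community-specific rule.
RULES = ["Be civil", "No self-promotion", "Posts must be about houseplants"]
toy_items = [
    ModerationItem("Buy my gardening course today", "thread on watering", RULES, 1),
    ModerationItem("You are an idiot", "thread on repotting", RULES, 0),
    ModerationItem("Check out my channel", "thread on soil mixes", RULES, 1),
]

majority_acc = accuracy(toy_items, majority_baseline(toy_items))
random_acc = accuracy(toy_items, random_baseline(seed=0))
```

A model under evaluation would slot in as another `predict` function, so its accuracy can be compared directly against these baselines on the same items.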
Published as "PluRule: A Benchmark for Moderating Pluralistic Communities on Social Media" (arXiv:2605.17187).