Can AI remember people in group chats?

Memory systems designed for one-on-one conversations break down in group chats, where facts must be tied to shared history, group norms must be distinguished from individual exceptions, and membership changes must be tracked accurately. SocialMemBench provides the first systematic evaluation of this setting: 1,031 QA pairs spanning five types of social groups (including close friends, family, and communities) at multiple group sizes. It identifies five distinct failure modes, ranging from single-stream conflation to entity merging at scale. Four open-source memory frameworks (Mem0, LangMem, Graphiti, Cognee) cluster around 0.12–0.18 accuracy, far below both retrieval baselines and human reasoning performance. Even Gemini 2.5 Flash given the full conversation as context scores only 0.72 on small networks.