cs.AI

Why AI agents fail at managing your personal apps

Wenhao Wang, Peizhi Niu, Gongyi Zou, Xiyuan Yang, Jingxing Wang, Haoting Shi, Yaxin Du, Jingyi Chai, Xianghe Pang, Shuo Tang, Yanfeng Wang, Siheng Chen

June 1, 2026

LLM agents excel at generic information lookup but flounder when managing personal accounts and databases in apps like Slack, Reddit, and Lark. MCP-Persona is the first benchmark to test agents on real-world personalized tools, rather than hypothetical ones, across social media and enterprise platforms. Experiments show that even current top agents struggle significantly, which helps explain why they fail at everyday personal app tasks.

Published as "MCP-Persona: Benchmarking LLM Agents on Real-World Personal Applications via Environment Simulation", arXiv:2606.02470
