Why AI agents fail at managing your personal apps
Wenhao Wang, Peizhi Niu, Gongyi Zou, Xiyuan Yang, Jingxing Wang, Haoting Shi, Yaxin Du, Jingyi Chai, Xianghe Pang, Shuo Tang, Yanfeng Wang, Siheng Chen
June 1, 2026
LLM agents excel at generic information lookup but flounder when managing personal accounts and databases in apps like Slack, Reddit, and Lark. MCP-Persona is the first benchmark to test agents on real-world personalized tools, rather than hypothetical ones, across social media and enterprise platforms. Experiments show that current top agents struggle significantly, shedding light on why they fail at everyday personal app tasks.