Auditing the hidden safety risks in AI agent execution systems

Chengzhi Liu, Yichen Guo, Yepeng Liu, Yuzhe Yang, Qianqi Yan, Xuandong Zhao, Wenyue Hua, Sheng Liu, Sharon Li, Yuheng Bu, Xin Eric Wang

Existing safety evaluations of LLM agents focus on final answers, overlooking violations that occur during execution—unauthorized resource access, context leaks between agents, and permission boundary breaches. HarnessAudit audits complete execution trajectories across three dimensions: boundary compliance, execution fidelity, and system stability. The authors release HarnessAudit-Bench, a benchmark of 210 tasks spanning eight real-world domains in single-agent and multi-agent configurations. Evaluating ten harness configurations with frontier models and three multi-agent frameworks, they show task completion diverges from safe execution, violations concentrate in resource access and inter-agent information transfer, and multi-agent setups significantly expand the attack surface.