← Back to Computation and Language
cs.CL

How vulnerable are AI agents to hidden attacks through third-party services?

Hiskias Dingeto, Will Leeney

June 1, 2026

LLM agents that read from Gmail, Salesforce, or Jira are exposed to indirect prompt injection attacks—adversaries slip malicious instructions into tool responses the agent trusts. AgentRedBench tests this risk across 24 real integrations with 215 subtle attack scenarios. Six major models show concerning vulnerability (up to 81% success rates), but the authors release AgentRedGuard, a defense trained on adversarial tool-response data that reduces attacks from 70% to 2.4% while keeping false positives under 0.4%.
Published as AgentRedBench: Dynamic Redteaming and Integration-Aware Defense for LLM Agents over SaaS Integrations arXiv:2606.02240
Read the original paper →