How vulnerable are AI agents to hidden attacks through third-party services?

LLM agents that read from Gmail, Salesforce, or Jira are exposed to indirect prompt injection attacks—adversaries slip malicious instructions into tool responses the agent trusts. AgentRedBench tests this risk across 24 real integrations with 215 subtle attack scenarios. Six major models show concerning vulnerability (up to 81% success rates), but the authors release AgentRedGuard, a defense trained on adversarial tool-response data that reduces attacks from 70% to 2.4% while keeping false positives under 0.4%.