Prompt Injection Threatens AI Agents: New Study Unveils Systemic Vulnerabilities in GPT-5 and Gemini

June 12, 2026
  • Prompt injection poses systemic risks to deployable web AI agents, with no single defense reliably blocking attacks across leading systems powered by GPT-5 and Gemini in realistic web environments.

  • Direct prompt injection attacks against agents powered by GPT-5 and Gemini succeeded in the majority of tested configurations.

  • The research highlights stealthy parasitism, where attackers influence outcomes like product recommendations while users complete their tasks, signaling asymmetric harms for different stakeholders.

  • StakeBench, a benchmark developed by NTU, ST Engineering, IBM Research, and UIUC, ran 3,168 adversarial trials over 264 cases across NanoBrowser and BrowserUse. Attack success rates were high for both indirect injections (about 41.7% to 68.2%) and direct injections (over 79%).

  • Preliminary multimodal findings suggest image content can sway agent decisions even when text and ratings remain unchanged, indicating new visual prompt injection vectors.

  • StakeBench highlights stakeholder-specific risks: seller-targeted attacks show higher success, user-targeted attacks are subtler and harder to detect, and platform-targeted attacks cause instability, meaning global attack rates don’t tell the full vulnerability story.

  • Researchers introduced StakeBench to evaluate AI agents’ resilience to prompt injections in realistic online environments, focusing on semantic distance, environmental cues, and execution-stage exposure.

  • Model and architecture choices matter: swapping GPT-5 for Gemini-2.5-Flash substantially raised indirect prompt injection success, and BrowserUse was generally more vulnerable than NanoBrowser, so resilience depends on both the backbone model and the deployment architecture.

  • Findings show prompt-injection risk is not a single-model issue but a distribution of harms shaped by stakeholder, task alignment, and deployment architecture, signaling broader security concerns as AI agents go mainstream.

  • Indirect prompt injections embedded in web content achieved success rates from about 41.7% to 68.2%, underscoring real-world attack potential.

  • Attack outcomes fall into four categories: Robust Behavior, Stealthy Parasitism, Misaligned Disruption, and Compounded Failure. The Robust Behavior category remained unpopulated across configurations, meaning no tested setup was consistently robust.

  • OpenAI and Google did not immediately respond to requests for comment.
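
To make the four outcome categories and the stakeholder-specific rates concrete, here is a minimal Python sketch. It is not StakeBench code. The `Trial` record, its field names, and the mapping of two yes/no flags (did the user's task complete, did the injection take effect) onto the four named categories are assumptions made for illustration; the study may define the categories differently.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One adversarial trial (hypothetical record, not the benchmark's schema)."""
    stakeholder: str        # assumed labels: "user", "seller", or "platform"
    task_completed: bool    # did the agent finish the user's task?
    attack_succeeded: bool  # did the injected instruction take effect?


def classify(trial: Trial) -> str:
    """Map a trial onto one of the four outcome categories (assumed 2x2 mapping)."""
    if trial.task_completed:
        # Task done: either clean, or the attacker rode along unnoticed.
        return "Stealthy Parasitism" if trial.attack_succeeded else "Robust Behavior"
    # Task not done: the attack either also reached its goal or only disrupted.
    return "Compounded Failure" if trial.attack_succeeded else "Misaligned Disruption"


def attack_success_rates(trials):
    """Return (global_rate, {stakeholder: rate}) for a non-empty list of trials."""
    totals, hits = Counter(), Counter()
    for t in trials:
        totals[t.stakeholder] += 1
        hits[t.stakeholder] += t.attack_succeeded  # True counts as 1
    per_stakeholder = {s: hits[s] / totals[s] for s in totals}
    global_rate = sum(hits.values()) / sum(totals.values())
    return global_rate, per_stakeholder
```

Under this assumed mapping, a trial in which the user's task completes but the product recommendation was steered by an attacker counts as Stealthy Parasitism. Computing the rate per stakeholder alongside the global rate shows how a single overall number can hide very different exposure for users, sellers, and platforms.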

Summary based on 2 sources

