Prompt Injection Threatens AI Agents: New Study Unveils Systemic Vulnerabilities in GPT-5 and Gemini

June 12, 2026

Generative AI

AI Research

Prompt injection poses a systemic risk to web AI agents: a new study finds that no single defense reliably blocks attacks on leading agents powered by GPT-5 and Gemini in realistic web environments.
AI agents powered by GPT-5 and Gemini are vulnerable to prompt injection attacks, with direct attacks succeeding in the majority of tested configurations.
The research highlights stealthy parasitism, where attackers influence outcomes like product recommendations while users complete their tasks, signaling asymmetric harms for different stakeholders.
Using StakeBench, a benchmark developed by NTU, ST Engineering, IBM Research, and UIUC, the researchers ran 3,168 adversarial trials covering 264 cases on two agent frameworks, NanoBrowser and BrowserUse. Indirect injections reached success rates of about 41.7% to 68.2%, and direct injections exceeded 79%.
Preliminary multimodal findings suggest image content can sway agent decisions even when text and ratings remain unchanged, indicating new visual prompt injection vectors.
StakeBench highlights stakeholder-specific risks: seller-targeted attacks succeed more often, user-targeted attacks are subtler and harder to detect, and platform-targeted attacks cause instability. A single global attack success rate therefore does not tell the full vulnerability story.
Researchers introduced StakeBench to evaluate AI agents’ resilience to prompt injections in realistic online environments, focusing on semantic distance, environmental cues, and execution-stage exposure.
Model and architecture choices matter: replacing GPT-5 with Gemini-2.5-Flash raised indirect prompt injection success by substantial margins, and BrowserUse was generally more vulnerable than NanoBrowser. Resilience therefore depends on both the backbone model and the deployment architecture.
Findings show prompt-injection risk is not a single-model issue but a distribution of harms shaped by stakeholder, task alignment, and deployment architecture, signaling broader security concerns as AI agents go mainstream.
Indirect prompt injections embedded in web content achieved success rates from about 41.7% to 68.2%, underscoring real-world attack potential.
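The study sorts attack outcomes into four categories: Robust Behavior, Stealthy Parasitism, Misaligned Disruption, and Compounded Failure. Robust Behavior remained unpopulated across configurations, meaning no tested setup landed in the one category that represents a fully resisted attack.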
OpenAI and Google did not immediately respond to requests for comment.
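
One way to picture the four outcome categories named above is as a two-by-two grid over "did the user's task complete?" and "did the attacker's goal succeed?". The sketch below is illustrative only and is not StakeBench's actual scoring code. The summary confirms only that Stealthy Parasitism means the attacker influences the outcome while the user's task still completes; the placement of Misaligned Disruption and Compounded Failure in the grid, along with the `Trial` fields and function names, are assumptions made for this example.

```python
from dataclasses import dataclass

CATEGORIES = (
    "Robust Behavior",
    "Stealthy Parasitism",
    "Misaligned Disruption",
    "Compounded Failure",
)


@dataclass(frozen=True)
class Trial:
    """One adversarial trial (hypothetical record, not StakeBench's schema)."""
    task_completed: bool    # did the agent finish what the user asked?
    attack_succeeded: bool  # did the injected instruction get its way?


def classify(trial: Trial) -> str:
    """Map a trial onto the assumed 2x2 grid of outcome categories."""
    if trial.task_completed and not trial.attack_succeeded:
        return "Robust Behavior"
    if trial.task_completed and trial.attack_succeeded:
        # Grounded in the summary: user task completes, attacker still wins.
        return "Stealthy Parasitism"
    if not trial.task_completed and not trial.attack_succeeded:
        # Assumed placement: the agent is derailed but the attacker gains nothing.
        return "Misaligned Disruption"
    # Assumed placement: the task fails and the attacker's goal is met.
    return "Compounded Failure"


def tally(trials: list[Trial]) -> dict[str, int]:
    """Count trials per category, keeping empty categories visible as 0."""
    counts = {name: 0 for name in CATEGORIES}
    for trial in trials:
        counts[classify(trial)] += 1
    return counts


def attack_success_rate(trials: list[Trial]) -> float:
    """Fraction of trials in which the attacker's goal was achieved."""
    if not trials:
        raise ValueError("need at least one trial")
    return sum(t.attack_succeeded for t in trials) / len(trials)
```

Under these assumptions, a configuration whose `tally` shows zero for "Robust Behavior" matches what the study reports. The same sketch also shows why a single success rate can mislead: two configurations with identical `attack_success_rate` values can split very differently between Stealthy Parasitism and Compounded Failure.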

Summary based on 2 sources

Sources

Decrypt • Jun 12, 2026

AI Agents Still Can't Stop Prompt Injection Attacks, Researchers Warn

CSO Online • Jun 12, 2026

Prompt injection breaks today’s AI agents, study warns
