Security researchers have discovered a stealthy new method to manipulate Googleโs Gemini AI assistant by hiding malicious commands in email code that Gemini unknowingly follows.
These indirect prompt injection (IPI) methods allow scammers to plant fake alerts inside AI-generated summaries, making them appear as legitimate warnings from Google itself, eventually leading users straight into phishing traps.
How The Exploit Works
Unlike traditional phishing scams that rely on shady links or attachments, this technique is far more subtle, as the trick lies in the emailโs code. Attackers hide instructions in emails using invisible text โwhite font on a white background, zero-sized fonts, or off-screen elements. Though they remain unseen by the human eye, Gemini does see them and fully processes them.
Once the recipient clicks on โSummarize this emailโ in Google Workspace, Gemini scans the entire message, including the hidden sections. If those hidden parts contain malicious prompts, they are also included in the summary output.
This results in a fake yet convincing security alert urging users to call a support phone number or take urgent action. Since the alert appears to come from Gemini itself, users may trust it, making the attack especially dangerous.
Exploit Discovered Through A Bug Bounty Program
The prompt-injection vulnerability in Google Gemini for Workspace was disclosed to Mozillaโs 0din bug bounty program for generative AI tools by researcher Marco Figueroa, GenAI Bug Bounty Programs Manager at Mozilla. His demonstration showed how an attacker could embed hidden instructions using styling directives like <Admin> tags or CSS code, which are designed to hide content from human eyes, to trick Gemini.
As Gemini treats such instructions as part of the prompt, it ends up repeating them as if they were part of the original message in its summary output, without realizing they were malicious.
Figueroa provided a proof-of-concept exampleย to demonstrate how Gemini could be tricked into displaying a fake security alert, warning the user that their Gmail password had been compromised and providing a fraudulent support number to call.
Why This Matters
The attack is a form of indirect prompt injection, where malicious input is buried within content the AI is supposed to summarize. This has become a growing concern as generative AI becomes integrated into daily workflows. With Gemini integrated across Google WorkspaceโGmail, Docs, Slides, and Driveโany system where the assistant analyzes user content is potentially vulnerable.
What makes it more dangerous is that these summaries can seem very convincing. If Gemini includes a fake security warning, users might take it seriously as they trust Gemini as part of Google Workspace without realizing itโs actually a hidden malicious message.
Googleโs Multi-Layered Defense Strategy
In response, Google has rolled out a layered defense system for Gemini that is designed to make these attacks harder to pull off. Measures include:
- Machine learning classifiers to detect malicious prompts
- Markdown sanitization to strip out dangerous formatting
- Suspicious URL redaction
- A user confirmation framework that adds an extra checkpoint before executing sensitive tasks.
- Notifications to alert users when a prompt injection is detected
Google says it is also working with external researchers and red teams to refine its defenses and implement additional protections in future Gemini versions.
“We are constantly hardening our already robust defenses through red-teaming exercises that train our models to defend against these types of adversarial attacks,” a Google spokesperson told BleepingComputer in a statement.
While Google has stated thereโs no evidence yet of this technique being used in real-world attacks, the discovery is a clear warning that even AI-generated content, no matter how seamless, can still be manipulated.