AI Risk · Generative AI

Indirect Prompt Injection

Indirect Prompt Injection Attacks involve modifying an LLMs behavior through external content accessed by the model.

📋 Description

Indirect Prompt Injection occurs when a model retrieves reference material (e.g., from a third-party site, document, or tool output) that has been maliciously altered to include harmful instructions. These embedded prompts override or interfere with the intended task, causing the LLM to misbehave, often in subtle and unexpected ways. Input/Output Checks and Prompt-based Mitigations will reduce the likelihood of the risk, but may not eliminate it. In sensitive contexts, the model should be treated as an untrusted user and given a minimum set of permissions (see the Excessive Agency risk for further discussion). If the output is presented to an individual, they should be aware of its potentially untrustworthy content.

This risk is relevant in systems that use Retrieval-Augmented Generation (RAG), call external APIs, or embed LLMs into workflows that pull from user-editable fields (e.g., emails, documents, calendars, or websites). It poses a unique challenge because the user accessing the system may not be the attacker—the malicious payload originates in external content.

🔍 Public Examples and Common Patterns

- LLMail-Inject Challenge: A proof-of-concept attack inserted a payload into a Google Calendar entry, which was then interpreted by an LLM-integrated assistant, bypassing normal user prompts. This scenario is explored in Microsoft's LLMail-Inject challenge, which evaluates prompt injection defenses in LLM-integrated email clients.

🛡️ Recommended Mitigations

📐 External Framework Mapping

- OWASP LLM Top 10: LLM01:2025 - Prompt Injection
- MITRE ATLAS: AML.T0025 Prompt Injection
- Databricks AI Security Framework: 9.1 - Prompt Injection

📚 References

- Benchmarking and Defending Against Indirect Prompt Injection Attacks
on Large Language Models
- Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection
- What is a prompt injection attack?

Cite this page

Trustible. "Indirect Prompt Injection." Trustible AI Governance Insights Center, 2026. https://trustible.ai/ai-risks/indirect-prompt-injection/

← All AI Risks Insights Center

Manage AI Risk with Trustible

Trustible's AI governance platform helps enterprises identify, assess, and mitigate AI risks like this one at scale.

Explore the Platform

Contact Us