Data poisoning is a form of adversarial attack in which malicious or misleading data is intentionally inserted into an AI system's training dataset. These attacks aim to subvert the model's learning process, causing degraded performance, systemic bias, or specific erroneous outputs, while often remaining undetected by standard validation tools. Unlike conventional cybersecurity threats that target software vulnerabilities, data poisoning exploits the openness and scale of machine learning data pipelines, making it particularly difficult to detect and mitigate.
There are various types of data poisoning attacks, including:
1. Indiscriminate Poisoning: This type of attack aims to degrade the model's overall performance. It is most often directed at supervised learning, but can also affect unsupervised algorithms; techniques such as Contrastive Poisoning exploit vulnerabilities specific to contrastive learning models, significantly reducing their accuracy. For Generative AI, data can be poisoned simply by adding bad content (e.g., misinformation or inappropriate material) to the training data.
2. Targeted Poisoning: These attacks cause the model to make specific, incorrect predictions while leaving other outputs unaffected. For example, an attacker might alter a healthcare model so that it misdiagnoses certain conditions, changing only those predictions (Source: University of Maryland Global Campus). Both attack types are demonstrated in the sketch that follows this list.
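To make these two attack types concrete, below is a minimal, illustrative sketch of label-flipping poisoning against a toy classifier. It assumes scikit-learn and NumPy are available; the synthetic dataset, logistic-regression model, and poisoning rates are hypothetical choices for demonstration, not drawn from any specific attack described above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Toy binary classification task standing in for a real training pipeline.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

def train_model(labels):
    """Train on (possibly poisoned) labels; evaluation uses clean test data."""
    return LogisticRegression(max_iter=1000).fit(X_train, labels)

baseline = train_model(y_train)
print(f"clean accuracy: {accuracy_score(y_test, baseline.predict(X_test)):.3f}")

# Indiscriminate poisoning: flip a random ~30% of training labels to degrade
# overall performance across all classes.
y_indisc = y_train.copy()
flip = rng.random(len(y_indisc)) < 0.30
y_indisc[flip] = 1 - y_indisc[flip]
poisoned = train_model(y_indisc)
print(f"indiscriminate: {accuracy_score(y_test, poisoned.predict(X_test)):.3f}")

# Targeted poisoning: mislabel 60% of one class so that only predictions for
# that class degrade, leaving the other class largely unaffected.
y_targ = y_train.copy()
victims = rng.choice(np.flatnonzero(y_train == 1),
                     size=int(0.6 * np.sum(y_train == 1)), replace=False)
y_targ[victims] = 0
pred = train_model(y_targ).predict(X_test)
for cls in (0, 1):
    mask = y_test == cls
    print(f"targeted, class {cls} accuracy: "
          f"{accuracy_score(y_test[mask], pred[mask]):.3f}")
```

In this toy setup, the indiscriminate variant drags accuracy down across the board, while the targeted variant mainly hurts predictions for the mislabeled class, mirroring the healthcare example above.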
Data poisoning poses a serious threat in any setting where training data is sourced from external or user-generated inputs. As AI systems increasingly rely on online, automated, or continuously updated datasets, the attack surface for data poisoning continues to grow, requiring robust controls on data integrity, access, monitoring, and response.
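As one concrete illustration of the integrity controls mentioned above, the sketch below keeps a cryptographic manifest of training-data files and flags any file whose contents change between snapshots. The directory layout, file names, and manifest format are hypothetical; SHA-256 hashing via Python's standard library is the only real dependency.

```python
import hashlib
import json
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(data_dir: Path) -> dict:
    """Record a digest for every file under the training-data directory."""
    return {str(p): sha256_of(p)
            for p in sorted(data_dir.rglob("*")) if p.is_file()}

def verify_manifest(data_dir: Path, manifest_path: Path) -> list:
    """Return paths whose contents changed since the manifest was written."""
    expected = json.loads(manifest_path.read_text())
    current = build_manifest(data_dir)
    return [p for p, h in expected.items() if current.get(p) != h]

if __name__ == "__main__":
    data_dir = Path("training_data")        # hypothetical dataset location
    manifest = Path("data_manifest.json")   # hypothetical manifest file
    if not manifest.exists():
        # First run: snapshot the current state of the dataset.
        manifest.write_text(json.dumps(build_manifest(data_dir), indent=2))
    else:
        tampered = verify_manifest(data_dir, manifest)
        if tampered:
            print("files changed since last snapshot:", tampered)
```

A check like this catches silent tampering with stored datasets, though it cannot, by itself, detect poisoned records that arrive through legitimate ingestion channels; for those, the access, monitoring, and response controls above still apply.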