Data poisoning is a form of adversarial attack in which malicious or misleading data is intentionally inserted into an AI system's training dataset. These attacks aim to subvert the model's learning process, causing degraded performance, systemic bias, or specific erroneous outputs, while often remaining undetected by standard validation tools. Unlike conventional cybersecurity threats that target software vulnerabilities, data poisoning exploits the openness and scale of machine learning data pipelines, making it particularly difficult to detect and mitigate.
There are various types of data poisoning attacks, including:
1. Indiscriminate Poisoning: This type of attack aims to degrade the model's overall performance. It is most often directed at supervised learning, but can also affect unsupervised algorithms; techniques such as Contrastive Poisoning exploit vulnerabilities specific to contrastive learning models, significantly reducing their accuracy. For Generative AI, data can be poisoned simply by adding bad content (e.g., misinformation or inappropriate material) to the training data.
2. Targeted Poisoning: These attacks cause the model to make specific, incorrect predictions while leaving other outputs unaffected. For example, an attacker might alter a healthcare model so that it misdiagnoses certain conditions, changing only those predictions (Source: University of Maryland Global Campus). Both attack types are demonstrated in the sketch that follows this list.
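To make these two attack types concrete, below is a minimal, illustrative sketch of label-flipping poisoning against a toy classifier. It assumes scikit-learn and NumPy are available; the synthetic dataset, logistic-regression model, and poisoning rates are hypothetical choices for demonstration, not drawn from any specific attack described above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Toy binary classification task standing in for a real training pipeline.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

def train_model(labels):
    """Train on (possibly poisoned) labels; evaluation uses clean test data."""
    return LogisticRegression(max_iter=1000).fit(X_train, labels)

baseline = train_model(y_train)
print(f"clean accuracy: {accuracy_score(y_test, baseline.predict(X_test)):.3f}")

# Indiscriminate poisoning: flip a random ~30% of training labels to degrade
# overall performance across all classes.
y_indisc = y_train.copy()
flip = rng.random(len(y_indisc)) < 0.30
y_indisc[flip] = 1 - y_indisc[flip]
poisoned = train_model(y_indisc)
print(f"indiscriminate: {accuracy_score(y_test, poisoned.predict(X_test)):.3f}")

# Targeted poisoning: mislabel 60% of one class so that only predictions for
# that class degrade, leaving the other class largely unaffected.
y_targ = y_train.copy()
victims = rng.choice(np.flatnonzero(y_train == 1),
                     size=int(0.6 * np.sum(y_train == 1)), replace=False)
y_targ[victims] = 0
pred = train_model(y_targ).predict(X_test)
for cls in (0, 1):
    mask = y_test == cls
    print(f"targeted, class {cls} accuracy: "
          f"{accuracy_score(y_test[mask], pred[mask]):.3f}")
```

In this toy setup, the indiscriminate variant drags accuracy down across the board, while the targeted variant mainly hurts predictions for the mislabeled class, mirroring the healthcare example above.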
Data poisoning poses a serious threat in any setting where training data is sourced from external or user-generated inputs. As AI systems increasingly rely on online, automated, or continuously updated datasets, the attack surface for data poisoning continues to grow, requiring robust controls on data integrity, access, monitoring, and response.
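As one concrete illustration of the integrity controls mentioned above, the sketch below keeps a cryptographic manifest of training-data files and flags any file whose contents change between snapshots. The directory layout, file names, and manifest format are hypothetical; SHA-256 hashing via Python's standard library is the only real dependency.

```python
import hashlib
import json
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(data_dir: Path) -> dict:
    """Record a digest for every file under the training-data directory."""
    return {str(p): sha256_of(p)
            for p in sorted(data_dir.rglob("*")) if p.is_file()}

def verify_manifest(data_dir: Path, manifest_path: Path) -> list:
    """Return paths whose contents changed since the manifest was written."""
    expected = json.loads(manifest_path.read_text())
    current = build_manifest(data_dir)
    return [p for p, h in expected.items() if current.get(p) != h]

if __name__ == "__main__":
    data_dir = Path("training_data")        # hypothetical dataset location
    manifest = Path("data_manifest.json")   # hypothetical manifest file
    if not manifest.exists():
        # First run: snapshot the current state of the dataset.
        manifest.write_text(json.dumps(build_manifest(data_dir), indent=2))
    else:
        tampered = verify_manifest(data_dir, manifest)
        if tampered:
            print("files changed since last snapshot:", tampered)
```

A check like this catches silent tampering with stored datasets, though it cannot, by itself, detect poisoned records that arrive through legitimate ingestion channels; for those, the access, monitoring, and response controls above still apply.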