
Model Evasion Attack

Model Evasion attacks manipulate a model's inputs so that the model produces the attacker's desired output.

📋 Description

In Model Evasion attacks, malicious actors craft adversarial inputs that slip past a machine learning system. For example, spam emails or malware can be designed to look innocuous so they do not trigger a "spam" or "malicious" label. Adversarial examples are derived programmatically, either by probing for the decision boundary between classes (e.g., padding a spam message with a large number of "not spam-like" words) or by adding carefully chosen "noise" that confuses the model.
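As a concrete sketch of the word-padding variant, the example below trains a toy spam classifier with scikit-learn and then pads a spam payload with benign-looking words, driving the spam score down without changing the malicious content. The training data, filler words, and model are all illustrative assumptions, not a real system; in a genuine black-box attack, the effective filler words would be discovered by repeatedly querying the deployed model.

```python
# Hedged sketch: evading a toy spam classifier by padding the payload
# with words the model associates with legitimate mail. All data here
# is illustrative.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = [
    "win free money now", "claim your prize today",       # spam
    "meeting agenda attached", "lunch at noon tomorrow",  # not spam
]
train_labels = [1, 1, 0, 0]  # 1 = spam, 0 = not spam

model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)

payload = "win free money now"
print(model.predict_proba([payload])[0][1])  # high spam probability

# Pad the payload one "not spam-like" word at a time and watch the
# spam probability fall; the payload itself is unchanged.
evasive = payload
for word in ["meeting", "agenda", "lunch", "noon", "tomorrow"]:
    evasive += " " + word
    p_spam = model.predict_proba([evasive])[0][1]
    print(f"spam probability {p_spam:.2f}: {evasive!r}")
```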

These attacks typically arise when an attacker can query the system repeatedly and learn which inputs slip through. Mitigating them means restricting and rate-limiting query access, monitoring for anomalous usage patterns, and regularly testing how easily the model can be fooled; a sketch of one such monitoring check follows.
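One lightweight behavioral check is to flag clients that submit many near-identical queries, a common signature of an attacker searching for the decision boundary. The sketch below is illustrative rather than a production control: the function name, thresholds, and choice of string similarity are all hypothetical.

```python
# Hedged sketch: flag a client that sends many near-duplicate queries,
# a pattern typical of decision-boundary probing. The thresholds and
# similarity measure are illustrative assumptions.
import difflib
from collections import defaultdict

SIMILARITY_CUTOFF = 0.9    # how alike two queries must be to count
MAX_NEAR_DUPLICATES = 5    # near-duplicates allowed before flagging

_history = defaultdict(list)  # client_id -> past query strings

def is_probing(client_id: str, query: str) -> bool:
    past = _history[client_id]
    near_duplicates = sum(
        1 for q in past
        if difflib.SequenceMatcher(None, q, query).ratio() >= SIMILARITY_CUTOFF
    )
    past.append(query)
    return near_duplicates >= MAX_NEAR_DUPLICATES
```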

🔍 Public Examples and Common Patterns

- Proof of Concept from Image Classification: This study shows that imperceptible perturbations to an image can cause an image classifier to assign a different label (see the sketch below).
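The perturbations in such studies are typically computed from the model's gradients. Below is a hedged sketch of the well-known fast gradient sign method (FGSM), a white-box variant of this idea; `model`, `image`, and `label` are placeholders for a PyTorch classifier and a batched input, and `epsilon` is an illustrative perturbation budget.

```python
# Hedged sketch of the fast gradient sign method (FGSM). The model,
# inputs, and epsilon below are placeholders, not a specific system.
import torch
import torch.nn.functional as F

def fgsm_perturb(model, image, label, epsilon=0.01):
    """Return `image` plus an imperceptible perturbation that raises the
    model's loss on the true `label`, encouraging a misclassification."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)  # image: (N, C, H, W)
    loss.backward()
    # Nudge every pixel by +/- epsilon in the direction that increases
    # the loss, then clamp back to the valid [0, 1] pixel range.
    return (image + epsilon * image.grad.sign()).detach().clamp(0.0, 1.0)
```

Black-box variants, like the technique in the MITRE ATLAS mapping below, estimate a comparable perturbation direction through repeated queries instead of direct gradient access.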

📐 External Framework Mapping

- Databricks AI Security Framework: 9.3 – Model breakout
- MITRE ATLAS: Craft Adversarial Data: Black-Box Optimization