
System Information Extraction Attack

Malicious actors can extract information about an AI system's datasets, models, and system prompts, and use it to subvert the system or steal sensitive data.

📋 Description

System Information Extraction Attacks occur when adversaries interact with a system to extract private or proprietary information about the model or its training data. These attacks pose severe privacy and security risks, especially for models trained on sensitive datasets. The goals of these attacks can include:

- Gaining access to sensitive data by reverse-engineering the training data.
- Creating a copy of a proprietary model for offline use (see the extraction sketch after this list).
- Evading the system by studying the model's behavior (e.g., crafting inputs that a content filter will miss).
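
One way to read the second goal is as a model extraction attack: the adversary labels synthetic probes with the victim model's own predictions and trains a local surrogate on the resulting pairs. The sketch below is a minimal, self-contained illustration; the "victim" is a local scikit-learn model standing in for a remote prediction API, and the data shapes and hyperparameters are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

# Stand-in "victim": in a real attack this would be a remote prediction API.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
victim = LogisticRegression(max_iter=1000).fit(X, y)

def query_victim_api(inputs):
    # A real attacker would send these inputs to the target service
    # and collect the returned labels or class probabilities.
    return victim.predict_proba(inputs)

# The attacker generates synthetic probes covering the feature space
# and labels them with the victim's own answers.
rng = np.random.default_rng(1)
probes = rng.normal(size=(5000, 20))
labels = np.argmax(query_victim_api(probes), axis=1)

# A surrogate trained on these pairs mimics the victim's decision
# boundary and can then be used offline or studied to craft evasions.
surrogate = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=300, random_state=1)
surrogate.fit(probes, labels)
print("agreement with victim:", surrogate.score(X, victim.predict(X)))
```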

Specific attacks included in this risk are:

- *Data Inference Attacks* (often called membership or attribute inference) involve probing the system with targeted inputs to learn whether specific data was included in the training set. This can expose private attributes (e.g., a patient’s disease status) or entire records; see the first sketch after this list.

- *Model Inversion Attacks* reconstruct likely inputs or features from output scores, logits, or confidence values. They are most effective against models that overfit or generalize poorly, such as image classifiers or biometric systems; see the second sketch after this list.

- *System Prompt Leak Attacks* query an LLM to expose its system prompt, for instance by asking the model to "repeat the text above verbatim." The leaked prompt can aid further jailbreaking by letting the attacker understand the specific rules built into the system.
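
To make the data inference item concrete, the sketch below shows the simplest form of membership inference: thresholding the target model's confidence. The model, data, and the 0.9 threshold are all invented for illustration; practical attacks calibrate the threshold using shadow models trained on similar data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Deliberately overfit target model; the member/non-member confidence
# gap is what the attack exploits.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
target = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

def top_confidence(model, inputs):
    # Highest predicted class probability for each input.
    return model.predict_proba(inputs).max(axis=1)

# Training members tend to receive higher confidence than unseen
# records; a simple threshold turns that gap into a membership test.
THRESHOLD = 0.9  # assumed; real attacks calibrate it on shadow models
member_rate = (top_confidence(target, X_train) >= THRESHOLD).mean()
nonmember_rate = (top_confidence(target, X_test) >= THRESHOLD).mean()

print(f"flagged as members (true members):     {member_rate:.2f}")
print(f"flagged as members (non-members, FPR): {nonmember_rate:.2f}")
```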

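A minimal model inversion sketch follows, assuming white-box gradient access for clarity (black-box variants estimate gradients from repeated queries). Starting from noise, the attacker ascends the target class's confidence to recover a representative input. The toy network here is untrained, so the output is not meaningful; against a trained face or biometric classifier, the same loop can surface recognizable features of the training data.

```python
import torch
import torch.nn.functional as F

# Toy target network standing in for the attacked classifier.
torch.manual_seed(0)
target = torch.nn.Sequential(
    torch.nn.Linear(64, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, 10),
)

def invert_class(model, class_idx, steps=200, lr=0.1):
    # Start from random noise and ascend the target class's log
    # probability, producing a representative input for that class.
    x = torch.randn(1, 64, requires_grad=True)
    optimizer = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        loss = -F.log_softmax(model(x), dim=1)[0, class_idx]
        loss.backward()
        optimizer.step()
    return x.detach()

reconstruction = invert_class(target, class_idx=3)
confidence = F.softmax(target(reconstruction), dim=1)[0, 3].item()
print(f"model confidence that the reconstruction is class 3: {confidence:.3f}")
```
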
🔍 Public Examples and Common Patterns

- Identifying inference attacks against healthcare data repositories: An attacker used HCUPnet's aggregate query system to infer private patient data despite its suppression rules, differencing overlapping queries to reveal suppressed counts (a pattern sketched below).
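
The pattern behind this case generalizes to any aggregate interface with per-cell suppression. In the toy sketch below (the table, zip code, and threshold are invented), a direct query for a sensitive cell is suppressed, but two permitted overlapping queries can be differenced to recover it exactly.

```python
# Invented mini-table; a real repository would hold thousands of rows
# behind an aggregate query interface.
records = [
    {"zip": "08901", "diagnosis": "flu"},
    {"zip": "08901", "diagnosis": "rare_condition"},
    {"zip": "08901", "diagnosis": "flu"},
]
SUPPRESSION_THRESHOLD = 2  # counts below this are withheld

def count_query(predicate):
    # Aggregate endpoint with per-cell suppression.
    n = sum(1 for r in records if predicate(r))
    return n if n >= SUPPRESSION_THRESHOLD else None  # None = suppressed

# The direct query for the sensitive cell is suppressed:
print(count_query(lambda r: r["diagnosis"] == "rare_condition"))  # -> None

# But two broader, overlapping queries are both answered...
all_in_zip = count_query(lambda r: r["zip"] == "08901")  # -> 3
zip_minus_cell = count_query(
    lambda r: r["zip"] == "08901" and r["diagnosis"] != "rare_condition"
)  # -> 2

# ...and their difference reveals the suppressed count exactly.
print("inferred rare_condition count:", all_in_zip - zip_minus_cell)  # -> 1
```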

📐 External Framework Mapping

- MITRE ATLAS: AML.T0024.001, AML.T0024.002
- IBM Risk Atlas: Attribute inference attack risk for AI
- Databricks AI Security Framework: 9.2 - Model Inversion
- OWASP LLM Top 10: LLM07:2025 System Prompt Leakage