Low Data Quality

Training data quality has a direct impact on model quality. Quality checks should be applied to the original data sources and to any preprocessing assumptions.

📋 Description

Low data quality undermines the performance, reliability, and fairness of AI systems. Models trained on incomplete, inaccurate, inconsistent, or outdated data are prone to producing biased or incorrect outputs. This can degrade user trust, reduce system effectiveness, and create compliance risks, especially in regulated environments.

The core dimensions of data quality include completeness, accuracy, consistency, timeliness, relevance, validity, uniqueness, and integrity. When these dimensions are not met (for example, through missing customer attributes, misformatted values, or duplicated records), AI systems may fail to learn meaningful patterns or may propagate faulty insights. For instance, an incomplete financial product record can mislead consumers, while outdated medical data can lead to harmful treatment recommendations.

Ensuring high data quality requires robust governance practices, quality monitoring tools, version control of datasets, and strict access permissions. Without these controls, data corruption or drift can go undetected, eroding model performance over time.
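As a rough illustration, several of the dimensions above (completeness, uniqueness, validity) can be expressed as automated checks over tabular records. This is a minimal sketch, not a production pipeline; the field names (`customer_id`, `email`) and the email-format rule are illustrative assumptions.

```python
import re

def quality_report(records, required_fields, key_field):
    """Score a batch of records on completeness, uniqueness, and validity."""
    total = len(records)
    report = {}
    # Completeness: share of records with every required field populated
    complete = sum(
        all(r.get(f) not in (None, "") for f in required_fields) for r in records
    )
    report["completeness"] = complete / total if total else 0.0
    # Uniqueness: share of distinct primary keys (duplicates lower the score)
    keys = [r.get(key_field) for r in records]
    report["uniqueness"] = len(set(keys)) / total if total else 0.0
    # Validity: populated values conform to an expected format (email here)
    email_re = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
    with_email = [r for r in records if r.get("email")]
    valid = sum(bool(email_re.match(r["email"])) for r in with_email)
    report["validity"] = valid / len(with_email) if with_email else 1.0
    return report

records = [
    {"customer_id": 1, "email": "a@example.com"},
    {"customer_id": 2, "email": "not-an-email"},
    {"customer_id": 2, "email": ""},  # duplicate key, missing email
]
print(quality_report(records, ["customer_id", "email"], "customer_id"))
```

In practice these scores would be tracked over time per source, so that a sudden drop in any dimension surfaces corruption or drift before it reaches training.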

🔍 Public Examples and Common Patterns

Data quality issues often emerge when a system ingests multiple disparate data sources that may be incomplete or that track records differently. In these cases, manual validation may be required to build records that satisfy all the properties described in the measurement guidance section.
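One way to sketch this pattern is a reconciliation step that merges records from two sources by a shared key, fills gaps from whichever source has the value, and flags conflicting values for manual review rather than silently picking one. The source layouts, the key name, and the conflict rule below are all illustrative assumptions.

```python
def merge_sources(source_a, source_b, key="id"):
    """Merge records from two sources; return merged records plus a
    list of (key, field) pairs that need manual validation."""
    merged, needs_review = {}, []
    for record in source_a + source_b:
        k = record[key]
        if k not in merged:
            merged[k] = dict(record)
            continue
        for field, value in record.items():
            existing = merged[k].get(field)
            if existing in (None, ""):
                merged[k][field] = value          # fill gap from the other source
            elif value not in (None, "") and value != existing:
                needs_review.append((k, field))   # conflict: flag for manual review
    return merged, needs_review

a = [{"id": 1, "name": "Acme", "rate": ""}]
b = [{"id": 1, "name": "ACME Inc", "rate": "3.5%"}]
merged, review = merge_sources(a, b)
```

Here the missing `rate` is filled from the second source, while the conflicting `name` values are queued for a human to resolve, which is where the manual validation effort typically concentrates.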

📐 External Framework Mapping

- Databricks AI Security Framework: 1.3 - Poor Data Quality

Cite this page
Trustible. "Low Data Quality." Trustible AI Governance Insights Center, 2026. https://trustible.ai/ai-risks/data-quality/
