📋 Description
Low data quality undermines the performance, reliability, and fairness of AI systems. Models trained on incomplete, inaccurate, inconsistent, or outdated data are prone to producing biased or incorrect outputs. This can degrade user trust, reduce system effectiveness, and create compliance risks, especially in regulated environments.
The core dimensions of data quality include completeness, accuracy, consistency, timeliness, relevance, validity, uniqueness, and integrity. When these dimensions are not met (for example, through missing customer attributes, misformatted values, or duplicated records), AI systems may fail to learn meaningful patterns or may learn and propagate faulty insights. For instance, an incomplete financial product record can mislead consumers, while outdated medical data can lead to harmful treatment recommendations.
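Several of these dimensions can be tested programmatically. The sketch below is a minimal illustration, assuming a tabular customer dataset loaded with pandas; the column names and thresholds are hypothetical, not a prescribed schema:

```python
import pandas as pd

# Hypothetical customer records; column names are illustrative only.
df = pd.DataFrame({
    "customer_id": [1, 2, 2, 4],
    "email": ["a@example.com", None, "b@example", "c@example.com"],
    "last_updated": pd.to_datetime(
        ["2024-01-01", "2023-06-01", "2024-03-01", "2020-01-01"]
    ),
})

# Completeness: share of non-missing values per column.
completeness = df.notna().mean()

# Uniqueness/integrity: duplicated primary keys indicate a problem.
duplicate_keys = df["customer_id"].duplicated().sum()

# Validity: values should match an expected format (a naive email pattern here).
valid_email = df["email"].str.contains(r"^[^@\s]+@[^@\s]+\.[^@\s]+$", na=False)

# Timeliness: flag records older than a freshness threshold (one year here).
stale = df["last_updated"] < pd.Timestamp.now() - pd.Timedelta(days=365)

print(completeness, duplicate_keys, valid_email.mean(), stale.sum(), sep="\n")
```

Checks like these are typically run per column and aggregated into a quality score, so that a drop in any single dimension is visible before the data reaches training.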
Ensuring high data quality requires robust governance practices, quality monitoring tools, version control of datasets, and strict access permissions. Without these controls, data corruption or drift can go undetected, eroding model performance over time.
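One common monitoring pattern is to snapshot a reference dataset and compare incoming batches against it. The sketch below is one possible approach, not a standard implementation: it fingerprints a dataset for lightweight versioning and flags a column whose mean has shifted beyond a hypothetical threshold:

```python
import hashlib
import pandas as pd

def dataset_fingerprint(df: pd.DataFrame) -> str:
    """Content hash usable as a lightweight dataset version identifier."""
    row_hashes = pd.util.hash_pandas_object(df, index=True)
    return hashlib.sha256(row_hashes.values.tobytes()).hexdigest()

def check_drift(reference: pd.Series, batch: pd.Series,
                max_shift: float = 0.5) -> bool:
    """Flag drift when the batch mean moves more than max_shift
    reference standard deviations away (illustrative threshold)."""
    shift = abs(batch.mean() - reference.mean()) / (reference.std() or 1.0)
    return shift > max_shift

reference = pd.Series([10.0, 11.0, 9.5, 10.2])
batch = pd.Series([13.0, 13.5, 12.8])

print("dataset version:", dataset_fingerprint(reference.to_frame("value")))
print("drift detected:", check_drift(reference, batch))
```

Recording the fingerprint alongside each model training run makes it possible to trace a performance regression back to the exact dataset version that produced it.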
🔍 Public Examples and Common Patterns
Data quality issues often emerge when a system draws on multiple disparate data sources, each of which may be incomplete or may represent the same records differently. In these cases, manual validation may be required to reconcile the sources into records that satisfy all the properties described in the measurement guidance section, as sketched below.
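A minimal sketch of this reconciliation step, assuming two hypothetical sources (a CRM and a billing system) keyed on the same identifier, might look like the following; the record-matching logic in a real pipeline would be considerably more involved:

```python
import pandas as pd

# Two hypothetical sources that track the same entities differently.
crm = pd.DataFrame({"customer_id": [1, 2], "email": ["a@x.com", "b@x.com"]})
billing = pd.DataFrame({"customer_id": [1, 2, 3],
                        "email": ["a@x.com", "b@y.com", None]})

merged = crm.merge(billing, on="customer_id", how="outer",
                   suffixes=("_crm", "_billing"), indicator=True)

# Records present in only one source are incomplete; records whose fields
# disagree between sources are conflicting. Both sets are queued for
# manual validation rather than silently passed to training.
incomplete = merged[merged["_merge"] != "both"]
conflicting = merged[
    (merged["_merge"] == "both")
    & (merged["email_crm"] != merged["email_billing"])
]

print(incomplete)
print(conflicting)
```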