Performance gaps between populations occur when AI models perform better for some user groups than others, even when the system is not explicitly evaluating individuals. For example, facial recognition models have been shown to perform worse on darker-skinned individuals (Source). Similarly, AI-powered speech recognition models may hallucinate more when processing speech from individuals with speech impediments (Source). These disparities are often the result of certain groups being underrepresented or misrepresented in the training data. They can reinforce societal inequities and lead to adverse outcomes for vulnerable populations. Evaluating and mitigating such gaps is essential for ensuring fair and equitable AI deployment.
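A minimal sketch of what such an evaluation might look like is shown below: compute a model's accuracy separately for each group and report the largest disparity. The data, group labels, and the `accuracy_by_group` helper are hypothetical and used only for illustration; in practice, the metric and grouping variable would depend on the system being assessed.

```python
from collections import defaultdict

def accuracy_by_group(y_true, y_pred, groups):
    """Compute accuracy separately for each population group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for yt, yp, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(yt == yp)
    return {g: correct[g] / total[g] for g in total}

# Hypothetical labels and predictions for two demographic groups, A and B.
y_true = [1, 0, 1, 1, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

per_group = accuracy_by_group(y_true, y_pred, groups)
gap = max(per_group.values()) - min(per_group.values())
print(per_group)                 # {'A': 0.75, 'B': 0.5}
print(f"accuracy gap: {gap:.2f}")  # 0.25
```

Tracking a disparity measure like this alongside aggregate accuracy makes performance gaps visible before deployment rather than after harm occurs.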
This risk is particularly important in systems used for public services, healthcare, safety screening, or communication. In these contexts, failing to perform equally well for all users undermines trust and can have serious consequences, especially for groups already facing structural disadvantages.