DIGIT

Pitfalls of Artificial Intelligence

Artificial Intelligence (AI) has rapidly emerged as a transformative technology, disrupting traditional industries and revolutionizing the way we live and work. However, as with any powerful technology, there are potential pitfalls and risks that must be considered. As AI continues to advance and become more ubiquitous, it is crucial to understand the potential risks and take steps to mitigate them.

In this series, we will explore five key pitfalls of AI. Each pitfall is illustrated by a story, followed by a discussion of effective mitigation strategies.

PITFALL 4

Introduction

Pitfall 4: Using unrepresentative training data leads to biased models

Illustrative story

Facial recognition software is known to work better for certain ethnicities and genders than others. The principal reason is not that specific ethnicities and genders are easier to identify but that certain ethnicities and genders were more heavily represented in the training data used to develop these algorithms.

For example, when Apple released its facial recognition system, Face ID, in November 2017, it was criticized for misidentifying Asian individuals. Similarly, a 2019 study found that SenseTime, a Chinese AI startup valued at over 10 billion dollars, misidentified Somali men 10% of the time. These examples highlight the need for diverse and representative training data when developing facial recognition software.

Why?

Machine learning is the sub-field of AI that encompasses all algorithms that learn from data. Once a model is trained, it can be used to predict an output based on previously unseen data. The ability to perform well on such unseen data is known as generalization.

To ensure the model generalizes well, or in other words performs similarly on the training data and on unobserved data, the training data needs to be sufficiently representative.

Firstly, we need to ensure that the inputs and outputs of the training data accurately represent the phenomenon being modelled. For instance, consider a resume-scanner algorithm that assesses resumes and decides whether to proceed with a candidate or not. The training data inputs are all previous resumes received by a company, and the output is the HR decision to proceed with the candidate. Let’s now imagine the HR department was filled with racist, misogynistic, and Islamophobic people who would always reject the CVs of non-white candidates, women, or people with an Arabic name. Our algorithm will model this behaviour and select candidates using the same criteria. Models, despite their reputation for impartiality, often reflect the goals and ideology embedded in their training data. When automating a process using past process data, we should always ask ourselves whether the process was done well in the past.

Secondly, we need to make sure that our training data is sufficiently complete. Does it cover all the types of data the model will encounter once it is in use?

This ensures that the model generalizes well on unseen data and does not become biased toward certain groups or attributes.
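
As a rough illustration of the completeness question, the sketch below compares how often each group appears in the training data with how often it appears in the population the model is meant to serve. The function name `representation_gaps`, the group labels, and all the numbers are hypothetical and chosen only for the example; in practice the reference shares would have to come from a trusted source such as census or customer data.

```python
from collections import Counter


def representation_gaps(train_groups, population_shares):
    """Return (share in training data - share in population) for each group.

    A strongly positive value suggests the group is over-represented in the
    training data; a strongly negative value suggests under-representation.
    """
    total = len(train_groups)
    if total == 0:
        raise ValueError("train_groups must not be empty")
    counts = Counter(train_groups)
    return {
        group: counts.get(group, 0) / total - share
        for group, share in population_shares.items()
    }


# Hypothetical group label attached to each training record.
train_groups = ["A"] * 80 + ["B"] * 15 + ["C"] * 5
# Hypothetical share of each group in the population being served.
population_shares = {"A": 0.5, "B": 0.3, "C": 0.2}

gaps = representation_gaps(train_groups, population_shares)
for group, gap in sorted(gaps.items()):
    print(f"{group}: {gap:+.2f}")
```

In this made-up example, group A is over-represented by 30 percentage points while groups B and C are each under-represented by 15. A balanced table is no guarantee of an unbiased model, since the labels themselves can still encode past prejudice, as in the resume example above.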

How to mitigate?

Using unrepresentative training data can lead to biased AI models, which can have serious consequences for companies. Here are some strategies that can be used to mitigate this risk:

Use diverse training data: Use training data that is diverse and representative of the population being served.
Check data quality: Ensure your training data is of good quality. Data of bad quality can only be used to build bad models.
Improve in-house data literacy: Ensure AI creators understand the different types of bias that can occur in training data. For example, selection bias occurs when certain data points are over- or under-represented, and confirmation bias occurs when the data is interpreted to confirm pre-existing beliefs.
Use training, validation, and test sets: Tune the model on a validation set and measure its final performance on a test set it has never seen. This guards against overfitting, where the model becomes too complex by fitting too closely to the training data and then performs poorly when applied to new data.
Conduct independent diagnostics: Evaluate a model’s accuracy and potential for bias independently, for example by testing the model with data that was not included in the training data.
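
The last two strategies can be sketched in a few lines of Python. This is a minimal illustration, not a production recipe: `split_data`, `accuracy_by_group`, the 70/15/15 split, the record layout `(features, label, group)`, and the toy `predict` function are all assumptions made for the example. It shows data being set aside for validation and testing, and accuracy being reported per group rather than as a single overall figure.

```python
import random


def split_data(records, train=0.7, validation=0.15, seed=0):
    """Shuffle records and split them into training, validation, and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


def accuracy_by_group(records, predict):
    """Compute accuracy separately for each group found in the records."""
    correct, total = {}, {}
    for features, label, group in records:
        total[group] = total.get(group, 0) + 1
        correct[group] = correct.get(group, 0) + int(predict(features) == label)
    return {group: correct[group] / total[group] for group in total}


# Hypothetical records: (features, label, group). Group "B" is the minority.
records = [(i, i % 2, "A" if i < 80 else "B") for i in range(100)]


def predict(features):
    """Toy stand-in for a trained model: right for group A, guesses 0 for group B."""
    return features % 2 if features < 80 else 0


train_set, validation_set, test_set = split_data(records)
print(len(train_set), len(validation_set), len(test_set))
print(accuracy_by_group(test_set, predict))
```

Reporting accuracy per group matters because a single overall score can hide poor performance on an under-represented group: here the toy model is always right for group A but only right about half the time for group B.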

WANT TO KNOW MORE?

AI can bring tremendous value to an organization if it is well-managed and understood. However, implementing AI can be complex and time-consuming, requiring specialized knowledge and resources.

At BrightWolves, we specialize in providing customized advice and solutions tailored to specific business needs. Our expertise in AI can help accelerate your digital & data transformation by providing valuable guidance on best practices and implementation strategies.

What sets us apart is our focus on the business side of data analytics, rather than just the technical aspects. We understand that data is only valuable if it helps businesses make better decisions and achieve their goals.

If you want to know more, do not hesitate to reach out to our AI experts: