Bias and Fairness in AI

Basics of Bias & Fairness in AI systems

Thank you for your interest!

On this page we would like to introduce you to the concepts of bias and fairness in artificial intelligence (AI) systems.

Page structure

Overview

The main page is divided into seven sections that build on each other. You can access each section by clicking on the icon in the box below.

1. Definition of key terms

2. AI systems in everyday life

3. Different types of bias

4. Algorithmic Fairness

5. Bias & Fairness in the real world

6. Bias mitigation

7. Future relevance

Read more

Editorial note:
Our exercises build on the basics taught on this page. We therefore recommend that you work on the basics and the exercises in parallel. But don’t worry! At the end of each section, we will link to the relevant exercises and direct you back to them.

Section 1 / 7

Definitions

1. Key terms

First, learn the definitions of the key terms ‘AI systems’, ‘bias’ and ‘fairness’. To do this, click on one of the three boxes.

AI Systems

What are Artificial Intelligence Systems?

AI systems are human-developed software or hardware systems that perceive their environment by collecting data and processing the collected data. AI systems use this information to make decisions. However, due to its complexity, the behaviour of an AI system is often difficult for humans to understand. This is why it is called a 'black box'.

Bias

How can bias be defined in a decision-making system?

Bias generally refers to a distorting effect. In psychology, bias refers to attitudes or stereotypes that positively or negatively influence our perceptions of our environment, decisions and actions. In statistics, bias is an error in the collection and processing of data, or the conscious or unconscious influence of subjects.
Bias can take many forms, some of which can lead to unfairness.

Fairness

What does fairness mean?

In the context of decision making, unfairness is the presence of prejudice or preference towards a person or group based on their innate or acquired characteristics. An unfair algorithm makes decisions that are biased against a particular group of people. Similar to bias, discrimination is also a source of unfairness. Discrimination is due to human prejudice and stereotyping based on sensitive characteristics, which may be intentional or unintentional.
Section 2 / 7

Artificial intelligence systems

2. AI systems in everyday life

Learn about the importance of artificial intelligence (AI) for business, society and your everyday life. For a deeper understanding, we also distinguish between the terms “artificial intelligence” and “machine learning” in this section.

Applications of AI systems

Many businesses and governments are already using data-driven, algorithmic decision-making systems. In the foreseeable future, there will be few industries or areas of daily life where artificial intelligence (AI) systems will not be ubiquitous.

You already use AI in your everyday life. Have you tried Spotify or binge-watched a Netflix series today? An AI system helps you find the music and movies you like. An algorithm learns from your choices and then recommends new songs you are most likely to add to your Spotify playlist. Smart home devices like Alexa, or automatic facial recognition when unlocking your smartphone, are two more examples of the importance of AI in our everyday lives.

AI is now behind many things, from chatbots and shopping recommendations to navigation with Google Maps. Google, for example, uses AI to understand search queries and evaluate relevant results. Companies such as Facebook and LinkedIn use AI systems to identify questionable content, including potentially violent, pornographic or politically extreme content. Images, text and videos that fall (or could fall) into these categories are automatically flagged by the AI. AI systems are also being used in many areas of medicine, where they learn to make diagnoses based on image data.

What is machine learning?

Machine learning (ML) as a subfield of AI enables machines to find patterns in data sets without explicit programming of rules and to make decisions and predictions based on this analysis. This is made possible by the increasing availability of big data and high computing power. ML applications typically become more accurate the more data they have available – without the need for additional programming.
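To make this concrete, here is a minimal sketch (in Python, using the scikit-learn library; the data and the pass/fail scenario are invented purely for illustration) of a model that learns a pattern from example data rather than from hand-written rules:

    # Minimal sketch: the model infers a pattern from example data;
    # no decision rules are programmed by hand.
    from sklearn.tree import DecisionTreeClassifier

    # Invented toy data: [hours of study, hours of sleep] -> exam passed (1) or failed (0).
    X = [[8, 7], [7, 8], [6, 7], [2, 4], [1, 6], [3, 5]]
    y = [1, 1, 1, 0, 0, 0]

    model = DecisionTreeClassifier()
    model.fit(X, y)                  # "learning" happens here, purely from the data
    print(model.predict([[5, 7]]))   # prediction for a new, unseen example

With more (and more varied) training examples, such a model would typically generalise better, which is exactly the point made above.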

The dark side of AI

We are seeing more and more headlines about AI systems being used inappropriately and making discriminatory decisions. One example is an AI system that filters job applications and selects only male candidates. As a result, considering bias and fairness in the development of such systems has become very important.

However, there is still a problem: Many people, not only in the general public but also among those who develop and use AI systems, still make the blanket assumption that algorithmic decisions are objective and neutral. But this is not a given, nor is a decision based solely on objective characteristics necessarily fair and non-discriminatory. Given this situation, there is a great need to gain a deep understanding of the challenges in the use and implementation of AI systems and possible solutions.

Section 3 / 7

The concept of bias

3. Different types of bias

In this section you will learn about different types of bias within the ML lifecycle.

Machine learning (ML) is increasingly being used to make decisions that affect people’s lives. Typically, algorithms learn from existing data and apply the learned patterns to unseen data. As a result, problems can arise in data collection, model development and system deployment that can lead to various biases.

Bias can arise at any stage of the ML lifecycle. The ML lifecycle comprises a series of decisions and practices in the development and use of ML systems. Each stage involves decisions that can introduce bias. The process starts with data collection. This involves defining a target population and drawing a sample from it, as well as identifying and measuring characteristics and labels. This data set is divided into training and test data. The training data is used to ‘learn’ an ML model. The test data is used to evaluate the model. The model is then used in a real application to make decisions for its users. This process is cyclical: for example, the model’s decisions influence the state of the world at the time of the next data collection or decision.

The eleven bias types in the ML lifecycle are described below:

  • Historical bias: Historical bias is the pre-existing bias and socio-technical issues in the world that can enter the data generation process even with perfect sampling and feature selection.
 
  • Representation bias: Representation bias results from the way we collect data. Non-representative samples lack the diversity of the population (e.g. missing subgroups).
 
  • Measurement bias: Measurement bias occurs when features and labels are selected, recorded or calculated to be used in a prediction problem. Typically, a feature or label is a proxy (a concrete measure) chosen to approximate a construct that is not directly coded or observable.
 
  • Omitted variable bias: Omitted variable bias occurs when one or more important variables are omitted from the model.
 
  • Evaluation bias: Evaluation bias occurs when the benchmark or test data used to evaluate a model for a particular task is not representative of the user population. Evaluation bias can also be exacerbated by the choice of performance metrics.
 
  • Algorithmic bias: These are distortions that are not present in the data, but are added by the algorithm.
 
  • Aggregation bias: Aggregation bias occurs when a single, uniform model is applied to data that has underlying groups or types of examples that should be considered differently.
 
  • User interaction bias: This refers to bias arising from the user interface and from users themselves, who impose their own self-selected behaviour and modes of interaction on the system.
 
  • Population bias: Population bias occurs when the statistics, demographics, representation and user characteristics of the platform’s user population differ from those of the original target group.
 
  • Deployment bias: Deployment bias generally refers to any bias that occurs during use when a system is used or interpreted in an inappropriate way that was not intended by the designers or developers.
 
  • Feedback loop: A feedback loop between data, algorithms and users can exacerbate existing sources of bias.

Below we describe an illustrative example of each of the types of bias listed above.

 
  • Historical bias: An example of this type of bias can be seen in a Google image search result from 2018: the search query “women as CEOs” returned more images of male CEOs than the desired images of female CEOs, as only just under 5% of Fortune 500 CEOs were women at the time.
 
  • Representation bias: The lack of geographical diversity in datasets such as ImageNet (https://www.image-net.org/) leads to a demonstrable bias towards Western cultures.
 
  • Measurement bias: An example of this type of bias was observed in the COMPAS recidivism risk prediction tool, where previous arrests and arrests among friends/family were used as surrogate variables to measure ‘risk level’ or ‘criminality’ – which can be considered as mismeasured proxies.
 
  • Omitted variable bias: For example, many regressions where wage or income is the dependent variable suffer from omitted variable bias. Often there is no practical way to add an employee’s innate skills or motivation as explanatory variables.
 
  • Evaluation bias: This includes, for example, using unsuitable and unrepresentative benchmark data to evaluate applications, such as the Adience dataset, which is 79.6% Caucasian. This dataset is used to evaluate facial recognition systems where skin colour and gender are factors.
 
  • Algorithmic bias: The use of certain optimisation functions, regularisations, the application of regression models to the data as a whole or to subgroups, and the general use of statistically biased estimators are examples of this type of bias.
 
  • Aggregation bias: An example would be data showing that students in large cities tend to perform poorly in standardised tests. This does not mean that every individual student in a large city performs badly.
 
  • User interaction bias: For example, users on the Internet can only click on content that is presented to them: content they are shown may be clicked, while everything they are not shown receives no clicks at all.
 
  • Population bias: Biases in the population lead to unrepresentative data. For example, there are more male than female spectators in football stadiums.
 
  • Deployment bias: Algorithmic risk assessment tools such as COMPAS are models designed to predict a person’s likelihood of committing a future crime. In practice, however, these tools could also be used ‘off-purpose’, for example to determine the length of a sentence.
 
  • Feedback loop: Recommendation algorithms are known to frequently recommend a few popular items while “ignoring” the majority of other items. These recommendations are then consumed by users, whose reactions are recorded and added to the system.
 

Note that the different types of bias are not mutually exclusive, i.e. an AI system can suffer from more than one type of bias. For example, AI systems in fitness trackers may suffer from representation bias if darker skin tones are not included in the training dataset, measurement bias if the fitness tracker performs worse for darker skin tones, and evaluation bias if the dataset used to evaluate the AI system does not include darker skin tones.

Learn more: https://www.youtube.com/watch?v=vVRWeGlMkGk

What should you know?

Bias does not only come from biased data. Bias can also come from how the AI system is modelled, how the system is evaluated, or how users interpret the final results of the AI system.

Use the first course in this unit to better understand the different types of bias. Being aware of the many types of bias will help you to better identify them in AI systems.

 

The concept of bias
The bias types in detail


Develop your skills with practical exercises.

Section 4 / 7

The concept of fairness

4. Algorithmic fairness

Learn about the statistical definitions of fairness.

The concept of fairness is about ensuring that an AI system does not produce unfair or discriminatory decisions. Respecting the concept of fairness is required of an AI system from both an ethical and a legal perspective: it is forbidden to treat equal circumstances unequally, or unequal circumstances equally, unless a different treatment is objectively justified. In particular, this means that individuals should not be discriminated against because they belong to a marginalised or disadvantaged group.

The potential harm scenario that the concept of fairness primarily addresses is discrimination by an AI system against a particular group of people – be it on the basis of an individual’s ethnic origin, gender, age, religion/belief or other indicators. These indicators are considered sensitive characteristics for which non-discrimination should be established. The concept of fairness involves identifying the potential impact of discrimination from the perspective of those affected. This is particularly relevant for AI systems that make decisions about individuals. Examples include AI-based lending, selection of job applicants or recommendations for medical treatment. The consequences of discrimination by AI systems can be violations of personal rights, financial loss or damage to reputation.

While unfairness can be grasped intuitively through various examples, the challenge is to define fairness objectively, in a metrics-based way and as scalably as possible. Below we present concrete ways to quantify fairness.

Statistical definitions of fairness

There are different ways to define when an AI system is fair. We look at statistical definitions of fairness below. We focus here on classification in the ML field. Classification here refers to the identification of a category (e.g. creditworthy vs. not creditworthy) for a data instance (e.g. the data of a bank customer) based on training data whose categories are known.

Most of the metrics used to assess the fairness of a model relate either to the types of errors a model might make, or to the predictive power of the model for different groups. They can often be derived from the values of the so-called confusion matrix, which contains, for each class, the number of correctly and incorrectly classified test instances.

We will consider the confusion matrix for the example of binary classification. For example, a model could classify x-rays into the classes “sick” or “healthy”, or classify the data of a bank customer as “creditworthy” or “not creditworthy”. It is important that the true labels of the test data are verified beforehand: for example, only images whose diagnosis is beyond doubt should be used as test data.

 

Statistical performance metrics

Classification models do not have to be binary – they can be trained on more than two classes. Most performance metrics can be derived from those of binary classification, so we restrict ourselves to binary classification in this learning unit.

Note: Turn on the English subtitles in the videos (cogwheel).

So, as explained in the video, when working with a binary classifier, both the predicted and the actual classes can take two values: class 1 and class 2. We will first look at the different possible relationships between predicted and actual results:

  • True Positives (TP): Data instances have been classified as class 1 and are class 1.
  • False Positives (FP): Data instances were classified as class 1, but are class 2.
  • True Negatives (TN): Data instances have been classified as class 2 and are class 2.
  • False Negatives (FN): Data instances have been classified as class 2, but are class 1.
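To make these four quantities concrete, here is a minimal sketch in plain Python (the labels and predictions are invented; class 1 is encoded as 1 and class 2 as 0) that counts the four cells of the confusion matrix:

    # Invented example: actual labels and model predictions (1 = class 1, 0 = class 2).
    actual    = [1, 1, 1, 0, 0, 0, 1, 0, 0, 1]
    predicted = [1, 0, 1, 0, 1, 0, 1, 0, 0, 0]

    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)  # true positives
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)  # false positives
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)  # true negatives
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)  # false negatives

    print(f"TP={tp}, FP={fp}, TN={tn}, FN={fn}")  # here: TP=3, FP=1, TN=4, FN=2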
 
Most statistical definitions of fairness are based on various performance metrics that we have already introduced in the video. In the following, we will briefly review these as well:
 
  • Accuracy is the proportion of data instances correctly classified by the model. It is calculated by dividing the number of correctly classified data instances (TP + TN) by the number of all classified data instances (TP + FN + FP + TN; corresponds to the number of test data).
 
  • Precision for a class is the proportion of correctly classified data instances out of all data instances that the model has assigned to that class. One calculates TP / (TP + FP).
 
  • Recall for a class is the proportion of correctly classified data instances out of all data instances that are actually assigned to that class. One calculates TP / (TP + FN).
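Continuing the confusion-matrix sketch above (its counts were TP=3, FP=1, TN=4, FN=2), the three metrics can be computed directly from the four counts:

    # Counts taken from the confusion-matrix sketch above.
    tp, fp, tn, fn = 3, 1, 4, 2

    accuracy  = (tp + tn) / (tp + tn + fp + fn)  # correctly classified / all test instances
    precision = tp / (tp + fp)  # share of predicted class-1 instances that really are class 1
    recall    = tp / (tp + fn)  # share of actual class-1 instances that were found

    print(f"Accuracy={accuracy:.2f}, Precision={precision:.2f}, Recall={recall:.2f}")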

There are many more performance metrics, such as the following four:

 
  • False Positive Rate (FPR) is the proportion of data instances that actually belong to class 2 that the model has incorrectly assigned to class 1.
  • False Negative Rate (FNR) is the proportion of data instances that actually belong to class 1 that the model has incorrectly assigned to class 2.
  • False Discovery Rate (FDR) is the proportion of data instances classified as class 1 by the model that actually belong to class 2.
  • False Omission Rate (FOR) is the proportion of data instances classified as class 2 by the model that actually belong to class 1.
 
For FPR and FNR, note that the denominator is based on actual results (not model predictions). For FDR and FOR, the denominator is based on model predictions.
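In code, using the same counts as in the sketch above:

    # Counts taken from the confusion-matrix sketch above.
    tp, fp, tn, fn = 3, 1, 4, 2

    fpr  = fp / (fp + tn)   # False Positive Rate: denominator = all actual class-2 instances
    fnr  = fn / (fn + tp)   # False Negative Rate: denominator = all actual class-1 instances
    fdr  = fp / (fp + tp)   # False Discovery Rate: denominator = all predicted class-1 instances
    for_ = fn / (fn + tn)   # False Omission Rate: denominator = all predicted class-2 instances

    print(f"FPR={fpr:.2f}, FNR={fnr:.2f}, FDR={fdr:.2f}, FOR={for_:.2f}")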
 

Often, however, a model does not directly output one of the two classes (class 1 or class 2) for a data instance, but rather a score. In order to assign the data instance to one of the two classes, a threshold value is set, above or below which one or the other class is output. For this reason, the Receiver Operating Characteristic (ROC) curve is often used in the analysis of binary outcomes to show the performance of a model. The ROC curve provides information about performance over a range of thresholds and can be summarised by the area under the curve (AUC), a single number.

The ROC curve plots the true positive rate (TPR, i.e. the recall) against the false positive rate (FPR) at different classification thresholds. The following figure shows a typical ROC curve.

The AUC (area under the curve) measures the two-dimensional area under the entire ROC curve (in the sense of an integral). It is a measure of a classifier’s ability to discriminate between classes and is used as a summary of the ROC curve. The higher the AUC, the better the model distinguishes between class 1 and class 2.
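The following sketch (using scikit-learn; the labels and scores are invented) shows how a threshold turns model scores into class predictions, and how the ROC curve and the AUC evaluate the model over all possible thresholds:

    from sklearn.metrics import roc_curve, roc_auc_score

    # Invented example: true labels and the model's raw scores for class 1.
    y_true  = [1, 1, 0, 1, 0, 0, 1, 0]
    y_score = [0.9, 0.7, 0.6, 0.55, 0.4, 0.3, 0.8, 0.2]

    # A fixed threshold turns scores into class predictions.
    threshold = 0.5
    y_pred = [1 if s >= threshold else 0 for s in y_score]

    # The ROC curve evaluates the model over the whole range of thresholds,
    # and the AUC summarises the curve in a single number.
    fpr, tpr, thresholds = roc_curve(y_true, y_score)
    print("AUC =", roc_auc_score(y_true, y_score))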

Statistical definitions of fairness in relation to performance metrics

Below we present three statistical definitions of fairness based on the performance metrics mentioned above. We have also prepared a video for you:

Note: Turn on the English subtitles in the videos (cogwheel).

We repeat the three definitions of fairness presented:

  • Demographic parity: A classifier meets this definition if different groups of people (e.g. women and men, or African Americans and non-African Americans) have the same probability of being classified as class 1.
 
  • Equal opportunity: A classifier meets this definition if different groups of people have the same true positive rate.
 
  • Equalized odds: A classifier meets this definition if different groups of people have the same true positive rate and the same false positive rate.
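As a small illustration, the following sketch (plain Python; the data and the sensitive groups ‘A’ and ‘B’ are invented) compares the quantities behind the three definitions across two groups: the selection rate for demographic parity, the TPR for equal opportunity, and the TPR together with the FPR for equalized odds:

    # Invented example: sensitive group, true label and model prediction per person.
    group  = ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B']
    actual = [ 1,   0,   1,   0,   1,   0,   1,   0 ]
    pred   = [ 1,   0,   1,   1,   1,   0,   0,   0 ]

    def rates(g):
        """Selection rate, TPR and FPR for one sensitive group g."""
        idx = [i for i, x in enumerate(group) if x == g]
        tp = sum(1 for i in idx if actual[i] == 1 and pred[i] == 1)
        fp = sum(1 for i in idx if actual[i] == 0 and pred[i] == 1)
        tn = sum(1 for i in idx if actual[i] == 0 and pred[i] == 0)
        fn = sum(1 for i in idx if actual[i] == 1 and pred[i] == 0)
        selection_rate = (tp + fp) / len(idx)  # compared for demographic parity
        tpr = tp / (tp + fn)                   # compared for equal opportunity
        fpr = fp / (fp + tn)                   # TPR and FPR compared for equalized odds
        return selection_rate, tpr, fpr

    for g in ('A', 'B'):
        sel, tpr, fpr = rates(g)
        print(f"Group {g}: selection rate={sel:.2f}, TPR={tpr:.2f}, FPR={fpr:.2f}")

If the printed values differ noticeably between the groups, the corresponding fairness definition is violated.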
 
 

In practice, it is not possible to optimise a model for all definitions of fairness. So which definition of fairness should you choose? As with most ethical questions, the answer is usually not easy to find and the choice of definition should be discussed in a conversation involving all members of your team.

 

 

By the way: If you are working on real problems, the data will be much, much larger. In this case, the confusion matrix is still a useful tool for analysing performance. An important point, however, is that real-world models cannot usually be expected to meet every definition of fairness perfectly. For example, if “demographic parity” is chosen as the fairness definition and a model is expected to select 50% men, the final model may select a percentage close to 50% but not exactly 50% (such as 48% or 53%).

 
 

The concept of fairness
The definitions of fairness in detail


Develop your skills with practical exercises.

Section 5 / 7

Detect bias & quantify unfairness

5. Bias in the real world

In this section you will learn how to deal with issues of bias & fairness in the real world.

It is not easy to detect bias in your AI system, as it can occur at any point in the ML lifecycle. Furthermore, different people perceive different outcomes as ‘fair’.

You have already learned that bias can occur in a variety of specific ways in an ML lifecycle. You have also seen that there is no single approach to fairness, but rather different interpretations. So how do you deal with bias and fairness in AI systems in the real world? There are many questions that need to be asked to address this issue. The following graphic shows possible questions for selected phases of the ML lifecycle that can be asked to avoid unfair decisions.

One big problem is that bias is rarely obvious. Think of the comments posted under a post on a social networking site. If a comment violates the platform’s policies, such as hate speech, it may be deleted by the platform after it is posted. Some platforms use AI systems to automatically select and delete such hateful comments. But who decides which comments are hateful? Could such an AI system produce unfair results? Could someone be discriminated against?

Use the third course in this unit to explore this scenario in more detail. You can work directly on a real dataset, check for bias in an AI system, and examine performance metrics.

Bias in the real world
Detect bias & quantify unfairness


Develop your skills with practical exercises.

Section 6 / 7

Bias mitigation

6. Improving AI fairness

In this section you can learn about different mitigation strategies.

Bias in AI systems can take many forms, leading to unfair or discriminatory decisions. But there are also many ways to mitigate bias. In this section, we give you an insight into possible mitigation strategies.

There are some approaches to mitigating or eliminating bias at different stages of the ML lifecycle. However, there is no ‘one size fits all’ approach. Approaches range from formulating an application that is relatively free of bias, to collecting relatively unbiased data, to developing algorithms that minimize bias. Below we present three concrete approaches.

Diversity in teams

All models are created by humans and reflect human biases. Machine learning models can reflect the biases of organisational teams, the designers on those teams, the data scientists implementing the models, and the data engineers collecting the data. Of course, they also reflect the biases inherent in the data itself. Just as we expect a certain level of trustworthiness from human decision-makers, we expect a certain level of trustworthiness from our models. So to reduce bias, it is important that teams are as diverse as possible.

Explainable AI

Explainable Artificial Intelligence (XAI) is a field that is essentially about making AI systems more transparent, so that people can trust and scrutinise an AI system – including in terms of bias and fairness. More specifically, XAI encompasses a variety of technologies and measures that ensure the transparency of an AI system. The goal is always to make the results or internal workings of AI systems understandable to human users. This can also significantly support the detection and correction of biases in the ML lifecycle. XAI can therefore be seen as a way to mitigate bias and improve the fairness of AI.

Algorithms to minimise bias

Techniques for minimizing bias in algorithms fall into three categories:

  • Pre-processing algorithms: Pre-processing techniques attempt to transform the data in a way that minimizes the underlying discrimination.
 
  • In-processing algorithms: In-processing algorithms attempt to modify and alter modern learning algorithms to eliminate discrimination during the training process. For example, the algorithms can minimize bias by incorporating changes in the objective function or by imposing a constraint.
 
  • Post-processing algorithms: If the algorithm can only treat the learned model as a black box, without the possibility of modifying the training data or the learning algorithm, then the only option left is to use post-processing algorithms, where the labels originally assigned by the black box model are reassigned in the post-processing phase using a function.
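As one concrete illustration of the pre-processing idea (this is the well-known reweighing approach of Kamiran and Calders, not one of the examples discussed further below), the following sketch computes instance weights so that the sensitive attribute and the label become statistically independent in the weighted training data; the data and variable names are invented:

    from collections import Counter

    # Invented toy training data: sensitive attribute s and label y per instance.
    s = ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'B']
    y = [ 1,   1,   0,   1,   0,   0,   0,   1 ]
    n = len(y)

    count_s  = Counter(s)           # instances per sensitive group
    count_y  = Counter(y)           # instances per label
    count_sy = Counter(zip(s, y))   # joint counts per (group, label) pair

    # Reweighing: weight = P(s) * P(y) / P(s, y), so that the sensitive attribute
    # and the label are statistically independent in the weighted data.
    weights = [(count_s[si] / n) * (count_y[yi] / n) / (count_sy[(si, yi)] / n)
               for si, yi in zip(s, y)]
    print([round(w, 2) for w in weights])

Many learning algorithms can take such weights into account during training, for example via a sample_weight argument.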
 

The field of algorithmic fairness is a young area of research that is still being developed and refined. Nevertheless, there is already a large body of research proposing fair algorithms and bias mitigation techniques, and comparing different bias mitigation algorithms.

 

Below we give brief examples of existing work and show how they fall into these categories. The main goal of these algorithms is to achieve a model with high accuracy while ensuring that it discriminates as little as possible with respect to sensitive attributes. In simple terms, the output of the classifier should not correlate with sensitive attributes. Building such ML models becomes a multi-objective optimisation problem: the quality of the classifier is measured by its accuracy and by the discrimination it exhibits based on sensitive attributes; the more accurate the better, and the less discriminatory the better.

  • Feature modification (Pre-Processing): The algorithm starts from a binary or categorical sensitive attribute. It adjusts the distributions of the features so that they are the same in each sensitive group. The result of applying the algorithm is a modified dataset where each feature in the data has been decoupled from the sensitive attribute. The idea is that a model trained on this data should not be able to learn to discriminate based on the sensitive attributes. Read more: https://dl.acm.org/doi/10.1145/2783258.2783311.
 
 
  • Decision threshold modification (Post-Processing): We have learnt that the equalized odds fairness definition requires the TPR and FPR to be equal for each sensitive group, while the equal opportunity definition requires only the TPRs to be equal. In both cases, this algorithm achieves this by adjusting, for each group, the threshold used to determine the prediction. The algorithm is therefore very broadly applicable, as it only requires access to the model output and the protected attribute. Read more: https://arxiv.org/abs/1610.02413.
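As a rough sketch of the underlying idea (not the exact algorithm from the paper), the following toy example shows how group-specific decision thresholds can equalise the TPRs of two sensitive groups; all numbers are invented:

    # Invented example: score, true label and sensitive group per person.
    group  = ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B']
    actual = [ 1,   1,   0,   0,   1,   1,   0,   0 ]
    score  = [0.9, 0.8, 0.6, 0.3, 0.7, 0.45, 0.4, 0.2]

    def tpr_per_group(thresholds):
        """True positive rate per group for the given group-specific thresholds."""
        out = {}
        for g in sorted(set(group)):
            pos = [i for i, x in enumerate(group) if x == g and actual[i] == 1]
            out[g] = sum(1 for i in pos if score[i] >= thresholds[g]) / len(pos)
        return out

    # One shared threshold leads to unequal TPRs across the groups ...
    print(tpr_per_group({'A': 0.5, 'B': 0.5}))    # {'A': 1.0, 'B': 0.5}
    # ... while group-specific thresholds (chosen here by hand) equalise them.
    print(tpr_per_group({'A': 0.5, 'B': 0.45}))   # {'A': 1.0, 'B': 1.0}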
 

There is no single right answer to how to ensure fairness in an AI system. The “right answers” are constantly evolving as AI fairness is an active area of research. See also Benjamin van Giffen et al. (2022).

Bias mitigation
Improving AI fairness

Develop your skills with practical exercises.

Section 7 / 7

Outlook

7. Future relevance

In this section you will learn how to see and reap the benefits of a future with fair algorithms.

 

We cannot shy away from the challenge of making the use of algorithmic decision-making systems fair. The deeper you delve into the topic of bias and fairness in AI systems, the more apparent the complexity of the problem becomes. On the philosophical side, there is a discourse about the right definition of fairness, a discourse that runs up against the technical impossibility of satisfying all theoretical understandings of fairness at the same time, as they are partly mutually exclusive. On the technical side, one is confronted with the challenges of the black box of a complex algorithm: what is actually going on in my algorithm?

The discussion about fair algorithms needs to happen now. Ultimately, if algorithms are fair, they can help us overcome our own biases. Existing discriminatory practices should be exposed as soon as possible and reflection on the underlying decision-making criteria should be initiated. This can stimulate the next steps in the use of algorithms and AI. This includes, for example, a trained, enlightened use of algorithmic decision-making systems.

Bias and fairness are only pieces of the puzzle of trustworthy and ethical AI. In addition to bias and fairness, it also includes privacy and accountability.

Teaching Case

START Foundation: AI-based "Skill Compass"