Are algorithms truly racist or sexist?
We have all seen news articles about unfair or discriminatory machine learning algorithms. Today, I want to give some context as to why this happens.
Just in case you haven’t seen or heard of any discriminatory algorithms, here are some examples:
- Amazon’s hiring algorithm penalised female candidates applying to technical roles
- Google’s ad system showed adverts for high-paying jobs to men far more often than to women
- The UK’s A-level grading algorithm judged students from wealthy backgrounds more favourably
- A risk-scoring algorithm used by Florida’s justice system was more likely to label non-white offenders as high risk than their white counterparts
- Deep Learning beauty contest judge picked mostly white winners
Firstly, you’ve probably seen a lot of terms used interchangeably when talking about this field. Some of these include: Artificial Intelligence (AI), Machine Learning (ML), Deep Learning (DL), Model and Algorithm. For the sake of this article, you don’t need to understand the differences between these terms. Just think of all of them as elements of a clever machine (code) that will solve a problem. From here on, I will refer to these terms collectively as “the machine”.
The Classifier
Just like we have specialist doctors for different medical areas, we also have different machines that are suited to different problems. One of the most common problems the machine is applied to is called “classification”. As the name suggests, this is the process of classifying inputs into different groups, e.g. “the machine” might look at a mammogram and decide whether the image belongs in the cancerous or non-cancerous group. It could also look at a person’s financial history and decide whether they would repay or default on a loan.
By nature, the classification machine is designed to discriminate, e.g. to separate cancerous from non-cancerous images. If the machine is unable to do this, we can agree it is useless at what it was designed to do. For the machine to accurately classify images it has never seen, it has to learn what cancerous and non-cancerous images look like. The learning process is carried out on a pool of images which have been labelled by experts as cancerous or non-cancerous. This pool of data (images) is called the training examples. The machine’s role is to identify patterns (from the training examples) that are mostly present in the cancerous group but absent from the non-cancerous group. In an ideal world, the identified patterns would be present in ALL of the cancerous group and in NONE of the non-cancerous group. However, this is never the case.
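To make this concrete, here is a minimal sketch in Python using scikit-learn. It uses the library’s built-in breast cancer dataset (tabular measurements rather than mammogram images, but the idea is the same): the machine is shown expert-labelled training examples, learns patterns from them, and is then scored on examples it has never seen.

```python
# A minimal classification sketch using scikit-learn's built-in breast cancer
# dataset (tabular measurements labelled cancerous / non-cancerous by experts).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# The "training examples": inputs (X) plus expert-provided labels (y).
X, y = load_breast_cancer(return_X_y=True)

# Hold some examples back so we can test the machine on data it has never seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# "The machine" learns patterns that separate the two groups.
model = LogisticRegression(max_iter=5000)
model.fit(X_train, y_train)

# How well does it classify examples it has never seen?
print("Accuracy on unseen examples:", accuracy_score(y_test, model.predict(X_test)))
```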
A rock and a hard place
In order to build a good machine, a data scientist has to walk a fine line between complexity and simplicity. A complex machine will try to find intricate patterns in the images; the downside is that it may latch onto patterns which are not relevant to cancer detection, which in turn leads to a poor detection rate on images it has never seen. On the other hand, a simple machine may not make enough linkages between pattern and outcome. Ideally, the data scientist wants to build a machine that generalises well, that is, one that makes broad, reliable connections between input patterns and outcome.
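As a rough illustration (reusing the train/test split from the sketch above), we can vary how complex the machine is allowed to be and watch how it behaves on examples it has never seen. The decision tree and the particular depths here are just one convenient way to show the trade-off, not the only one.

```python
# A rough illustration of the complexity / simplicity trade-off, reusing
# X_train, X_test, y_train, y_test from the previous sketch.
from sklearn.tree import DecisionTreeClassifier

for depth in (1, 3, 10, None):  # None = let the tree grow as deep as it likes
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    print(
        f"max_depth={depth}: "
        f"train accuracy={tree.score(X_train, y_train):.2f}, "
        f"unseen accuracy={tree.score(X_test, y_test):.2f}"
    )

# In general, very shallow trees are too simple and miss useful patterns,
# while very deep trees can memorise quirks of the training examples that
# do not hold on unseen data.
```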
Generalisation becomes a problem when the characteristics it is based upon are protected and have no relevance to the outcome. If a machine sees a pattern that most people hired into technical roles are men, it could generalise that men are better candidates for technical roles than women. It will therefore penalise female applicants, even suitably qualified ones. In this case, the generalisation is based on a characteristic (gender) that has no relevance to the outcome.
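Here is a deliberately tiny, hypothetical sketch of how that happens. The dataset, column names and numbers are all made up purely for illustration; the point is that a machine trained on biased historical decisions will happily treat gender as a predictive pattern.

```python
# Hypothetical sketch: a hiring model trained on past (biased) decisions.
# The column names and values are made up for illustration only.
import pandas as pd
from sklearn.linear_model import LogisticRegression

history = pd.DataFrame({
    "years_experience": [5, 6, 5, 7, 6, 5, 7, 6],
    "gender":           [1, 1, 1, 1, 0, 0, 0, 0],   # 1 = male, 0 = female
    "hired":            [1, 1, 1, 1, 0, 0, 1, 0],   # historical decisions
})

model = LogisticRegression(max_iter=1000)
model.fit(history[["years_experience", "gender"]], history["hired"])

# Two equally experienced candidates, differing only in gender:
candidates = pd.DataFrame({"years_experience": [6, 6], "gender": [1, 0]})
print(model.predict_proba(candidates)[:, 1])  # the female candidate scores lower

# The machine has "learned" the historical bias, because gender was treated as
# a predictive pattern even though it has no relevance to the outcome.
```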
What should we do?
Fixing discriminatory algorithms is not about giving the under-privileged group a more favourable outcome than the privileged group. It is about ensuring that qualified members are given the right outcome irrespective of what group they belong to. In plain terms, it is not about Amazon handing out technical jobs to unqualified women but about ensuring that an equally qualified woman is not rejected from a job because of gender.
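One simple way to sanity-check a machine against this standard is to compare outcomes among genuinely qualified people in each group, in the spirit of what the fairness literature calls “equal opportunity”. The arrays below are hypothetical placeholders, not real hiring data.

```python
# Rough check: among genuinely qualified candidates, does the machine accept
# both groups at similar rates? All arrays here are hypothetical examples.
import numpy as np

qualified      = np.array([1, 1, 1, 1, 1, 1, 0, 0])        # ground-truth suitability
predicted_hire = np.array([1, 1, 1, 0, 1, 0, 0, 0])        # the machine's decision
group          = np.array(["men", "men", "men", "women",
                           "women", "women", "men", "women"])

for g in ("men", "women"):
    mask = (group == g) & (qualified == 1)
    rate = predicted_hire[mask].mean()
    print(f"Acceptance rate among qualified {g}: {rate:.2f}")

# A large gap between these two rates is a red flag that qualified members of
# one group are being given the wrong outcome.
```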
In a lot of cases, discriminatory algorithms have been identified because the data is available to everyone. In other cases, the problems have been highlighted by the organisations themselves: in Amazon’s example, they had the expertise in-house to identify and rectify the problem. The Financial Services industry tends not to have open customer-level datasets, which makes it crucial for business leaders in this industry to hire ethically minded data science teams. Data scientists who understand the importance of human-in-the-loop machine learning and the sources of ethical bias can help reduce these risks to their organisations.
In order to fully understand how we can prevent algorithmic bias, we need to understand the elements of the system that can introduce or exacerbate bias. In my next post, I will be writing about the sources of bias and what we can do to prevent it.
You can contact me here.