This paper is in 2 parts. Part 1 (below) sets out how AI systems are created and where bias can be introduced into the AI model.
Part 2 (link) focuses on the different types of recruitment tools using AI and their pros and cons in terms of bias.
That recruitment bias exists is undeniable; in some cases employers actively apply affirmative bias to counterbalance previous bias, which is itself a breach of the Equality Act. A new generation of Algorithmic Hiring Tools (AHTs) that are free of bias could provide recruiters with a solution that both reduces recruitment bias and improves their selections.
Bias exists in almost every system – even random number generators. When evaluating the AI in an AHT it is important to identify, quantify and trace the source of any bias in the system before deciding whether the AHT presents a better alternative to current practice.
Because the algorithms in AHTs are proprietary, it is often not possible to examine them for bias, even assuming one knows how to identify it. However, it is rare for the algorithm itself to have bias designed into it; bias is normally introduced into an AHT by the data used to train the model in the first place.
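Even when the algorithm itself cannot be inspected, its outputs can be audited. One widely used way to quantify selection bias is the adverse-impact ratio (the US 'four-fifths rule'): compare each group's selection rate with that of the most-selected group. A minimal sketch in Python; the group labels and outcome counts below are invented for illustration:

```python
from collections import defaultdict

def adverse_impact_ratios(outcomes):
    """outcomes: list of (group, selected) pairs, selected being True/False.
    Returns each group's selection rate divided by the highest group's rate."""
    totals = defaultdict(int)
    selected = defaultdict(int)
    for group, was_selected in outcomes:
        totals[group] += 1
        if was_selected:
            selected[group] += 1
    rates = {g: selected[g] / totals[g] for g in totals}
    best = max(rates.values())
    return {g: rate / best for g, rate in rates.items()}

# Illustrative screening outcomes from a hypothetical AHT:
# group A selected 40 of 100 times, group B 24 of 100 times
results = [("A", True)] * 40 + [("A", False)] * 60 \
        + [("B", True)] * 24 + [("B", False)] * 76
ratios = adverse_impact_ratios(results)
# A ratio below 0.8 for any group flags possible adverse impact
```

A ratio below 0.8 does not prove discrimination, but it is the conventional threshold at which an audit would look more closely at the tool.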
While much is made in the press of the ‘Skynet’ potential of AI, attention is quite rightly being drawn to the possibility of bias infecting AI algorithms. The word ‘bias’ is accurate in its etymological sense when used in statistics, where it means ‘to distort’; however it is also emotive, immediately triggering memories of unfair treatment based on ethnicity, gender, background and neuro-diversity.
While the word itself is clearly understood, less well known are the ways in which bias can be introduced into AI, and what its effects could be. We will examine the accusations of bias in hiring tools, whether the criticisms are valid and, if so, how they can be mitigated.
How Are AI Algorithms Generated?
The first point to note is that AI algorithms are created by feeding potentially huge amounts of data into very powerful processors to learn patterns and relationships; the algorithms are then fine-tuned iteratively to reduce the error rate. Think of the Infinite Monkey Theorem – give enough monkeys typewriters and enough time and eventually they will produce the works of Shakespeare. It’s not the same, but not too different either.
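The iterative fine-tuning described above can be illustrated with a toy example: fitting a single weight by repeatedly nudging it in the direction that reduces the error (gradient descent). The data points and learning rate here are invented purely for illustration:

```python
# Toy illustration of iterative model fitting: learn w so that y ≈ w * x.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]  # (x, y) pairs, roughly y = 2x

w = 0.0                  # initial guess
learning_rate = 0.05
for step in range(500):  # each iteration nudges w to reduce squared error
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= learning_rate * grad

# After enough iterations w settles near 2, the pattern in the data
```

The point of the sketch is the loop: real systems do the same thing with millions of weights and examples, and whatever pattern the data contains – including any bias – is what the weights converge towards.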
There is Generative AI (which focuses on creating original and novel content) and Predictive AI (which aims to forecast future outcomes based on historical data patterns). Both forms of AI require large amounts of data, and that is where bias can creep in.
Discrimination by Restriction
The use of pattern matching to optimise advertising campaigns results in a message being shown only to the group whose characteristics the system believes indicate interest, a practice often referred to as ad placement. For example, an online retailer will want to avoid displaying male-oriented products to a female audience, and vice versa.
Digital Black Holes
Data management can itself be a source of discrimination. In a typical recruitment with 40 applicants for one vacancy, a great deal of data needs to be stored on the 39 unsuccessful applicants, and even after the vacancy has been filled there may be a need to retain data on some of them. Algorithms control what data is available to recruiters at each stage of the process, and data on discarded applicants often disappears into a digital black hole.
Building AI algorithms relies on data being fed into the model: if the data is biased then the algorithm will be biased. To understand whether a model has bias, the nature of that bias and its impact, one needs to understand the types of bias that can creep into the data.
- Dataset Sizes. For bias to be minimised, all groups need to be adequately represented; smaller datasets are more likely to contain bias because certain groups will be over- or under-represented. A dataset of employee information from a company with 100 employees is not going to be as representative as one from a company with 100,000 employees.
- Dataset Sources. The cost of building datasets is a major factor in building AI systems, which is why datasets are often ‘bought in’ rather than created from scratch. The same sources resell their data to many model builders over time, and any bias within the data is transmitted to all of those models.
- How False Negatives Are Dealt With. Every system is bound to generate false negatives (results which wrongly indicate that a condition does not hold). While training is designed to let the algorithms evolve and become more accurate, reducing the rate of incorrect readings, it cannot be 100% successful. Eliminating bias therefore requires an analysis of how false positives and false negatives are dealt with.
- The Historical Context Problem. Amazon built a predictive AI system to identify suitable applicants for engineering roles, training it on data culled from the CVs of existing engineers. Because the data was taken from a time when women and minorities were under-represented, the dataset contained a sampling bias towards white males of a certain age, and this bias was reflected in the AI model.
- Incomplete Data. When data on one group is less complete than data on other groups, the model will represent that group less accurately and its predictions for that group will be less reliable.
- Design Bias. When the design of the study itself systematically skews the results, leading to misleading conclusions, for example when the study is set up to substantiate a pre-determined conclusion. Another example of design bias is when the developers’ own social attitudes are incorporated into the model.
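The false-negative issue in the list above lends itself to a concrete audit: compute error rates separately for each group, because a model with the same overall accuracy can still reject qualified candidates from one group far more often than another. A minimal sketch; the groups, labels and predictions are invented for illustration:

```python
from collections import defaultdict

def false_negative_rates(records):
    """records: list of (group, actually_suitable, predicted_suitable).
    Returns, per group, the share of suitable candidates the model rejected."""
    suitable = defaultdict(int)
    missed = defaultdict(int)
    for group, actual, predicted in records:
        if actual:                # only suitable candidates can be false negatives
            suitable[group] += 1
            if not predicted:
                missed[group] += 1
    return {g: missed[g] / suitable[g] for g in suitable}

# Hypothetical screening results: (group, truly suitable?, model said suitable?)
records = [
    ("A", True, True), ("A", True, True), ("A", True, False), ("A", False, False),
    ("B", True, False), ("B", True, False), ("B", True, True), ("B", False, False),
]
fnr = false_negative_rates(records)
# Group A's model misses 1 of 3 suitable candidates; group B's, 2 of 3
```

A gap between the two rates, as in this toy data, is exactly the kind of disparity that overall accuracy figures hide.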
Bias in Video Interviewing Systems
Vendors often purchase facial and speech analysis as a service from third parties, without any controls on the quality or diversity of the underlying data. An assessment of five commercial speech recognition tools – developed by Amazon, Apple, Google, IBM and Microsoft – found racial disparities in performance for African Americans, a result of insufficient audio data from this group when training the models. Google’s speech recognition software has been reported to be 70% more likely to accurately recognise male speech, because that is predominantly what it has been trained on; it also performs poorly on regional and non-native accents, and is likely to mischaracterise such speakers.
As Barrett et al. state: ‘how people communicate anger, disgust, fear, happiness, sadness, and surprise varies substantially across cultures, situations, and even across people within a single situation.’
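Disparities of the kind reported above can be quantified by computing the word error rate (WER) separately for each speaker group. A minimal sketch of word-level WER using edit distance; the function and example transcripts are illustrative, not taken from any vendor's API:

```python
def word_errors(reference, hypothesis):
    """Minimum substitutions, insertions and deletions needed to turn the
    hypothesis word sequence into the reference (Levenshtein over words)."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,        # deletion
                            curr[j - 1] + 1,    # insertion
                            prev[j - 1] + cost  # substitution / match
                            ))
        prev = curr
    return prev[-1]

def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over reference length."""
    return word_errors(reference, hypothesis) / len(reference.split())

# Comparing per-group transcripts against what a recogniser produced would
# reveal the kind of disparity described above, e.g.:
score = wer("please schedule the interview", "please schedule an interview")
```

Running the same comparison over transcripts grouped by speaker demographics, and comparing the average WER per group, makes the training-data gap measurable rather than anecdotal.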
Finally, there are practices that are fair in form but discriminatory in operation. A good example is favouring minorities when recruiting in order to offset previous discrimination, a form of reverse or positive discrimination. In the US this was permissible under DE&I rules, where the ‘E’ stood for equity, until a recent ruling against Harvard over favouring applicants from minorities; in Europe this is not legal under ED&I, where the ‘E’ stands for equality.