This paper is in two parts. Part 2 focuses on the different types of AI recruitment tools and their pros and cons in terms of bias.

Part 1 (link) sets out how AI systems are created and where bias can be introduced into the AI model.  


That recruitment bias exists is undeniable; in some cases employers actively use affirmative bias to counterbalance previous bias, which is itself a breach of the Equality Act. When evaluating AI hiring tools (Algorithmic Hiring Tools, or AHTs) it is important to identify, quantify and trace the source of the bias in the system before deciding whether the AHT presents a better alternative to current practice. However, it is rare for the algorithm itself to have bias designed into it; bias is normally introduced into an AHT by the data used to train the model in the first place.



The recruitment funnel has a number of discrete filters through which the successful jobseeker passes in order to be selected; the majority of candidates remaining in the funnel are discarded at each of these filtering points. For the purposes of this paper we are looking at how bias affects the stages that AHTs are designed to automate: (1) the creation of the candidate pool, (2) selecting the applicants to be interviewed, and (2a) video interviewing systems.

Sourcing recruits using micro-targeted ad placement algorithms on social media platforms introduces discrimination into the recruitment process: when the algorithm decides a person does not qualify, they remain unaware of the opportunity. The bias is introduced via incomplete datasets of historic information, which tend to reinforce existing stereotypes (see below). Personalised job boards, designed to learn recruiters' preferences in order to solicit a certain outcome, introduce a similar bias.

CV parsing tools, and tools that make predictions based upon past screening decisions, can reflect the very patterns that recruiters are actively trying to change through diversity and inclusion initiatives. Measures such as tenure, productivity and performance often reflect subjective evaluations, which are a source of discrimination within workplaces.

Video interviewing systems, relying on audio and facial analysis, are prone to sampling bias: Google's speech recognition software is 70% more likely to accurately recognise male speech, because that is what it has been trained on. How people communicate anger, disgust, fear, happiness, sadness, and surprise varies substantially across cultures, situations, and even across people within a single situation.


Restriction Discrimination – Who is Aware of the Vacancy When:

Using Targeted Online Ads. AHTs are being used to source candidates via algorithmic ad placement. While these hold out the promise of attracting the right sort of candidate, in practice their KPI is to attract the highest number of clicks. This can lead to job ads being delivered in a way that reinforces gender and racial stereotypes, even when recruiters have no such intent.

A 2019 Harvard Business Review research paper found that broadly targeted ads placed by ad placement algorithms on Facebook for (1) supermarket cashier positions were shown to an audience that was 85% women, while (2) ads for jobs with taxi companies were shown to an audience that was approximately 75% black.

Using Personalised Jobs Boards. These aim to automatically learn recruiters' preferences and use those predictions to solicit similar applicants. These systems mine user behaviour to find and repeat patterns, updating their predictions dynamically as jobseekers and recruiters interact. They are vulnerable to small and incomplete data sets: a proxy for a protected characteristic may be over-represented in a small sample (but would become insignificant in a larger data set), and the system learns and repeats that pattern.
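To make the mechanism concrete, here is a toy sketch (all names and data invented, not any vendor's actual system) of a job board that "learns" preferences by counting attributes among previously selected candidates. With only three past selections, a proxy attribute (university) dominates the score even for candidates with identical skills:

```python
from collections import Counter

# Hypothetical toy data: three past selections, all from one university.
past_selections = [
    {"skill": "sql", "university": "Oxbridge"},
    {"skill": "python", "university": "Oxbridge"},
    {"skill": "sql", "university": "Oxbridge"},
]

def preference_score(candidate, history):
    """Score a candidate by how often their attribute values appear in past picks."""
    counts = Counter(v for person in history for v in person.values())
    return sum(counts[v] for v in candidate.values())

a = {"skill": "sql", "university": "Oxbridge"}  # matches the proxy
b = {"skill": "sql", "university": "Redbrick"}  # same skill, different proxy
print(preference_score(a, past_selections))  # 5 (2 for sql + 3 for Oxbridge)
print(preference_score(b, past_selections))  # 2 (skill match only)
```

In a larger, more representative sample the university signal would wash out; in a small one it becomes the deciding factor.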


Once applications start flowing in, employers seek to focus on the strongest candidates. While algorithms used at this stage are often framed as decision aids for hiring managers, in reality they can automatically reject a significant proportion of candidates, 'losing' their CVs in a digital black hole.

Some of these screening algorithms are simply old techniques dressed up in new technology. Employers have long asked “knockout questions” to establish whether candidates are minimally qualified; now, chatbots and CV parsing tools perform this task. Other tools go further, using machine learning to make predictions based on past screening decisions, saving employers time and, purportedly, minimising the effect of human prejudice. At first glance, it might seem natural for screening tools to model past hiring decisions. But those decisions often reflect the very patterns many employers are actively trying to change through diversity and inclusion initiatives.
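A minimal sketch of such an automated knockout filter (the requirements and field names here are invented for illustration) shows how candidates failing any hard requirement are silently rejected before a human ever sees them:

```python
# Hypothetical hard requirements a chatbot or CV parser might enforce.
knockouts = {
    "needs_work_permit": True,
    "min_years_experience": 3,
}

def passes_knockouts(candidate):
    """Return True only if the candidate clears every knockout question."""
    if knockouts["needs_work_permit"] and not candidate.get("has_work_permit"):
        return False
    return candidate.get("years_experience", 0) >= knockouts["min_years_experience"]

print(passes_knockouts({"has_work_permit": True, "years_experience": 5}))  # True
print(passes_knockouts({"has_work_permit": True, "years_experience": 1}))  # False
```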

Other selection tools incorporate machine learning to predict which applicants will be "successful" on the job, often measured by signals related to tenure, productivity, or performance. It should be noted that differentiating high performers from low performers often rests on subjective evaluations, a notorious source of discrimination within workplaces, and the underlying performance data can be polluted by lingering effects of sexism, racism, or other forms of structural bias. For example, if an employer has never hired someone from a certain group, would the algorithm know how to evaluate such candidates effectively?
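The closing question can be illustrated with a toy "model" (data and group names invented) built purely from past decisions: it replays the historic hire rate for groups it has seen, and has no basis at all for evaluating a group absent from its training data:

```python
# Hypothetical past screening decisions: (background, was_screened_in).
past_decisions = [
    ("state_school", True), ("state_school", False),
    ("private_school", True), ("private_school", True),
]

def hire_rate(feature, history):
    """Estimated screen-in probability for a group, from historic decisions only."""
    outcomes = [hired for f, hired in history if f == feature]
    if not outcomes:
        return None  # the model has never seen this group
    return sum(outcomes) / len(outcomes)

print(hire_rate("private_school", past_decisions))  # 1.0 - past bias replayed
print(hire_rate("apprenticeship", past_decisions))  # None - no data to evaluate
```

Real AHTs generalise rather than look up, but the underlying problem is the same: the prediction is only as balanced as the decisions it was trained on.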


Video interviewing systems can collect 500,000 data points during a 30-minute interview containing as few as six questions (from a list provided by the vendor), which are then analysed. Facial movements can make up to 30% of the candidate's score. Bias can be introduced in two ways.

First, training data is technically challenging and expensive to build in-house, so vendors normally purchase this data from third parties, which may result in sampling bias. An assessment of five commercial speech recognition tools – developed by Amazon, Apple, Google, IBM and Microsoft – found racial disparities in performance for African Americans as a result of insufficient audio data from this group when training the models. Google's speech recognition software is 70% more likely to accurately recognise male speech because that is what it has been trained on. Such software performs poorly for, and is likely to mischaracterise, people with regional and non-native accents.
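The kind of audit that exposed these disparities reduces to comparing a model's accuracy group by group. A toy sketch (the groups and results below are invented, not the actual study data):

```python
# Hypothetical per-utterance results: (speaker_group, transcribed_correctly).
results = [
    ("group_a", True), ("group_a", True), ("group_a", True), ("group_a", False),
    ("group_b", True), ("group_b", False), ("group_b", False), ("group_b", False),
]

def accuracy_by_group(results):
    """Compute recognition accuracy separately for each speaker group."""
    by_group = {}
    for group, correct in results:
        by_group.setdefault(group, []).append(correct)
    return {g: sum(v) / len(v) for g, v in by_group.items()}

print(accuracy_by_group(results))
# {'group_a': 0.75, 'group_b': 0.25} - a disparity a vendor audit should flag
```

Aggregate accuracy here is 50%, which looks unremarkable; only the per-group breakdown reveals that one group bears almost all of the errors.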

Secondly, training data are manually assigned class labels by humans, a process that is unavoidably subjective. Classifications are not neutral and may be open to debate. Evidence suggests that it is not possible to reliably and accurately identify and label the variety of cross-cultural expressions of emotion and affect. As Barrett et al. state: 'how people communicate anger, disgust, fear, happiness, sadness, and surprise varies substantially across cultures, situations, and even across people within a single situation'.
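One simple way this subjectivity shows up in practice is low agreement between human labellers. A toy sketch (labels invented) of the most basic check, raw inter-annotator agreement:

```python
# Hypothetical emotion labels assigned to the same five clips by two labellers.
labeller_1 = ["happy", "angry", "neutral", "fear", "happy"]
labeller_2 = ["happy", "neutral", "neutral", "surprise", "sad"]

# Fraction of clips on which the two labellers agree.
agreement = sum(a == b for a, b in zip(labeller_1, labeller_2)) / len(labeller_1)
print(f"raw agreement: {agreement:.0%}")  # raw agreement: 40%
```

When humans cannot agree on the "ground truth", a model trained on those labels inherits that uncertainty while presenting its output as objective.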


Copyright  © 2019  –  2024 Talent Recognition
