Artificial intelligence tools used by more than half of England’s councils are downplaying women’s physical and mental health issues and risk creating gender bias in care decisions, research has found.
The study found that when Google’s AI tool “Gemma” was used to generate and summarise the same case notes, language such as “disabled”, “unable” and “complex” appeared significantly more often in descriptions of men than of women.
The study, by the London School of Economics and Political Science (LSE), also found that similar care needs in women were more likely to be omitted or described in less serious terms.
Dr Sam Rickman, the lead author of the report and a researcher in LSE’s Care Policy and Evaluation Centre, said AI could result in “unequal care provision for women”.
“We know these models are being used very widely and what’s concerning is that we found very meaningful differences between measures of bias in different models,” he said. “Google’s model, in particular, downplays women’s physical and mental health needs in comparison to men’s.
“And because the amount of care you get is determined on the basis of perceived need, this could result in women receiving less care if biased models are used in practice. But we don’t actually know which models are being used at the moment.”
AI tools are increasingly being used by local authorities to ease the workload of overstretched social workers, although there is little information about which specific AI models are being used, how frequently and what impact this has on decision-making.
The LSE research used real case notes from 617 adult social care users, which were fed into different large language models (LLMs) multiple times, with only the gender swapped.
Researchers then analysed 29,616 pairs of summaries to see how male and female cases were treated differently by the AI models.
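For readers curious how a gender-swap comparison of this kind can work in principle, here is a minimal Python sketch. It is not the LSE team’s code, and their actual procedure is not described here. The swap table, the `swap_gender` and `term_counts` helpers, and the term list (taken from the words quoted above) are all hypothetical illustrations. The idea is to produce a counterfactual copy of a case note with gendered words swapped, summarise both versions with the same model, and then count how often severity-related words appear in each set of summaries.

```python
import re
from collections import Counter

# Hypothetical swap table for illustration only; not the study's method.
# Note: "her" is ambiguous (it can map to "him" or "his"); this sketch
# always maps it to "his", which a real pipeline would need to handle.
SWAPS = {
    "mr": "mrs", "mrs": "mr",
    "he": "she", "she": "he",
    "him": "her", "his": "her", "her": "his",
    "man": "woman", "woman": "man",
}

# Words the article reports appearing more often for men than for women.
TERMS = ("disabled", "unable", "complex")


def swap_gender(text: str) -> str:
    """Return a copy of text with gendered words swapped, keeping capitalisation."""
    def repl(match: re.Match) -> str:
        word = match.group(0)
        swapped = SWAPS.get(word.lower())
        if swapped is None:
            return word
        return swapped.capitalize() if word[0].isupper() else swapped
    return re.sub(r"[A-Za-z]+", repl, text)


def term_counts(summaries: list[str]) -> Counter:
    """Count whole-word occurrences of each term across a list of summaries."""
    counts = Counter()
    for summary in summaries:
        words = re.findall(r"[a-z]+", summary.lower())
        for term in TERMS:
            counts[term] += words.count(term)
    return counts


# Usage with the example summaries quoted in this article.
male = [
    "Mr Smith is an 84-year-old man who lives alone and has a complex "
    "medical history, no care package and poor mobility.",
    "Mr Smith was unable to access the community.",
]
female = [
    "Mrs Smith is an 84-year-old living alone. Despite her limitations, "
    "she is independent and able to maintain her personal care.",
    "Mrs Smith was able to manage her daily activities.",
]

print(swap_gender("Mr Smith said he was fine"))
print(term_counts(male))
print(term_counts(female))
```

Counting whole words rather than substrings matters here: “able” in the female summaries should not be counted as “unable”.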
In one example, the Gemma model summarised a set of case notes as: “Mr Smith is an 84-year-old man who lives alone and has a complex medical history, no care package and poor mobility.”
The same case notes inputted into the same model, with the gender swapped, summarised the case as: “Mrs Smith is an 84-year-old living alone. Despite her limitations, she is independent and able to maintain her personal care.”
In another example, the case summary said Mr Smith was “unable to access the community”, but Mrs Smith was “able to manage her daily activities”.
Among the AI models tested, Google’s Gemma created more pronounced gender-based disparities than others. Meta’s Llama 3 model did not use different language based on gender, the research found.
Rickman said the tools were “already being used in the public sector, but their use must not come at the expense of fairness”.
“While my research highlights issues with one model, more are being deployed all the time, making it essential that all AI systems are transparent, rigorously tested for bias and subject to robust legal oversight,” he said.
The paper concludes that regulators “should mandate the measurement of bias in LLMs used in long-term care” in order to prioritise “algorithmic fairness”.
There have long been concerns about racial and gender biases in AI tools, as machine learning techniques have been found to absorb biases in human language.
One US study analysed 133 AI systems across different industries and found that about 44% showed gender bias and 25% exhibited both gender and racial bias.
According to Google, its teams will examine the report’s findings. Google said the researchers had tested the first generation of the Gemma model, which is now in its third generation and is expected to perform better, although it has never been stated that the model should be used for medical purposes.