Police forces in the United Kingdom pushed for continued use of a facial recognition system known to produce biased and unreliable matches against women, young people, and ethnic minority groups. Internal documents reveal that forces successfully overturned a policy adjustment intended to reduce discriminatory errors, even after government researchers confirmed the bias.
The findings raise new questions about policing transparency, civil liberties, and the ethical boundaries of algorithmic surveillance in criminal investigations.
Bias Was Known for More Than a Year Before Action
A Home Office-commissioned review by the National Physical Laboratory (NPL) warned police chiefs in September 2024 that the algorithm used for retrospective facial recognition searches was significantly more likely to produce incorrect matches for probe images featuring Black individuals, Asian individuals, women, and people under 40.
The problem was clear early on.
The bias affected retrospective searches run through the Police National Database (PND), which stores more than 19 million custody photographs. When officers submit a suspect photo — known as a probe image — the system generates a list of potential matches for investigation. False positives can lead to misdirected policing, wrongful suspicion, and intrusive questioning for innocent individuals.
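The PND's matching pipeline is not public, so the following is only a minimal sketch of the general shape of a retrospective search: a probe-image embedding compared against a gallery of custody-image embeddings, with candidates returned only if they clear a confidence threshold. All names, dimensions, and scores here are hypothetical.

```python
# Illustrative sketch only: the PND's actual matching algorithm is not public.
# This models a retrospective search as cosine similarity between a probe
# embedding and a gallery of custody-image embeddings, returning candidates
# whose similarity clears a confidence threshold.
import numpy as np

rng = np.random.default_rng(0)

def normalise(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

gallery = normalise(rng.normal(size=(10_000, 128)))   # hypothetical custody-image embeddings
probe = normalise(rng.normal(size=128))               # hypothetical suspect (probe) embedding

def retrospective_search(probe, gallery, threshold, top_k=20):
    """Return (gallery index, similarity) pairs that clear the confidence threshold."""
    scores = gallery @ probe                           # cosine similarity: vectors are unit length
    ranked = np.argsort(scores)[::-1][:top_k]          # strongest candidates first
    return [(int(i), float(scores[i])) for i in ranked if scores[i] >= threshold]

# A higher threshold returns fewer, stronger candidates for officers to review.
print(len(retrospective_search(probe, gallery, threshold=0.20)))
print(len(retrospective_search(probe, gallery, threshold=0.35)))
```

In a sketch like this, the threshold is the single knob that later becomes the centre of the policy dispute.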
Police leadership initially ordered that the “confidence threshold” for algorithmic matches be raised so that only the strongest matches would be returned. In test conditions, the higher threshold significantly reduced the demographic bias.
But there was a side effect: fewer possible leads for investigators.
The number of searches producing a potential match fell from roughly 56% to 14% once the higher threshold was applied. For investigators, fewer matches meant fewer leads, which is why police forces objected.
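The reported drop can be made concrete with a small, hedged simulation. The score distributions below are invented and do not reproduce the 56% and 14% figures; they only illustrate the qualitative effect that raising the threshold sharply cuts the share of searches returning any candidate at all.

```python
# Hedged illustration of the trade-off: with simulated similarity scores,
# raising the confidence threshold lowers the share of searches that return
# any candidate. The distributions are invented, not drawn from NPL testing.
import numpy as np

rng = np.random.default_rng(1)

def match_yield(threshold, n_searches=2_000, gallery_size=10_000):
    """Fraction of simulated searches whose best candidate score clears the threshold."""
    best_scores = rng.normal(loc=0.0, scale=0.09, size=(n_searches, gallery_size)).max(axis=1)
    return float((best_scores >= threshold).mean())

for t in (0.30, 0.35, 0.40):
    print(f"threshold {t:.2f}: {match_yield(t):.0%} of searches return a candidate")
```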
Pressure From Police Forces Led to the Rule Change Being Reversed
Within weeks of the new threshold being implemented, forces complained that investigative options had been limited. Documents from the National Police Chiefs’ Council (NPCC) show that forces argued it made the system less useful as a retrospective intelligence tool.
Police leaders asked that the higher threshold be dropped.
And it was. Lowering the confidence threshold again, despite the evidence of discrimination, shows how operational demand outweighed fairness goals. The Home Office and NPCC have not disclosed the threshold currently being applied.
Transparency has become a major concern.
The algorithm’s misidentification rate at certain settings is stark. According to recent NPL testing, the system could deliver false positives for Black women almost 100 times more frequently than for white women under particular configurations.
Those results are not random or statistical noise. They demonstrate that facial recognition carries unequal risk depending on gender and ethnicity.
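For illustration, a disparity of the kind NPL reported would typically be expressed as a false positive rate per demographic group at a fixed threshold, then compared as a ratio. The counts below are invented; only the method of comparison reflects the testing described above.

```python
# Hedged sketch of how a per-group false positive disparity could be measured.
# The counts are hypothetical; they are not NPL's data.
from dataclasses import dataclass

@dataclass
class GroupResult:
    group: str
    non_matching_searches: int   # searches where the pictured person is NOT in the gallery
    false_positives: int         # of those, how many still returned a "match"

    @property
    def fpr(self) -> float:
        return self.false_positives / self.non_matching_searches

# Hypothetical evaluation counts at a single fixed threshold.
results = [
    GroupResult("white women", 10_000, 12),
    GroupResult("Black women", 10_000, 1_150),
]

baseline = results[0].fpr
for r in results:
    print(f"{r.group}: false positive rate {r.fpr:.2%}, {r.fpr / baseline:.0f}x the baseline")
```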
The Home Office Says It “Acted on the Findings”—But Not How
When publishing test summaries last week, the Home Office acknowledged that the algorithm behaved differently for different demographic groups. It said that it had “acted on the findings” and insisted that the system was not being used in live deployments without safeguards.
Critics want much clearer answers.
Neither the NPCC nor the Home Office has disclosed what operational threshold is now in place or how bias mitigation is being enforced. Officials say disclosure could compromise policing tactics.
Civil liberties groups disagree. They argue that secrecy limits external oversight and potentially exposes the public — especially minorities — to avoidable risk.
Trust requires transparency.
Why Fewer Matches Became a Flashpoint for Police Forces
Retrospective facial recognition is different from live high-street scanning. Officers use it after the fact, in investigations of thefts, assaults, burglaries, fraud, or public disorder, pulling images from CCTV, police video, or private surveillance.
Investigators want a long list of potential suspects so they can cross-check names, addresses, past encounters, and other clues. When match volume drops, detectives feel they have fewer leads.
Fewer matches mean more legwork.
But critics argue that more leads are not automatically better. If thousands of wrong leads fall disproportionately on people of colour, the investigative approach itself risks embedding discrimination. False positives can justify inquiries, police visits, questioning, traffic stops, metadata checks, and administrative surveillance.
That can cause harm long before innocence is established.
Police insist that algorithmic suggestions do not trigger arrests or charges. Matches are treated as intelligence prompts, reviewed by humans, and never used automatically.
Civil rights organisations respond that “intelligence prompts” still influence operational decisions, resource allocation, and suspicion profiling.
The Broader Stakes for Algorithmic Policing
Facial recognition is becoming embedded in policing infrastructure worldwide. Vendors train models on datasets of millions of faces, and police forces feed custody and surveillance images into investigative databases. Bias can stem from data composition, lighting, camera technology, training imbalance, and model design.
Fixing the bias is not simple.
Major academic studies worldwide have shown that facial recognition accuracy tends to be highest for white male faces, particularly those aged between 30 and 60. Women, people with darker skin tones, and younger subjects are misidentified at higher rates across multiple commercial and government algorithms.
Raising the confidence threshold is the simplest mitigation because it filters out weak matches. The trade-off is operational: police must work harder to identify suspects without algorithmic shortcuts. Better accuracy means fewer automated leads.
Accountability and Oversight Remain Unresolved
Campaigners argue that UK policing has entered a phase where algorithmic tools are treated as standard practice with insufficient democratic oversight. Liberty Investigates and the Guardian have reported repeatedly that bulk biometric surveillance is expanding faster than regulation.
Civil liberties groups warn that any system with asymmetric error rates risks amplifying pre-existing policing disparities. Communities already overrepresented in stop-and-search statistics may be most exposed to false suspicion through flawed automated matching.
Bias and policing history form a dangerous combination.
Even if the Home Office believes its current safeguards are adequate, the absence of disclosed thresholds makes it impossible for independent experts to evaluate the fairness of current deployment.
Transparency advocates are calling for three things: clear reporting of match thresholds, continuous independent bias testing, and judicial or parliamentary oversight before algorithmic tools are expanded further.
Operational secrecy and civil fairness are currently in tension.
Whether future policy tightens or weakens depends on political appetite, policing pressure, and public scrutiny.








