How AI Can Lead To False Arrests & Wrongful Convictions
from the hallucinating-your-arrest dept
This article is republished from The Conversation under a Creative Commons license. Read the original article.
In Baltimore County, Maryland, on Oct. 20, 2025, a 17-year-old student named Taki Allen was sitting outside his high school after football practice when an artificial intelligence-enhanced surveillance camera falsely identified the Doritos bag in his pocket as a gun. Within moments police cars arrived, officers drew their weapons and Allen was forced to his knees and handcuffed while they searched him. All they found was a crumpled bag of chips. The AI’s misidentification and the human decisions that followed turned a normal evening into a traumatic confrontation.
On Dec. 24, 2025, Angela Lipps, a Tennessee grandmother, was released after spending five months in jail because facial recognition software had incorrectly connected her to fraud crimes in North Dakota, a state she had never visited. Police had arrested her at gunpoint while she was babysitting her four grandchildren.
These are unfortunate examples of how AI can lead to mistreatment of people because of technical flaws as well as misplaced human faith in the technology’s supposed objectivity. These cases involve different tools, but the underlying issue is the same. AI systems produce probabilities, and people treat them as certainties.
We are researchers who study the intersection of technology, law and public administration. In researching how police departments use AI and how digital technologies operate in a democratic society, we have seen how quickly the shift from probabilistic prediction to operational certainty happens in practice.
AI policing tools are used in dozens of U.S. cities, although no public registry tracks the full footprint. The tools ingest historical crime data and score neighborhoods on predicted risk so officers can be routed toward the resulting hot spots. The mechanism is straightforward, but its consequence is not. Once a system signals a possible threat, the question is no longer how certain the prediction is but what to do about it. A statistical output turns into a deployment decision, and the uncertainty that produced it gets lost on the way.
A matter of probabilities
When generative AI models such as ChatGPT or Claude respond to human requests, they are not searching a database and pulling out facts. They are predicting the most likely answer based on patterns in data they have been trained on. When asked, “Who invented the light bulb?” the models do not go to a source or fact-check a finding. They generate a statistically probable answer, which is “Thomas Edison.” The reply might be right, but it might not capture the full story, such as Joseph Swan’s parallel invention of the light bulb. The danger arises when people believe that the model is retrieving truth rather than generating likelihoods.
This distinction matters. The most probable response is not the same as a factually verified answer, complete with context.
This reality can be highly problematic for policing and law. For example, when law enforcement agencies use AI systems trained on geographical data to estimate where criminal activity is likely to occur, the algorithms analyze historical crime data and geographic patterns. These systems generate statistical risk scores or heat maps for locations based on prior incidents. But such predictions may have little bearing on who was involved in a new crime in the area, even if an algorithm generates information that sounds authoritative.
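As a rough illustration of the location scoring described above, the sketch below bins prior incidents into grid cells and scales each cell's count by the busiest cell. The grid size, the coordinates and the function name `risk_scores` are all hypothetical, and real systems use more elaborate models. The point is only that the output describes places, not people.

```python
from collections import Counter


def risk_scores(incidents, cell_size=1.0):
    """Score grid cells by how many prior incidents fall inside each one.

    `incidents` is a list of (x, y) coordinates and `cell_size` is a
    hypothetical grid resolution. Scores are counts scaled so that the
    busiest cell gets 1.0.
    """
    counts = Counter(
        (int(x // cell_size), int(y // cell_size)) for x, y in incidents
    )
    if not counts:
        return {}
    busiest = max(counts.values())
    return {cell: n / busiest for cell, n in counts.items()}


# Made-up historical incident locations, for illustration only.
history = [(0.2, 0.3), (0.8, 0.1), (0.5, 0.9), (2.4, 1.1), (5.0, 5.5)]
scores = risk_scores(history)
```

A cell with a high score says only that many past incidents were recorded there. It carries no information about who was involved in a new incident.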
Some researchers have argued that predictive policing systems do not make arrests of racial minorities more likely than traditional policing practices do. The broader concern, however, is not limited to measurable disparities in arrest outcomes alone. It is about how probabilistic predictions can become standardized operational decisions absent further verification.
Artificial intelligence researchers caution against using these models in isolation for crime and legal proceedings or decision-making. Research at the University of Virginia’s Digital Technology for Democracy Lab with police chiefs shows that some law enforcement groups follow strict policies that dictate when technology is used in tandem with, or in place of, human discretion, while others have no such policy.
What most users do not realize is that AI systems rarely produce binary answers: yes or no, a positive identification or a negative one. They generate probabilities. Some systems assign scores that assess the system’s confidence in a prediction. In those cases, engineers set a confidence threshold, a level of certainty that determines when the system should trigger an alert about a possible threat. You can think of this threshold as a setting on a control knob. A 95% confidence level, for example, indicates that the model considers its interpretation to be highly likely.
A low threshold catches more potential threats but increases false alarms. A high threshold reduces mistakes but risks missing real dangers. Either way, these algorithmic thresholds are often invisible to the public and are set quietly by vendors or agencies, even though they shape when police action begins.
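To make the trade-off concrete, here is a minimal sketch of a confidence threshold acting as that control knob. The detections, scores and function names are invented for illustration and do not describe any vendor's system.

```python
def alerts(detections, threshold):
    """Return the detections whose confidence score meets the threshold."""
    return [d for d in detections if d["score"] >= threshold]


def summarize(detections, threshold):
    """Count alerts, false alarms and missed threats at a given threshold."""
    flagged = alerts(detections, threshold)
    false_alarms = sum(1 for d in flagged if not d["real_threat"])
    missed = sum(
        1 for d in detections if d["real_threat"] and d["score"] < threshold
    )
    return {"alerts": len(flagged), "false_alarms": false_alarms, "missed": missed}


# Made-up detections: a confidence score plus whether a weapon was really there.
detections = [
    {"label": "chip bag", "score": 0.62, "real_threat": False},
    {"label": "umbrella", "score": 0.71, "real_threat": False},
    {"label": "phone", "score": 0.55, "real_threat": False},
    {"label": "handgun", "score": 0.97, "real_threat": True},
    {"label": "handgun", "score": 0.68, "real_threat": True},
]

low = summarize(detections, threshold=0.60)   # more alerts, more false alarms
high = summarize(detections, threshold=0.95)  # fewer alerts, one missed threat
```

With these invented numbers, the low setting raises four alerts, two of them false, and misses nothing. The high setting raises a single correct alert but misses one real threat. Neither setting is free of error; the knob only chooses which kind of error to accept.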
Where to draw the line
In medicine, these kinds of trade-offs are explicit. Diagnostic tools are calibrated according to the relative harm of different errors. In infectious disease settings, for instance, systems that detect infections are often designed to accept more false positives to avoid missing contagious individuals. Medical professionals then review the flagged cases, and the algorithm-based decisions are subject to professional standards, ethics reviews and regulatory oversight.
In policing, an AI system must balance false positives, where the system flags a threat that does not exist, and false negatives, where it fails to detect a real danger. The trade-off carries significant consequences. A lower threshold may generate more alerts and allow officers to intervene earlier, but it also increases the risk of mistaken identifications, which happened to Angela Lipps, or escalated encounters like the one Taki Allen experienced. A higher threshold may reduce wrongful interventions but could allow legitimate threats to go undetected.
Some law enforcement agencies argue that acting on imperfect signals is preferable to missing serious risks. But lowering the bar for algorithmic alerts based on probabilistic estimates effectively expands the number of people subjected to police attention. It is important to realize that these thresholds are not neutral features of the technology; they are choices embedded by the creators in the model’s code. Decisions about where to draw the line determine when an algorithmic suspicion becomes a real-world police action, even though the public rarely sees or debates how those thresholds are set.
Limits of optimization
Developers often use several methods to determine where to set a confidence threshold. Techniques such as “receiver operating characteristic curve analysis” examine how changing the threshold for an alert alters the balance between correctly identifying real events and mistakenly flagging harmless ones. Precision–recall analysis examines a similar trade-off, asking how accurate the system’s alerts are relative to the number of incidents it successfully detects.
These approaches could help calibrate systems more responsibly by testing how often an algorithm wrongly flags people or locations. Fine-tuning can improve system performance. But the techniques cannot resolve the underlying question of how much algorithmic uncertainty society is willing to tolerate.
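The two analyses named above can be sketched in a few lines. The example below sweeps a handful of thresholds over invented scores and ground-truth labels and reports the quantities each analysis plots: true positive rate against false positive rate for the ROC curve, and precision against recall for the precision–recall curve. The data and function names are assumptions for illustration. A real evaluation would use a large labeled test set and, typically, a statistics library.

```python
def confusion_counts(scores, labels, threshold):
    """Count true/false positives and negatives when alerting at `threshold`."""
    tp = fp = fn = tn = 0
    for score, is_real in zip(scores, labels):
        flagged = score >= threshold
        if flagged and is_real:
            tp += 1
        elif flagged:
            fp += 1
        elif is_real:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def sweep(scores, labels, thresholds):
    """For each threshold, report ROC and precision-recall coordinates."""
    rows = []
    for t in thresholds:
        tp, fp, fn, tn = confusion_counts(scores, labels, t)
        rows.append({
            "threshold": t,
            # True positive rate, also called recall: share of real events caught.
            "tpr": tp / (tp + fn) if tp + fn else 0.0,
            # False positive rate: share of harmless cases wrongly flagged.
            "fpr": fp / (fp + tn) if fp + tn else 0.0,
            # Precision: share of alerts that were real. By convention here,
            # a threshold that raises no alerts is given precision 1.0.
            "precision": tp / (tp + fp) if tp + fp else 1.0,
        })
    return rows


# Invented model scores and whether each case was a real event.
scores = [0.97, 0.91, 0.84, 0.72, 0.66, 0.58, 0.41, 0.30]
labels = [True, True, False, True, False, False, False, False]
table = sweep(scores, labels, thresholds=[0.5, 0.7, 0.9])
```

In this toy data, raising the threshold from 0.5 to 0.9 drives the false positive rate from 0.6 to zero, but recall falls from 1.0 to two-thirds. The sweep describes the menu of trade-offs; it does not say which point on the menu is acceptable.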
In law, legal standards of proof determine how convincing evidence must be before a judge or jury can rule in favor of a plaintiff or defendant. Courts use formal standards of proof depending on the stakes, such as probable cause, preponderance of the evidence and beyond a reasonable doubt. These standards reflect a societal judgment about how much uncertainty is acceptable before exercising legal authority. A court does not accept a guess or a prediction; it follows a process to weigh evidence. Unlike humans, an AI model does not usually say, “I’m not sure.” A model typically has confidence in its reply, even when the answer is incorrect.
Stakes are rising as AI enters the courtroom, law enforcement, the classroom, the doctor’s office and the public sector. It is important for people to understand that AI does not know things the way many assume it does. It does not distinguish between “maybe” and “definitely.” That is up to us. We believe that technologists should design systems that admit uncertainty and should educate users about how to interpret AI outputs responsibly.
Maria Lungu is a Postdoctoral Researcher of Law and Public Administration at University of Virginia, and Steven L. Johnson is Associate Professor of Commerce at University of Virginia.
Filed Under: ai, arrests, policing, wrongful arrest


Comments on “How AI Can Lead To False Arrests & Wrongful Convictions”
The problem is, a 95% confidence level means there is a 5% chance even in the prediction that you are wrong. Multiply that by hundreds or thousands of predictions a day, and you are going to be wrong several times that day.
… and that assumes that 95% confidence by the model is reflected accurately in the real world. Hope you’ve got a good legal insurance plan.
Re:
Exactly. Scale matters. The 95% confidence level is right down at the bottom of what’s useful for anything. In practice a 99% confidence level is the minimum for reliable predictions, and that only works where true positives are relatively common. The less common a true positive is, the higher the confidence level needed and it’s not uncommon to need 99.998% confidence or higher when true positives are rare.
None of this matters to the cops or the politicians, though, and that’s the problem. The false positives never cause them any negative consequences, so they don’t care about them.
Re: Re:
Really, anybody who would deploy a 95% solution as an assumption of guilt should be kept as far away from the levers of power as possible, as they make giving a chimp a handgun look like a responsible decision in comparison.
Re:
And I’m of the opinion that even 1% would be too much. When you are depriving people of their freedom you BETTER DAMN MAKE SURE they are who you’re looking for!
Re: Another think
IS TO COMPARE the data the computer is using.
Generic faces can match anyone.
The one about the grandmother, A video Showed the Pictures.
The 1 they based it on was 1/3 Side angle from Above.
AND the word here is TRAINED. How do you train your computer? How to Force a Literal mind to NOT focus, to see Other things to use for Ident.
You can’t say, What is that, and expect Less than 100 Guesses. You Can’t say it’s a Gun when YOU DON’T KNOW.
Stupid people do what stupid machines tell them to
“…North Dakota, a state she had never visited. Police had arrested her at gunpoint…”
OH, so she had never been to North Dakota? What exactly did that change? She was ARRESTED AT GUNPOINT. By humans — being stupid because the screen said to do it.
What if she HAD been to ND? What WORSE things would she have been subjected to, and this time the excuse/rationalization/stupid-bait wouldn’t be “But AI” and instead “But AI and she was once in ND!!!”
Stop the rationalizations. It makes ZERO DIFFERENCE if she even knows of the existence of ND. It makes zero difference to her “status of a victim” because her kid had kids (hence her earned title of “grandmother.”)
Here’s the real headline:
STUPID THUG COPS VIOLENTLY ARREST WOMAN FOR NO REASON but blame “AI” — News media quick to point out she’s a grandmother AND has never been to some state.
Re:
Here’s a clue… the fact that she had never been to the state, makes it a physical impossibility that she had committed the crime in question.
If she had been in the state, especially if she had been in the state at the time of the crime, she could be seen as a plausible suspect.
The fact that she hadn’t been to the state demonstrates that the cops did nothing to verify the facial recognition match, like checking phone records, bank receipts, licence plate readers, etc.
So, yes.. the fact that she had never been to the state, actually matters a lot.
Re: It's a great point
How often you see situations like this, of “Well a grandmother!” and “Never been to North Dakota.” That works for outrage about this one woman. It elides the truth that people who aren’t grandmothers who maybe drove through North Dakota once are swallowed by this dragnet whole, perhaps without even a whisper of their fate.
(For that matter, although certainly I will use every advantage I can as I can, the idea that grandmothers are harmless and only sort of decorative is also actually overall harmful and deeply wrong. Smart older women are powerhouses that are deeply important to society and innovation and if you don’t know that there’s something wrong with you. 🙂 )
It’s stupid and evil enough when cops murder someone and say, “I thought he had a gun.” It’ll be even more stupid and evil when they murder someone and say, “AI thought he had a gun.”
Side note: Qualified immunity is also stupid and evil.
Re:
The whole point is sloughing off blame. “Wasn’t my mistake, the AI told me it was a gun.”
What makes you think mistakes are unwanted? No shortage of “people” who wanna push a button and have someone taken away.
Don’t worry, I’m sure Mr. Masnick will be here any moment to explain we shouldn’t hold anybody accountable for any failures here. After all, that would discourage experimentation and keep these tools from being used, so they’d never get more accurate.
Re:
You… do realize I’m the one who chose to publish this story, right?
Right?
Re: Re:
Of course? The point is that the same bullshit you were advancing in defense of ‘we shouldn’t hold LLM companies responsible for the harms they cause’ in the context of leading people to suicide and murder would apply here.
So, where is it? Where’s the mealy-mouthed ‘some of you may die but it’s a sacrifice I am willing to make because other people say it’s good for them’ stuff? Where’s the ‘why do you hate catching criminals’ argument coming in whenever someone tries to point to the harms?
Re: Re: Re:
You would do well to stop believing the strawman Mike Masnick who lives in your brain is the actual Mike Masnick.
It would (1) make you sound less like a raving lunatic and (2) would allow you to hold actual adult discussions.
Re: Re: Re:2
Conversations about AI can be annoying because quite a few arguments really stem from the fact that someone doesn’t like the technology.
However, whether someone likes it or not, a bad content policy is not going to do anything about the scraping or the many datacenters. And the government getting involved in content is disturbing.
Re: Re: Re:2
You basically argued that no one should be held responsible for AI killing suicidal people when it’s being sold/used for ‘companionship’ and other mental health stuff just a little while ago.
Said you wanted section 230 like protections.
The fact that we all collectively possess a memory kinda undermines your point.
Re: Re: Re:3
Your “basically” is load bearing here. And it’s wrong. At no point have I ever said that “no one should be held responsible.”
But also your “AI killing suicidal people” is fucked up because the AI didn’t “kill” anyone, and the fact that you think it did is incredibly disturbing and makes me question your understanding of how literally anything works.
Re: Re: Re:4
‘AIs don’t kill people, people kill people’, huh.
Re: Re: Re:2
What type of liability do you think AI companies should have, when their products are used in law enforcement this way?
Re: Re: Re:3
I think it depends on what the tools promise. If they promise something and are unable to offer that, then there’s liability in the form of false marketing.
But if it’s just some law enforcement or judges stupidly relying on the tech in dangerous ways, then the liability should be on law enforcement and the judges for misusing tech in ways it shouldn’t be used.
Seems pretty straightforward.
My only issue is when we hold companies liable for things they didn’t say the tech could do, and where some human relies on it in ways they clearly should not.
Since they have an “Upper Limit” of requirements to be a cop (if you score too high you won’t be hired). Someone wants someone/something to do their thinking for them. Even when said “thinking” is flawed, because they can then point the blame elsewhere.
The scary thing with that woman’s story is not that that happened once… but that it’s happened twice in almost the exact same way.
There’s a nearly identical story about an Oklahoma woman who spent months in county jails in Oklahoma and Maryland in near identical circumstances.
Probable cause
This piece argued that AI works on probabilities, not certainty. Twue dat. But the standard for arrest is “probable cause.” That’s “probable” as in “probability.”
I understand that crappy AI might have a crappy notion of “probable cause:” one which does not meet legal standards. But cops make honest mistakes sometimes, and the “probable cause” standard is supposed to shield them. How is AI different?
The problem is not in the probable cause standard. Careful cops are sometimes wrong. The problem is that there are no consequences for carelessness or worse–the qualified immunity regime under which we suffer.
Re: I think the main thing is that we are using "probable" in different ways here
“Probable” to a careful human who has stakes in getting it right, even just being uncomfortable when they hurt an innocent human, might be on one scale, while the “Probables” that an AI is generating are not careful and not based on factual analysis. Even if both are numbers, you can’t compare them as if they measure the same thing.
AI tools ARE neat when used well. But they don’t have judgement and they’re not pulling facts most of the time. The law, especially American law, is actually very distinctly NOT supposed to be about probability but about the rights of the individual to transcend probability.
It’s not probable that a billionaire will be sent to jail, certainly not by all the facts the AI knows. That is not supposed to mean that they can’t be convicted of any crime they commit.
AI has no agency. AI isn’t causing anything. People are choosing to arrest other people based on insufficient evidence and those people should be held responsible.