How AI Can Lead To False Arrests & Wrongful Convictions

from the hallucinating-your-arrest dept

This article is republished from The Conversation under a Creative Commons license. Read the original article.

In Baltimore County, Maryland, on Oct. 20, 2025, a 17-year-old student named Taki Allen was sitting outside his high school after football practice when an artificial intelligence-enhanced surveillance camera falsely identified the Doritos bag in his pocket as a gun. Within moments, police cars arrived, officers drew their weapons and Allen was forced to his knees and handcuffed while they searched him. All they found was a crumpled bag of chips. The AI’s misidentification and the human decisions that followed turned a normal evening into a traumatic confrontation.

On Dec. 24, 2025, Angela Lipps, a Tennessee grandmother, was released after spending five months in jail because facial recognition software had incorrectly connected her to fraud crimes in North Dakota, a state she had never visited. Police had arrested her at gunpoint while she was babysitting her four grandchildren.

These cases show how AI can lead to the mistreatment of people through both technical flaws and misplaced human faith in the technology’s supposed objectivity. The two cases involve different tools, but the underlying issue is the same. AI systems produce probabilities, and people treat them as certainties.

We are researchers who study the intersection of technology, law and public administration. In researching how police departments use AI and how digital technologies operate in a democratic society, we have seen how quickly the shift from probabilistic prediction to operational certainty happens in practice.

AI policing tools are used in dozens of U.S. cities, although no public registry tracks the full footprint. The tools ingest historical crime data and score neighborhoods on predicted risk so officers can be routed toward the resulting hot spots. The mechanism is straightforward, but its consequence is not. Once a system signals a possible threat, the question is no longer how certain the prediction is but what to do about it. A statistical output turns into a deployment decision, and the uncertainty that produced it gets lost on the way.

A matter of probabilities

When generative AI models such as ChatGPT or Claude respond to human requests, they are not searching a database and pulling out facts. They are predicting the most likely answer based on patterns in the data they have been trained on. When asked, “Who invented the light bulb?” the models do not go to a source or fact-check a finding. They generate a statistically probable answer, which is “Thomas Edison.” The reply might be right, but it might not capture the full story, such as Joseph Swan’s independent invention of a light bulb around the same time. The danger arises when people believe that the model is retrieving truth rather than generating likelihoods.

This distinction matters. The most probable response is not the same as a factually verified answer, complete with context.
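To make the distinction concrete, here is a toy sketch in Python. It assumes, for illustration only, that a model assigns probabilities to whole candidate answers; real language models score and generate text piece by piece and do not always pick the top choice. The answers and numbers below are hypothetical. Picking the single most probable answer silently discards everything else the model considered possible.

```python
# Toy illustration: choosing the "most probable" answer discards the alternatives.
# The candidates and probabilities are hypothetical, not taken from any real model.
candidate_answers: dict[str, float] = {
    "Thomas Edison": 0.72,
    "Joseph Swan": 0.18,
    "Edison and Swan, independently": 0.10,
}


def most_probable(candidates: dict[str, float]) -> str:
    """Return the highest-probability candidate; nothing is looked up or verified."""
    return max(candidates, key=candidates.get)


answer = most_probable(candidate_answers)
# Probability the model put on every other answer, now invisible to the reader.
dropped = 1.0 - candidate_answers[answer]

print(answer)
print(f"probability left on other answers: {dropped:.2f}")
```

The output is a confident-looking “Thomas Edison,” even though, in this made-up distribution, more than a quarter of the probability sat on other answers.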

Police handcuffed teenager Taki Allen at gunpoint after an AI camera system incorrectly indicated he had a gun.

This reality can be highly problematic for policing and law. For example, when law enforcement agencies use AI systems trained on geographical data to estimate where criminal activity is likely to occur, the algorithms analyze historical crime data and geographic patterns. These systems generate statistical risk scores or heat maps for locations based on prior incidents. But such predictions may have little bearing on who was involved in a new crime in the area, even if an algorithm generates information that sounds authoritative.
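A minimal sketch of this kind of hot-spot scoring, assuming past incidents have already been assigned to hypothetical grid cells. Commercial products use more elaborate statistical models; this one simply scores each cell by its share of past recorded incidents and ranks the cells.

```python
from collections import Counter

# Hypothetical historical records: the grid cell where each past incident was logged.
historical_incidents = ["A1", "A1", "B2", "A1", "C3", "B2", "A1", "D4"]


def risk_scores(incidents: list[str]) -> dict[str, float]:
    """Score each cell by its share of past recorded incidents (scores sum to 1)."""
    counts = Counter(incidents)
    total = sum(counts.values())
    return {cell: n / total for cell, n in counts.items()}


def hot_spots(scores: dict[str, float], k: int) -> list[str]:
    """Return the k highest-scoring cells, i.e. where patrols would be routed."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


scores = risk_scores(historical_incidents)
print(scores)
print(hot_spots(scores, k=2))
```

Here the top two cells are A1 and B2. Note what the output contains: places ranked by where incidents were recorded before, and nothing about who was involved in any new incident.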

Some researchers have argued that predictive policing systems do not increase the likelihood that racial minorities will be arrested more often relative to traditional policing practices. The broader concern, however, is not limited to measurable disparities in arrest outcomes alone. It is about how probabilistic predictions can become standardized operational decisions absent further verification.

Artificial intelligence researchers caution against using these models in isolation for criminal and legal proceedings or decision-making. Research with police chiefs at the University of Virginia’s Digital Technology for Democracy Lab shows that some law enforcement agencies follow strict policies that dictate when technology is used in tandem with, or in place of, human discretion, while others have no such policy.

What most users do not realize is that AI systems rarely produce binary answers: yes or no, a positive identification or a negative one. They generate probabilities. Some systems assign scores that assess the system’s confidence in a prediction. In those cases, engineers set a confidence threshold, a level of certainty that determines when the system should trigger an alert about a possible threat. You can think of this threshold as the setting on a control knob. A 95% confidence level, for example, indicates that the model considers its interpretation to be highly likely.

A low threshold catches more potential threats but increases false alarms. A high threshold reduces mistakes but risks missing real dangers. Either way, these algorithmic thresholds are often invisible to the public and are set quietly by vendors or agencies, even though they shape when police action begins.
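The control-knob idea can be sketched in a few lines of Python. The scores and ground-truth labels below are hypothetical; the point is that the same model outputs produce very different results depending only on where the threshold is set.

```python
# Hypothetical detector outputs: (confidence score, whether a real weapon was present).
detections: list[tuple[float, bool]] = [
    (0.97, True), (0.91, False), (0.88, True), (0.74, False),
    (0.66, False), (0.58, True), (0.41, False), (0.22, False),
]


def alert_outcomes(detections: list[tuple[float, bool]], threshold: float) -> tuple[int, int, int]:
    """Return (alerts, false alarms, missed threats) when alerting at score >= threshold."""
    alerts = sum(1 for score, _ in detections if score >= threshold)
    false_alarms = sum(1 for score, real in detections if score >= threshold and not real)
    misses = sum(1 for score, real in detections if score < threshold and real)
    return alerts, false_alarms, misses


# Turn the "knob": same scores, two different thresholds.
for threshold in (0.50, 0.95):
    alerts, false_alarms, misses = alert_outcomes(detections, threshold)
    print(f"threshold={threshold:.2f}: {alerts} alerts, "
          f"{false_alarms} false alarms, {misses} missed threats")
```

With these made-up numbers, a threshold of 0.50 raises six alerts, three of them false alarms, and misses nothing; a threshold of 0.95 raises a single correct alert but misses two real threats.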

Angela Lipps was unjustly jailed for more than five months based on a mistake by a facial recognition system.

Where to draw the line

In medicine, these kinds of trade-offs are explicit. Diagnostic tools are calibrated according to the relative harm of different errors. In infectious disease settings, for instance, systems that detect infections are often designed to accept more false positives to avoid missing contagious individuals. Medical professionals then review the flagged cases. And these algorithm-based decisions are subject to professional standards, ethics reviews and regulatory oversight.

In policing, an AI system must balance false positives, where the system flags a threat that does not exist, and false negatives, where it fails to detect a real danger. The trade-off carries significant consequences. A lower threshold may generate more alerts and allow officers to intervene earlier, but it also increases the risk of mistaken identifications, which happened to Angela Lipps, or escalated encounters like the one Taki Allen experienced. A higher threshold may reduce wrongful interventions but could allow legitimate threats to go undetected.
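The medical practice of weighting errors by their harm can be written down explicitly. The sketch below, using hypothetical scores and labels, picks the threshold that minimizes a weighted count of false positives and false negatives on labeled data. The cost weights are assumptions supplied by whoever configures the system, and choosing them is exactly the value judgment at stake.

```python
# Hypothetical labelled outputs: (confidence score, whether the threat was real).
Detection = tuple[float, bool]

detections: list[Detection] = [
    (0.97, True), (0.91, False), (0.88, True), (0.74, False),
    (0.66, False), (0.58, True), (0.41, False), (0.22, False),
]


def weighted_error_cost(detections: list[Detection], threshold: float,
                        cost_false_positive: float, cost_false_negative: float) -> float:
    """Total cost of errors when alerting at score >= threshold."""
    false_positives = sum(1 for score, real in detections if score >= threshold and not real)
    false_negatives = sum(1 for score, real in detections if score < threshold and real)
    return cost_false_positive * false_positives + cost_false_negative * false_negatives


def best_threshold(detections: list[Detection],
                   cost_false_positive: float, cost_false_negative: float) -> float:
    """Pick the candidate threshold with the lowest weighted error cost."""
    # Candidates: every observed score, plus infinity (meaning "never alert").
    candidates = sorted({score for score, _ in detections}) + [float("inf")]
    return min(
        candidates,
        key=lambda t: weighted_error_cost(detections, t, cost_false_positive, cost_false_negative),
    )


# A missed threat treated as 10x worse than a wrongful stop, and then the reverse.
print(best_threshold(detections, cost_false_positive=1, cost_false_negative=10))
print(best_threshold(detections, cost_false_positive=10, cost_false_negative=1))
```

With these made-up data, weighting misses ten times more heavily moves the threshold down to 0.58; weighting wrongful interventions ten times more heavily moves it up to 0.97. The arithmetic is mechanical; the weights are not.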

Some law enforcement agencies argue that acting on imperfect signals is preferable to missing serious risks. But lowering the bar for algorithmic alerts based on probabilistic estimates effectively expands the number of people subjected to police attention. It is important to realize that these thresholds are not neutral features of the technology; they are choices embedded by the creators in the model’s code. Decisions about where to draw the line determine when an algorithmic suspicion becomes a real-world police action, even though the public rarely sees or debates how those thresholds are set.
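Why a lower bar expands police attention can be seen with back-of-the-envelope arithmetic. The sketch below assumes a hypothetical screen of 100,000 people containing 10 genuine threats, and two hypothetical operating points for the same system (a true positive rate and a false positive rate at each threshold). When genuine threats are rare, almost all of the extra alerts produced by a lower threshold land on innocent people.

```python
def expected_alerts(population: int, true_threats: int,
                    true_positive_rate: float, false_positive_rate: float):
    """Expected true and false alerts, and the share of alerts that hit innocent people."""
    innocent = population - true_threats
    true_alerts = true_threats * true_positive_rate
    false_alerts = innocent * false_positive_rate
    share_innocent = false_alerts / (true_alerts + false_alerts)
    return true_alerts, false_alerts, share_innocent


# Hypothetical operating points for one system at two thresholds: (TPR, FPR).
operating_points = {
    "higher threshold": (0.80, 0.001),
    "lower threshold": (0.95, 0.01),
}

for name, (tpr, fpr) in operating_points.items():
    hits, false_hits, share = expected_alerts(100_000, 10, tpr, fpr)
    print(f"{name}: {hits:.1f} real threats flagged, "
          f"{false_hits:.0f} innocent people flagged ({share:.1%} of all alerts)")
```

With these assumed rates, lowering the threshold catches about 1.5 more real threats on average while flagging roughly 900 more innocent people.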

Limits of optimization

Developers often use several methods to determine where to set a confidence threshold. Techniques such as “receiver operating characteristic curve analysis” examine how changing the threshold for an alert alters the balance between correctly identifying real events and mistakenly flagging harmless ones. Precision–recall analysis examines a similar trade-off: precision is the share of the system’s alerts that turn out to be correct, and recall is the share of real incidents the system detects.
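Both analyses can be sketched directly from labeled scores. The Python below uses hypothetical data; at each candidate threshold it computes the true positive rate (recall) and false positive rate that make up an ROC curve, plus precision. It also computes the area under the ROC curve in its pairwise form: the chance that a randomly chosen real event scores higher than a randomly chosen harmless one. In practice, scikit-learn’s `sklearn.metrics` module provides `roc_curve`, `precision_recall_curve` and `roc_auc_score`; this standalone version only shows what those numbers mean.

```python
# Hypothetical labelled outputs: (confidence score, whether the event was real).
# Assumes at least one real and one harmless case, so no rate divides by zero.
detections: list[tuple[float, bool]] = [
    (0.97, True), (0.91, False), (0.88, True), (0.74, False),
    (0.66, False), (0.58, True), (0.41, False), (0.22, False),
]


def operating_point(detections: list[tuple[float, bool]], threshold: float):
    """Return (recall/TPR, FPR, precision) when alerting at score >= threshold."""
    tp = sum(1 for s, real in detections if s >= threshold and real)
    fp = sum(1 for s, real in detections if s >= threshold and not real)
    fn = sum(1 for s, real in detections if s < threshold and real)
    tn = sum(1 for s, real in detections if s < threshold and not real)
    recall = tp / (tp + fn)      # share of real events that are caught
    fpr = fp / (fp + tn)         # share of harmless cases that are flagged
    precision = tp / (tp + fp)   # share of alerts that are correct
    return recall, fpr, precision


def roc_auc(detections: list[tuple[float, bool]]) -> float:
    """Probability a random real event outscores a random harmless one (ties count half)."""
    real = [s for s, r in detections if r]
    harmless = [s for s, r in detections if not r]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in real for b in harmless)
    return wins / (len(real) * len(harmless))


# Sweep from strictest to loosest, using each observed score as a cut-off,
# so every threshold produces at least one alert and precision is defined.
for t in sorted({s for s, _ in detections}, reverse=True):
    recall, fpr, precision = operating_point(detections, t)
    print(f"threshold={t:.2f}  recall={recall:.2f}  FPR={fpr:.2f}  precision={precision:.2f}")

print(f"ROC AUC = {roc_auc(detections):.2f}")
```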

These approaches could help calibrate systems more responsibly by testing how often an algorithm wrongly flags people or locations. Fine-tuning can improve system performance. But the techniques cannot resolve the underlying question of how much algorithmic uncertainty society is willing to tolerate.

In law, standards of proof determine how convincing evidence must be before a judge or jury can rule in favor of a plaintiff or defendant. Courts use formal standards of proof depending on the stakes, such as probable cause, preponderance of the evidence and beyond a reasonable doubt. These standards reflect a societal judgment about how much uncertainty is acceptable before exercising legal authority. A court does not accept a guess or a prediction; it follows a process to weigh evidence. Unlike a human, an AI model does not usually say, “I’m not sure.” A model typically presents its reply with confidence, even when the answer is incorrect.
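Whether a model’s stated confidence means anything can be checked empirically; this property is called calibration. Below is a minimal sketch of one common summary, expected calibration error: predictions are grouped into equal-width confidence bins, and each bin’s average stated confidence is compared with how often its predictions were actually right. The predictions are hypothetical.

```python
from collections import defaultdict

# Hypothetical predictions: (stated confidence, whether the prediction proved correct).
predictions: list[tuple[float, bool]] = [
    (0.95, True), (0.95, False), (0.95, True), (0.95, False),
    (0.60, True), (0.60, False), (0.60, True), (0.60, True),
]


def calibration_report(predictions: list[tuple[float, bool]], n_bins: int = 10):
    """Per-bin (avg confidence, observed accuracy, count) and expected calibration error."""
    bins: dict[int, list[tuple[float, bool]]] = defaultdict(list)
    for confidence, correct in predictions:
        # Equal-width bins; a confidence of exactly 1.0 goes in the top bin.
        bins[min(int(confidence * n_bins), n_bins - 1)].append((confidence, correct))

    report = []
    ece = 0.0
    for index in sorted(bins):
        members = bins[index]
        avg_confidence = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        # Weight each bin's confidence/accuracy gap by its share of all predictions.
        ece += len(members) / len(predictions) * abs(avg_confidence - accuracy)
        report.append((avg_confidence, accuracy, len(members)))
    return report, ece


report, ece = calibration_report(predictions)
for avg_confidence, accuracy, count in report:
    print(f"stated {avg_confidence:.2f}  actual {accuracy:.2f}  (n={count})")
print(f"expected calibration error = {ece:.2f}")
```

In this made-up sample, predictions stated at 95% confidence were right only half the time.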

Stakes are rising as AI enters the courtroom, law enforcement, the classroom, the doctor’s office and the public sector. It is important for people to understand that AI does not know things the way many assume it does. It does not distinguish between “maybe” and “definitely.” That is up to us. We believe that technologists should design systems that admit uncertainty and should educate users about how to interpret AI outputs responsibly.
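One way to build a system that admits uncertainty, sketched here under stated assumptions, is a reject option: instead of forcing every score into “alert” or “no alert,” scores in a middle band are routed to a person for review. The two cut-offs below are hypothetical values, and choosing them raises the same trade-offs as any single threshold.

```python
from enum import Enum


class Decision(Enum):
    ALERT = "alert"
    HUMAN_REVIEW = "uncertain: send to human review"
    NO_ALERT = "no alert"


def decide(score: float, low: float = 0.30, high: float = 0.97) -> Decision:
    """Map a model score to a decision, keeping an explicit 'maybe' band between low and high."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be a probability between 0 and 1")
    if not low < high:
        raise ValueError("low cut-off must be below high cut-off")
    if score >= high:
        return Decision.ALERT
    if score <= low:
        return Decision.NO_ALERT
    return Decision.HUMAN_REVIEW


for score in (0.99, 0.62, 0.10):
    print(f"{score:.2f} -> {decide(score).value}")
```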

Maria Lungu is a Postdoctoral Researcher of Law and Public Administration at the University of Virginia, and Steven L. Johnson is an Associate Professor of Commerce at the University of Virginia.


Comments on “How AI Can Lead To False Arrests & Wrongful Convictions”

26 Comments
Anonymous Coward says:

A 95% confidence level, for example, indicates that the model considers its interpretation to be highly likely.

The problem is, a 95% confidence level means there is still a 5% chance that the prediction is wrong. Multiply that by hundreds or thousands of predictions a day, and you are going to be wrong several times that day.

… and that assumes that 95% confidence by the model is reflected accurately in the real world. Hope you’ve got a good legal insurance plan.

TKnarr says:

Re:

Exactly. Scale matters. The 95% confidence level is right down at the bottom of what’s useful for anything. In practice a 99% confidence level is the minimum for reliable predictions, and that only works where true positives are relatively common. The less common a true positive is, the higher the confidence level needed and it’s not uncommon to need 99.998% confidence or higher when true positives are rare.

None of this matters to the cops or the politicians, though, and that’s the problem. The false positives never cause them any negative consequences, so they don’t care about them.

ECA says:

Re: Another thing

IS TO COMPARE the data the computer is using.
Generic faces can match anyone.
For the one about the grandmother, a video showed the pictures.
The one they based it on was a 1/3 side angle from above.

AND the word here is TRAINED. How do you train your computer? How do you force a literal mind NOT to focus, to see other things to use for identification?
You can't say "What is that?" and expect fewer than 100 guesses. You can't say it's a gun when YOU DON'T KNOW.

Ehud Gavron says:

Stupid people do what stupid machines tell them to

“…North Dakota, a state she had never visited. Police had arrested her at gunpoint…”

OH, so she had never been to North Dakota? What exactly did that change? She was ARRESTED AT GUNPOINT. By humans — being stupid because the screen said to do it.

What if she HAD been to ND? What WORSE things would she have been subjected to, and this time the excuse/rationalization/stupid-bait wouldn’t be “But AI” but instead “But AI and she was once in ND!!!”

Stop the rationalizations. It makes ZERO DIFFERENCE if she even knows of the existence of ND. It makes zero difference to her “status of a victim” because her kid had kids (hence her earned title of “grandmother.”)

Here’s the real headline:
STUPID THUG COPS VIOLENTLY ARREST WOMAN FOR NO REASON but blame “AI” — News media quick to point out she’s a grandmother AND has never been to some state.

Kinetic Gothic says:

Re:

Here’s a clue… the fact that she had never been to the state, makes it a physical impossibility that she had committed the crime in question.

If she had been in the state, especially if she had been in the state at the time of the crime, she could be seen as a plausible suspect.

The fact that she hadn’t been to the state demonstrates that the cops did nothing to verify the facial recognition match, such as checking phone records, bank receipts, licence plate readers, etc.

So, yes, the fact that she had never been to the state actually matters a lot.

Anonymous Coward says:

Re: It's a great point

How often you see situations like this, of “Well a grandmother!” and “Never been to North Dakota.” That works for outrage about this one woman. It elides the truth that people who aren’t grandmothers who maybe drove through North Dakota once are swallowed by this dragnet whole, perhaps without even a whisper of their fate.

(For that matter, although certainly I will use every advantage I can as I can, the idea that grandmothers are harmless and only sort of decorative is also actually overall harmful and deeply wrong. Smart older women are powerhouses that are deeply important to society and innovation and if you don’t know that there’s something wrong with you. 🙂 )

Ziggy says:

Probable cause

This piece argued that AI works on probabilities, not certainty. Twue dat. But the standard for arrest is “probable cause.” That’s “probable” as in “probability.”

I understand that crappy AI might have a crappy notion of “probable cause:” one which does not meet legal standards. But cops make honest mistakes sometimes, and the “probable cause” standard is supposed to shield them. How is AI different?

The problem is not in the probable cause standard. Careful cops are sometimes wrong. The problem is that there are no consequences for carelessness or worse–the qualified immunity regime under which we suffer.

Anonymous Coward says:

Re: I think the main thing is that we are using "probable" in different ways here

“Probable” to a careful human who has a stake in getting it right, even if it is just being uncomfortable when they hurt an innocent person, might be on one scale, while the “probables” that an AI is generating are not careful and not based on factual analysis. Even if both are numbers, you can’t compare them as if they measure the same thing.

AI tools ARE neat when used well. But they don’t have judgement and they’re not pulling facts most of the time. The law, especially American law, is actually very distinctly NOT supposed to be about probability but about the rights of the individual to transcend probability.

It’s not probable that a billionaire will be sent to jail, certainly not by all the facts the AI knows. That is not supposed to mean that they can’t be convicted of any crime they commit.
