Clearview Forbids Users From Scraping Its Database Of Images It Scraped From Thousands Of Websites

from the don't-scrape-me-bro dept

Clearview continues to dominate the “Most Hated” category in the facial recognition tech games. And with Amazon tossing aside its “Rekognition” program for the time being (it’s spelled with a K because the AI tried to spell “recognition” correctly and failed), Clearview has opened up what could be an insurmountable lead.

Clearview has been sued, investigated, banned by law enforcement agencies, and has suffered numerous self-inflicted wounds. Underneath Clearview’s untried and untested AI lies an underbedding composed of the internet. The ~4 billion images in Clearview’s database have been scraped from public posts and accounts hosted by thousands of websites and dozens of social media platforms.

There’s nothing inherently wrong with scraping sites to make use of information hosted there. In fact, this often controversial power can sometimes be used for good. The last thing we need is Clearview’s questionable tech convincing legislators, prosecutors, and courts that scraping sites is something only criminals do.

Clearview called out Google’s apparent hypocrisy on the subject of site scraping when Google sent a cease-and-desist demanding it stop harvesting images and data from Google’s online possessions. But Clearview is apparently unable to recognize its own hypocrisy. While it’s cool with site scraping when it can benefit from it, it frowns upon others perpetrating this “harm” on its own databases.

Eerily reminiscent of Disney’s take on the public domain (good when Disney uses it, bad when Disney’s copyrights are set to expire) is Clearview’s take on site scraping. Its user agreement [PDF] with the Evansville, Indiana police department (obtained by MuckRock user J Ader) contains this paragraph:

The use of automated systems or software to extract the whole or any part of the Service or Website, the Information or data on or within the Service or the Website, including image search results or source code, for any purposes (including uses commonly known as “scraping”) is strictly prohibited.

Pretty sure a bunch of the sites scraped by Clearview have similar clauses in their terms of use. And if Clearview doesn’t believe those terms should be honored, it shouldn’t expect others to give it the respect it refuses to extend to others. I don’t think anyone else should necessarily be in possession of everything in Clearview’s facial recognition database but I do think someone needs to scrape the shit out of it on sheer principle.

Also bundled in this package of public records is Clearview’s laughable “accuracy” test. It compares itself to Rekognition and its highly publicized failure. When Amazon’s tech was tested, it misidentified several DC legislators as criminals, especially those that weren’t white and male.

Clearview touts its own success in this document [PDF], which covers a non-independent test of its AI performed in 2019. Here are the results:

The test compared the headshots from all three legislative bodies against Clearview’s proprietary database of 2.8 billion images (112,000 times the size of the database used by the ACLU). The Panel determined that Clearview rated 100% accurate, producing instant and accurate matches for every one of the 834 federal and state legislators in the test cohort.

LOL. This is proof of nothing. Anyone with access to a reverse image search could perform this test with the same accuracy. While Amazon’s AI was tested against arrestees’ mugshots, Clearview’s was tested against photos and info scraped from social media profiles and public websites. Of course it was able to positively identify politicians, most of whom maintain multiple social media accounts and websites. It would only be notable if the AI had failed to perform this simple task given the wealth of information it had to work with.

In conclusion, Clearview sucks. Its tech is unproven and its policy on scraping is the apex of hypocrisy. On the other hand, the company seems to be harvesting criticism as fast as it’s harvesting web content, so the prognosis on its continued survival remains refreshingly bleak.

Companies: clearview

Comments on “Clearview Forbids Users From Scraping Its Database Of Images It Scraped From Thousands Of Websites”

9 Comments
tp (profile) says:

Clearview's scraping policy is just ok...

Scraping the internet is a completely different matter, because every content item is owned by a different entity. Each owner can sue you for peanuts and get $100 from you.

But if you scrape content from a single owner, that’s huge copyright infringement. That one owner can sue you for $2 million.

That’s basically the reason why scraping from a single owner is completely forbidden but scraping from the internet is semi-legal.

teka says:

Re: Clearview's scraping policy is just ok...

But Clearview is not the owner.

They publicly admit that they scavenge the data from across the web, in violation of all those other places’ terms of service (in fact, it is a main selling point of the service), but are turning around at their own gate and claiming the moral high ground, which is silly.

tp (profile) says:

Re: Re: Clearview's scraping policy is just ok...

But Clearview is not the owner.

Of course it’s the owner. Someone needs to take responsibility for all the problems that their copyrighted content collection is causing, and the only way to attach that responsibility to Clearview is by giving them ownership rights to the collection. If it contains someone else’s work, and there are other owners involved, then Clearview needs to have a process for asking the owners’ permission. But responsibility cannot be attached to Clearview unless we also give them ownership rights to whatever content they’re collecting.
