The FBI’s Massive Facial Recognition Database Raises Concern

The Federal Bureau of Investigation plans to start using a facial database, according to papers obtained by the Electronic Frontier Foundation in a FOIA lawsuit. But facial recognition technology isn't very accurate yet.

Cameron Scott

Apr 27, 2014

Facial recognition technology isn’t yet sophisticated enough to identify people accurately, which most technology watchers cite as a reason not to be overly concerned, for now, about its privacy implications.

But the Federal Bureau of Investigation plans to start using the software anyway, according to papers obtained by the Electronic Frontier Foundation in a FOIA lawsuit. And when used by law enforcement agencies, an imprecise facial recognition algorithm holds at least as much potential for abuse as a terrifyingly precise one, say civil liberties advocates.

The FBI has been building what it calls a Next-Generation Identification system to hold its identification data and, responding to political pressure in the wake of 9/11, to make it sharable between agencies. Civil liberties groups were uneasy about how more powerful suspect search engines might affect average citizens, and that’s where the lawsuit came in.

The papers the FBI turned over just last week reveal that it has been making impressive progress toward a facial recognition database drawing on a wide range of sources. But despite clear evidence that the software generates a lot of false positive and false negative results, the bureau’s progress toward guidelines on how the technology will be used is nowhere near as robust.

The database “may include as many as 52 million face images by 2015,” EFF said in an analysis of the government documents. It already contains more than 16 million images. More than 4 million come from non-criminal contexts, such as driver’s license photos.

But is the software, developed by MorphoTrust (now Idemia), accurate? The FBI’s criteria for accuracy specify that the object of the search will be returned in the top 50 candidates 85 percent of the time. That’s not bad, technically speaking, but for every false positive the algorithm spits out, there’s someone who could end up in an FBI interrogation room.
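The arithmetic behind that criterion can be sketched briefly. This is an illustration only, assuming each search returns a full list of 50 candidates and that at most one entry on the list can be the person actually sought. The function name and its parameters are hypothetical and do not come from the FBI documents.

```python
def candidate_list_breakdown(list_size=50, hit_rate=0.85):
    """Return (expected_true, expected_wrong) entries per candidate list.

    Assumes the list is always full and holds at most one true match,
    which appears on the list with probability hit_rate.
    """
    if list_size < 1 or not 0.0 <= hit_rate <= 1.0:
        raise ValueError("list_size must be >= 1 and hit_rate in [0, 1]")
    expected_true = hit_rate                     # at most one real match per list
    expected_wrong = list_size - expected_true   # everyone else on the list
    return expected_true, expected_wrong


if __name__ == "__main__":
    true_entries, wrong_entries = candidate_list_breakdown()
    print(f"Expected true matches per list: {true_entries:.2f}")
    print(f"Expected non-matches per list:  {wrong_entries:.2f}")
```

Under those assumptions, a typical list holds about 49 people who are not the person being sought, whatever the hit rate.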

“We know from researchers that the risk of false positives increases as the size of the dataset increases—and, at 52 million images, the FBI’s face recognition is a very large dataset. This means that many people will be presented as suspects for crimes they didn’t commit. This is not how our system of justice was designed and should not be a system that Americans tacitly consent to move towards,” Jennifer Lynch, a senior staff attorney at EFF, wrote.
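Lynch's point about dataset size can be illustrated with a simple model. The sketch below assumes every comparison between a probe photo and a database photo has the same small, independent chance of a false match. The rate used here is a made-up placeholder, not a figure from the FBI documents or from MorphoTrust, and a real system that returns a ranked candidate list will behave differently in detail.

```python
import math

# Hypothetical per-comparison false match rate, chosen only for illustration.
HYPOTHETICAL_FALSE_MATCH_RATE = 1e-7


def expected_false_matches(gallery_size, false_match_rate):
    """Expected number of false matches in one search of the whole gallery."""
    return gallery_size * false_match_rate


def prob_at_least_one_false_match(gallery_size, false_match_rate):
    """1 - (1 - p)^N, computed in a numerically stable way."""
    return -math.expm1(gallery_size * math.log1p(-false_match_rate))


if __name__ == "__main__":
    # Gallery sizes taken from the article: current and projected image counts.
    for size in (16_000_000, 52_000_000):
        exp = expected_false_matches(size, HYPOTHETICAL_FALSE_MATCH_RATE)
        prob = prob_at_least_one_false_match(size, HYPOTHETICAL_FALSE_MATCH_RATE)
        print(f"{size:>11,} images: {exp:.1f} expected false matches, "
              f"P(at least one) = {prob:.3f}")
```

Whatever rate is plugged in, the expected number of false matches grows in direct proportion to the number of images searched, which is the trend the quote describes.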

Civil liberties groups are displeased that the new database, unlike its predecessors, will draw in photos from civil sources, such as driver’s licenses. Still, there’s no law prohibiting such merging of datasets, Chris Conley of the ACLU said. It just wasn’t practical before the digital age.

But the FBI has set loose rules about getting photos from sources beyond mug shots and government photo IDs. For instance, the documents say the bureau will get more than 200,000 images from “new repositories” — there’s no clarification of what those repositories may be. A poorly understood information-sharing program between the FBI and local law enforcement bodies would pour another 700,000 images into the database. Photos from social media are off-limits, though there are no procedures identified for enforcing the ban.

While few of us would enjoy being brought in for interrogation, law enforcement would be lax not to make use of powerful technologies like facial recognition. A few years from now, surveillance photos like those of the Boston Marathon bombers could lead directly to violent criminals.

So what’s the right way to use the technology?

“As a starting point with any technology, the goals should be in place and the system should be designed around those. We get concerned when the process is ‘let’s collect lots of data and figure out how to use it,’ without those specific guidelines in place,” Conley said.

Curiously, Conley's advice sounded a lot like common tech industry ground rules for web development projects.

Photos: Mikael Altemark/Flickr, State of Illinois, MorphoTrust

Cameron Scott

Cameron received degrees in Comparative Literature from Princeton and Cornell universities. He has worked at Mother Jones, SFGate and IDG News Service and been published in California Lawyer and SF Weekly. He lives, predictably, in SF.
