
Is Google More Accurate Than the FBI?

In tracking deaths by police, the tech world might beat Uncle Sam.

When escaped murderer Richard Matt was shot by a U.S. Customs and Border Protection officer on June 26, he became the 459th person shot and killed by law enforcement this year, according to a new database released Tuesday by the Washington Post. The Post has built its catalog of deaths from police-involved shootings by continually “culling local news reports and monitoring independent databases such as Killed by Police and Fatal Encounters.” It has also invited readers to contribute to and correct its information. As of today, the database contains 463 names, as well as information about victims’ ages, race, gender, the city in which they were killed, whether they were armed, whether they showed signs of mental illness, and the date of their death.

In addition to Richard Matt, the Washington Post database contains other names you might recognize. Walter Scott — who was shot in the back in April while running from a South Carolina police officer — is number 253. But other people are absent, like Freddie Gray, who died of injuries sustained while handcuffed and shackled in a police van. His name is not in the database because the Washington Post tracks only shooting deaths involving on-duty officers.

Gray does, however, have a number in a similar database (it’s 421). In June, the Guardian launched The Counted, which tracks all “deadly use of force” by law enforcement. Its approach resembles the Washington Post’s: it follows media reports and draws on existing independent tracking projects. But because it counts all police-involved deaths, not just shootings, its total is higher. The Guardian estimates that there have been 550 police-involved deaths since the beginning of the year. For each death, it tracks much of the same information as the Washington Post, adding the location of the incident and the law enforcement agency involved but omitting whether the victim showed signs of mental illness.

Why the Numbers Differ

The Washington Post and the Guardian have developed their own methods for counting police-involved deaths in part because no other organization has precise tallies. The FBI, for its part, requests that law-enforcement agencies submit a count of police-involved deaths as part of its Uniform Crime Reports (UCR), the official data source on crime in the U.S. The FBI aggregates these counts and publishes estimates of justifiable homicides by police officers in its Supplementary Homicide Report. But the numbers are widely seen as incomplete because law-enforcement agencies contribute to the report voluntarily. Only about 2,700 out of 22,000 agencies recognized by the FBI contributed counts in 2013, the most recent year for which data is available. That year, the FBI tallied 461 homicides (its annual number has been around 400 for the last five years).

The Centers for Disease Control and Prevention also tracks deaths due to “legal intervention.” Through the National Violent Death Reporting System, it aggregates information from state authorities on a voluntary basis. Currently only 32 states participate in the program; California, Texas, and Florida, the three most populous states in the nation, do not. In 2013, the CDC estimated that there were 516 police-involved homicides in the U.S.

By year’s end, the FBI and CDC numbers each stand to be about half the totals estimated by the Guardian and the Washington Post.

Why Google Alerts Get Better Numbers

The Guardian and the Washington Post both rely heavily on two websites, Fatal Encounters and Killed by Police, in compiling their databases. In turn, both Fatal Encounters and Killed by Police rely on something commonly thought of as a vanity tool: Google Alerts, which allows people to set up custom searches and receive email alerts when those searches return new results.

“We use Google Alerts and search by date and region, usually states,” said D. Brian Burghart, founder of Fatal Encounters. Burghart, who is also editor and publisher of the Reno News & Review, started his site as a massive public records search: he has made over 2,000 requests, producing complete sets of police reports about police-involved deaths in Texas and Nevada.

Eventually, because of the frustrations and “governmental hiccups” resulting from document requests submitted to a large number of law-enforcement agencies, Fatal Encounters turned to Google Alerts. (Killed by Police also relies on daily searches of Google News to update its homicide list.)

Using Google Alerts to keep track of police shootings “makes sense provided there is human scrutiny,” said Krishna Bharat, founder of Google News (he is no longer working for Google). Search terms like “shooting” or “killed” or “murdered,” when paired with “police,” will return a big “basket of results,” to use Bharat’s terminology. This basket is then manually filtered to keep only the results that describe police-involved deaths.
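To make the “basket” idea concrete, here is a minimal sketch of the first, automated pass, assuming the search terms Bharat mentions. It is not the code Fatal Encounters or Killed by Police actually runs; the function names and the sample headlines are hypothetical.

```python
import re

# Search terms quoted in the article; everything else here is an assumption.
PAIRED_TERM = "police"
FORCE_TERMS = {"shooting", "killed", "murdered"}


def is_candidate(headline: str) -> bool:
    """Return True if the headline pairs "police" with a force-related term."""
    words = set(re.findall(r"[a-z]+", headline.lower()))
    return PAIRED_TERM in words and bool(words & FORCE_TERMS)


def build_basket(headlines):
    """Collect every headline that a person should review by hand."""
    return [h for h in headlines if is_candidate(h)]


# Hypothetical headlines, invented for illustration only.
headlines = [
    "Police: man killed in downtown shooting was armed",
    "City council debates police budget",
    "Man killed in highway crash",
    "Police officer killed in off-duty car accident",
]

basket = build_basket(headlines)
for headline in basket:
    print(headline)
```

The last headline lands in the basket even though it does not describe a death caused by police, which is why the manual filtering step matters.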

But because Google Alerts depend on news sites, mostly local ones, the system can run into problems. An obvious issue, technical in nature, arises if Google’s algorithm fails to “crawl” a news report about a given shooting. Bharat is confident that Google is capable of finding any published news story: “Certainly in the United States, I think all publications are in.” (Media outlets apply to be a part of Google News, which currently includes about 70,000 outlets from around the world.) Bharat is realistic, however, about the technical limitations. When Google isn’t crawling a news site, he said, there could be “a vacuum for many months.”

A second possible problem is a human one. “Small-town reporting isn't what it used to be, and I would not be surprised if some cases just went unreported,” said Bharat. Burghart has seen omissions of this kind and recalled one incident that escaped the attention of the news media; it was only discovered when an article about a subsequent incident alluded to it.

A Closer Look at Accuracy

In March, before either the Guardian or the Washington Post launched their databases, researchers working with the Bureau of Justice Statistics published a study suggesting that the FBI was reporting only about half of the justifiable police-involved homicides from 2003-2009 and 2011.

The BJS study focused on the bureau’s own program for tracking people who are killed during an arrest or who die while in police custody; the bureau began collecting statistics from police departments in response to the Deaths in Custody Reporting Act, which was renewed in 2014. The BJS accounting differs from the FBI’s in that it also includes, for example, people who commit suicide while in custody.

The BJS evaluated its numbers by comparing them to the FBI’s using a statistical technique known as “capture-recapture.” The idea is simple enough: assuming that the BJS and the FBI do their work independently, you can take the cases recorded by one agency (capture) and see how many are also identified by the other (recapture). The share of overlap indicates how complete each list is: the smaller the overlap, the more cases both systems are likely to have missed. That statistical work yielded estimates of how well the two reporting systems performed and how many incidents were left out. The results aligned much more closely with the Guardian and Washington Post counts.
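As a rough illustration of the capture-recapture idea, the sketch below applies the simplest two-list estimator (often called Lincoln-Petersen), which may not be the exact method the BJS researchers used. The counts are hypothetical, chosen only to keep the arithmetic easy to follow; they are not figures from the study, and the function name is ours.

```python
def estimate_total(n_first, n_second, n_both):
    """Two-list capture-recapture estimate of the true number of cases.

    Assumes the two lists are compiled independently. If the first list
    catches a given share of the second list's cases, it is assumed to
    catch the same share of all cases.
    """
    if n_both <= 0:
        raise ValueError("the two lists must share at least one case")
    return n_first * n_second / n_both


# Hypothetical counts for illustration only (NOT from the BJS study).
fbi_cases = 400   # cases on the first list
bjs_cases = 500   # cases on the second list
in_both = 200     # cases that appear on both lists

total = estimate_total(fbi_cases, bjs_cases, in_both)
seen_by_either = fbi_cases + bjs_cases - in_both
missed_by_both = total - seen_by_either
fbi_coverage = fbi_cases / total
bjs_coverage = bjs_cases / total

print(f"Estimated total cases: {total:.0f}")
print(f"Missed by both lists:  {missed_by_both:.0f}")
print(f"First list coverage:   {fbi_coverage:.0%}")
print(f"Second list coverage:  {bjs_coverage:.0%}")
```

In this made-up example the first list finds 200 of the second list’s 500 cases, or 40 percent. If it also finds 40 percent of all cases, its 400 entries imply roughly 1,000 in total, 300 of which appear on neither list.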

Interestingly, the BJS study notes that it began its own media-monitoring program for arrest-related deaths in January of 2010, specifically mentioning Google Alerts. In an email, Dr. Duren Banks, a criminologist with RTI International (a nonprofit research institute) and the lead author on the BJS study, said that they started using alerts and other media sources “to correct for the voluntary nature of the program — we had better reporting in some jurisdictions compared to others.”

Burghart has been conducting his own accuracy analysis, comparing Google with the police department incident records he requested from Nevada and Texas. He has nearly completed verifying the data from Houston, and of the 139 deaths identified in official police reports, 19 were not found by media searches. However, all but one of those occurred before 2008, very early in the life of Google News, which was officially launched in 2006. Since then, Google News has grown considerably and undergone technological changes that have impacted its archive. As a result, the farther back you search, the less likely you are to get thorough results.

The remaining unmatched incident in Houston was in 2011, and while Burghart is still examining the cause, he believes it is the result of a misspelling on the police report.

The Benefits of Crowdsourcing

Fatal Encounters relies on crowdsourced information and, in a similar project with Deadspin last year, published a simple set of instructions for readers to follow. The advice hints at the complexities of using the news as a basis for data.

Deadspin, for example, clarified that they “are not looking for incidence of police officers discharging their weapons and hitting no one” because such incidents “are not as thoroughly reported and would probably bias the data.” That is, news outlets are less likely to report on them, and so these events are inconsistently entered into Google’s catalog.

There are many advantages to crowdsourced data. The Guardian asks readers to submit a link to a news story and include “as many details as you know” as well as their own contact information. The Washington Post also asks for links to existing coverage, but includes prompts for location.

In general, participants are asked to extract some basic statistics from news reports about the incident, turning the so-called “unstructured” information of a news story into “structured” categories. That information can then be compared across time, and between neighboring cities and states, to identify trends that help address big questions like: Are shootings on the rise? Are certain races disproportionately affected?
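One way to picture the move from “unstructured” to “structured” is the sketch below. The field names loosely follow the categories the Washington Post says it records; the schema, the helper function, and every record are hypothetical placeholders, not data from any of the projects described here.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Incident:
    """One police-involved death, reduced to comparable fields."""
    date: str                # ISO format, e.g. "2015-06-26"
    city: str
    state: str
    age: Optional[int]       # None when the news report does not say
    race: Optional[str]
    gender: Optional[str]
    armed: Optional[bool]
    source_url: str          # link to the news story the fields came from


def count_by(incidents, field):
    """Tally incidents by any single field, such as "state"."""
    return Counter(getattr(incident, field) for incident in incidents)


# Placeholder records invented for illustration; race is left unknown.
incidents = [
    Incident("2015-01-05", "City 1", "State A", 34, None, "male", True,
             "https://example.com/story-1"),
    Incident("2015-01-19", "City 2", "State A", None, None, "male", False,
             "https://example.com/story-2"),
    Incident("2015-02-02", "City 3", "State B", 51, None, "female", None,
             "https://example.com/story-3"),
]

by_state = count_by(incidents, "state")
by_month = Counter(incident.date[:7] for incident in incidents)

print(by_state)
print(by_month)
```

Once every story is reduced to the same fields, comparing months or states becomes a matter of counting, and gaps in the reporting show up as missing values rather than disappearing.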

The Government Steps Up

Days after the initial reports from the Washington Post and the Guardian, Senators Barbara Boxer and Cory Booker proposed the Police Reporting of Information, Data and Evidence Act, also known as the PRIDE Act, which would require law enforcement to report all incidents in which the use of force “by or against a civilian or a law enforcement officer results in serious bodily injury or death.”

For each incident, the act requests basic demographic data on the victim (civilian or police officer), whether the civilian had a weapon, the numbers of civilians and police involved in the incident, and a brief description of the event. In terms of implementation, the proposal offers grants to law-enforcement agencies to offset the cost of collecting statistics.

This bill also comes just after President Obama announced his Task Force on 21st Century Policing. The effort includes a number of well-known technology organizations like Code for America, a nonprofit promoting “civic tech.” The goal is to improve and then open data collection by police departments to help “build transparency and increase community trust.” Code for America recently launched its Police Open Data Census, providing advice to law-enforcement agencies about what kinds of data to release and how. It recommends making data freely available online, making it machine-readable (meaning that it can be shared and used in other applications), publishing it in real time, and making incident-level data public.

These new initiatives underscore not just accountability, but also data sharing that helps develop policing practices and reduce the number of deaths, both among civilians and police.