Crime, punishment and the big data revolution — For years, data analysts and researchers have relied on causality to explain why variables relate to each other in a particular way. But in the world of big data, correlation is king. Viktor Mayer-Schönberger and Kenneth Cukier explain why in their book, Big Data: A Revolution That Will Transform How We Live, Work, and Think.

In the realm of commerce, companies use data to predict consumer behavior and tailor products and services accordingly. Big Data: A Revolution provides helpful insight into why the learning curve for enterprise is so steep: making the most of what correlation can tell us is a decidedly new way of thinking about data.

Image: Big Data Crime (credit: Vintage Printable)

“In a big-data world…we won’t have to be fixated on causality,” write Mayer-Schönberger and Cukier. “Instead, we can discover patterns and correlations in the data that offer us novel and invaluable insights. The correlations may not tell us precisely why something is happening, but they alert us that it is happening.”

There are times, the book argues, when correlation is “good enough.” If, for example, data reveals that cancer patients who regularly consume aspirin and orange juice see their disease go into remission, why seek a causal explanation before recommending the regimen? In that case, correlation could save lives, so causation is less important. But what if authorities could predict where and when a crime will take place, and who – down to the individual – is likely to commit it?
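The gap between the two ideas is easy to demonstrate in code: two variables driven by the same hidden factor will correlate strongly even though neither causes the other. Here is a minimal sketch in Python (the variable names and numbers are illustrative, not drawn from the book):

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient, computed from first principles."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx ** 0.5 * vy ** 0.5)

random.seed(0)

# A hidden confounder drives both observed variables.
# Neither a nor b causes the other, yet they correlate strongly.
confounder = [random.gauss(0, 1) for _ in range(10_000)]
a = [c + random.gauss(0, 0.5) for c in confounder]
b = [c + random.gauss(0, 0.5) for c in confounder]

print(round(pearson(a, b), 2))  # strong correlation, no causal link between a and b
```

A risk score built on data like this would flag `a` as predictive of `b` and be right in aggregate, while saying nothing about why, which is exactly the book's point about when correlation alone is, and is not, "good enough."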

In fact, authorities can already predict crime based on data analyses. Parole boards in more than half of all U.S. states use such information in deciding whether or not to release the incarcerated from prison. And, according to Big Data: A Revolution, more precincts are beginning to employ “predictive policing,” or “using big-data analysis to select what streets, groups, and individuals to subject to extra scrutiny, simply because an algorithm pointed to them as more likely to commit crime.”

The potential problems with such an approach are alarming. The whole notion sounds ridiculous. Repulsive, even. The sort of Orwellian future we’ve read about or seen depicted in films, but not something that could actually occur in a democratic society. Mayer-Schönberger and Cukier argue that there are serious ethical and moral downsides to incorporating big data findings into judicial policy. The main reason? Big data only reveals correlation – not causation.

“Big data does not tell us anything about causality,” say Mayer-Schönberger and Cukier. “In contrast, assigning ‘guilt’ – individual culpability – requires that people we judge have chosen a particular action. Their decision must have been causal for the action that followed. Precisely because big data is based on correlations, it is an utterly unsuitable tool to help us judge causality and thus assign individual culpability.”

Big data is as unsuitable, perhaps, as the racial profiling that is now standard practice in airports across the country. Such preventative methods for reducing crime are nothing new. But the same subjective dilemmas that plague those policies will only be exacerbated by big data, since numbers, however faulty or misleadingly presented, tend to assume an air of objective authority.

“With so much seemingly objective data available,” the authors note, “it may seem appealing to de-emotionalize and de-individualize decision-making, to rely on algorithms rather than on subjective assessments by judges and evaluators, and to frame decisions not in the language of personal responsibility but in terms of more ‘objective’ risks and their avoidance.”

De-emotionalizing inherently charged social issues could worsen the effect of disproportionate arrests based on demographic probabilities. A recent ACLU study, for example, revealed that black Americans were 3.73 times more likely to be arrested for marijuana possession than white Americans in 2010. Actual rates of use and possession, mind you, are roughly equal between white and black Americans. Imagine if big data were to enter the picture. Analyses might show a skewed picture of which neighborhoods have a higher probability of crime. Authorities could then decide, based on correlation alone, that those communities should be policed more heavily.
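The mechanism behind that skew can be shown with a toy simulation. Assume, purely for illustration, that two neighborhoods have identical underlying offense rates but one receives several times the police scrutiny; the arrest data alone will then make the heavily policed neighborhood look far more "criminal" (the rates and multipliers below are illustrative assumptions, not figures from the ACLU study):

```python
import random

random.seed(1)

TRUE_OFFENSE_RATE = 0.05                   # identical in both neighborhoods
PATROL_INTENSITY = {"A": 1.0, "B": 3.73}   # B gets ~3.73x the scrutiny
POPULATION = 100_000

arrests = {}
for hood, intensity in PATROL_INTENSITY.items():
    detect_prob = min(1.0, 0.1 * intensity)  # chance an offense is observed
    count = 0
    for _ in range(POPULATION):
        # An arrest requires both an offense and police being there to see it.
        if random.random() < TRUE_OFFENSE_RATE and random.random() < detect_prob:
            count += 1
    arrests[hood] = count

# The arrest data alone suggests B is far more "criminal" than A,
# even though the underlying offense rates are identical by construction.
print(arrests["B"] / arrests["A"])
```

An algorithm trained on these arrest counts would recommend still more patrols for neighborhood B, generating still more arrests there – the self-reinforcing loop the article warns about.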

It’s worth considering such hypothetical situations. By disregarding the sociological implications – or the cause of higher crime rates in certain areas – we could end up condemning lower-income, less educated citizens to a life of fear, repression and data-based targeting. The hypotheticals, or undiscovered correlations, are precisely what make big data so potentially powerful.

So how can authorities wield the positive power of big data and still protect the rights of individual citizens, without obstructing social progress?

“A fundamental pillar of big-data governance must be a guarantee that we will continue to judge people by considering their personal responsibility and their actual behavior,” write Mayer-Schönberger and Cukier, “not by ‘objectively’ crunching data to determine whether they’re likely wrongdoers. Only that way will we treat them as human beings: as people who have the freedom to choose their actions and the right to be judged by them.”

It will take the dedicated effort of policy makers and individual citizens to ensure that our right to be treated as human beings – not functions of a data set – is upheld in U.S. courts and precincts. This cautionary tale is a helpful thought exercise for corporate data analysts. The key to unlocking big data’s potential lies in seeking a deeper understanding of what correlation suggests, then acting upon that information in thoughtful ways.

Madison Andrews
Madison Andrews is a writer, editor, and designer living in Austin, Texas. She is founder and editor of
Tags: Business, Data Center