The Limits of Public Discourse – Data Science W231 | Behind the Data: Humans and Values

By David Harding

In chapter 25 of The Hitchhiker’s Guide to the Galaxy by Douglas Adams, two philosophers try to prevent the introduction of the supercomputer Deep Thought. They fear it will put them out of a job. As one explains, “I mean, what’s the use of our sitting up half the night arguing that there may or may not be a God if this machine only goes and gives you his bleeding phone number the next morning?”

Written almost forty years ago, this passage might look like a very prescient commentary on the dangers of artificial intelligence. At the time of writing, it simply reflected the absurdities of union demarcation in 1970s Britain. The need for demarcation, especially between data and debate, seems quaint today. If we are questioning who scored the winning goal in the 2014 World Cup Final, or which region had the most sales in the last quarter at our company, we have that information at our fingertips; Wikipedia or the data analytics team supplies the answers. Data can be, and is, injected effortlessly into discussion. Effortlessly, but not always sensibly.

This month, James Damore wrote about diversity at Google. He could have just mused philosophically on the dichotomy between enthusiasm for diversity in the workplace and reluctance to debate that diversity. Instead, he felt compelled to quantify his argument and, worse, at least for his near-term employment prospects, included the phrase that is the statistical equivalent of a Molotov cocktail: ‘on average’. He used it several times in describing various traits of men and women. The result is reminiscent of the enormous controversy surrounding the 1994 publication of ‘The Bell Curve’, which compared intelligence across racial lines. Another lesson in the power of data to drown discourse.

The sensible use of data poses one challenge; the confusion of data with facts poses another. Kellyanne Conway, perhaps inadvertently, neatly encapsulated the issue with her phrase ‘alternative facts’. If you know that any opinion you utter will be met with a barrage of data, then the natural reaction is to counter it with alternative data. The professional data scientist understands issues of collection bias, standard error and statistical significance. The man or woman on the street sees numbers that confirm or contradict their viewpoints. Confirmation is data as fact. Contradiction is data as fake. Recognize that humans can be easily fooled by false precision, even if it is unintentional.

Which leads to the third challenge: data as distraction. Considering the relative merits of economic policy is difficult. Arguing about specific numbers is much easier. The US President’s goal of putting coal miners back to work has led reporters to research employment figures. As the Washington Post reported, J.C. Penney employs more people than coal mining does. So the focus shifts to a heated debate on the actual number of jobs that might be created, rather than on industrial revitalization in general. If we can’t see the wood for the trees, it may be because the ever-increasing amount of data just means we have a lot more very visible trees.

It is important to note that all the new tools to gather and analyze information allow us to make better, more timely decisions across a wide range of issues. But recognize the limits. Data that inflames or distracts does not advance public discourse. At best it is ignored; at worst it is distrusted. Without trust, we may wake one day to find a public that prefers No Data to Big Data.