Big data: the thin data revolution

Big data could be the biggest game-changing opportunity since the internet. Sarah Hetherington and Christian Madsbjerg examine how leaders need to use data to piece together a richly textured view of the world. 

The future of big data is widely described as, well, big – $50 billion market big; $200 billion marketing opportunity big; 1.5 million data jobs big. The management consultancy McKinsey calls it “the biggest game-changing opportunity for marketing and sales since the internet went mainstream”. MIT’s Andrew McAfee (2012) refers to it as “the next big chapter of our business history”.

But once the hype is stripped away, big data is something much more straightforward: streams of abstracted data disconnected from their context in the world. Considering this, relying entirely on big data can result in misguided business decisions based on abstract numbers. What big data lacks is one simple thing: a connection to our subjective experience or the way that we, as people, actually understand the world. Without this – the subjective frame of individual and cultural experience – the big data approach can only deliver “thin data”, or numbers stripped of any richer contextual meaning. If business leaders really want to understand the complexity of the world, they need to pair their use of big data with “thick data” or data that richly captures the human experience.

Swimming in a sea of big data

In 2003, Craig Venter, the scientist famous for decoding the human genome, set out to sample the genetic diversity in the oceans. By filtering seawater and sequencing the genomes of the undifferentiated gunk in the Sargasso Sea – techniques of big data – Venter discovered more than 1,800 new species of bacteria in a couple of weeks, the largest batch of new species in history. He never actually looked at a single one of them. In this way, what he really discovered were 1,800 statistical blips, 1,800 correlations that delineated new species. His big data techniques tell us nothing more of the bacteria except that they exist. Big data is analysis without outcome and data without analysis – specifically, the kind of analysis required to produce results and innovation. Thus, simply capturing big data is not tantamount to finding business solutions or innovation. If Venter had simply discovered the existence of 1,800 species without analyzing them and thinking about how to put them to use, his discovery would have been interesting, but not impactful on the level of introducing change and producing new opportunities. Venter converted his 2003 expedition into a $600 million collaboration with ExxonMobil to develop next-generation biofuels. The big data approach can lead to such transformational business opportunities and it can also add tremendous value to CRM solutions, say, or to supply chain optimization. But what is big data actually telling us about our customers and operations? Just as Venter never actually gained any insight into the bacteria he discovered, what insights are we really gaining into the wants and needs of people when we rely on big data solutions?

Venter was able to take his discovery and turn it into a collaboration with Exxon, taking a discovery of a great quantity of new “data” and turning it into a successful enterprise. He is a positive example of how to harness (a kind of) big data through analysis, which is precisely the orientation toward big data that we want to suggest in this article. Applying the Venter analogy to your own business, what are you going to do with the discovery of 1,800 new sets of data – complications, opportunities or simply business black holes – that may or may not have meaning? How can big data techniques help you to solve the problems you face without raising all sorts of new ones? By looking at how scientists in the past tried to describe their world, we can gain a better understanding of what big data can and cannot do.

A better understanding

The optimism around big data seems well founded: it has been spectacularly successful for Google, Amazon, Netflix and others. It is an incredibly powerful tool for understanding what a business’s customers are doing – when wielded correctly by its users, it can revolutionize the decisions we make in government, healthcare, finance and elsewhere. But we cannot let the strengths of big data seduce us into thinking that it is the only tool available for understanding our customers, a common assumption among some of big data’s greatest evangelists. They base their arguments on a now-legendary 2008 Wired magazine article entitled The End of Theory by Chris Anderson. The core idea of this article is that if you can analyze a system in enough detail, you can help it evolve without knowing how it works.

According to the article’s argument, the way we explained systems in the past – through models and hypotheses – is becoming increasingly irrelevant: a set of crude approximations of the truth. In 2008, the internet, smartphones and CRM software were already delivering a superabundance of data. “The numbers speak for themselves,” Anderson writes as he quotes business leaders like Peter Norvig, director of research at Google. “All models are wrong, and increasingly you can succeed without them.” Ultimately, Anderson takes Norvig’s ideas and runs with them:

This is a world where massive amounts of data and applied mathematics replace every other tool that might be brought to bear. Out with every theory of human behaviour, from linguistics to sociology. Forget taxonomy, ontology, and psychology. Who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity. With enough data, the numbers speak for themselves.

It sounds revolutionary and progressive to throw every theory of human behaviour on the dust heap, and replace them with tracking and measuring. But, as centuries of philosophers such as Bacon, James and Heidegger have shown, this is an old, flawed argument in an even older debate.

A long scientific discussion

The desire to fully capture the complexity of the world is by no means a new phenomenon. Although big data is the latest manifestation of this desire, 17th-century English lawyer and scientist Francis Bacon had the same passion for accuracy. Growing frustrated with the failure of the “abstract and useless generalities” of mathematics to reveal the world, he developed his Baconian method, or what we now call the “scientific method”. To describe the world through the capture of evidence, he wanted to build up knowledge in a series of gradual steps, based on observation through the senses: facts that speak for themselves. We cannot be certain of anything, but if we see a white swan, then another, then another, our observations will tell us that all swans are white. Some call this approach productively naïve because it looks at the data with no preconceptions, without context. Big data extends Bacon’s argument to the extreme. Not only does it say we should be naïve about the data, it goes on to argue that we should dispense with the need for interpretation altogether. More data means a more elaborated picture of the world for leaders, or so goes the argument.

Over the following centuries, other philosophers dismantled Bacon’s position, offering advice contemporary leaders should take to heart. Nineteenth-century pragmatist William James, for one, critiqued the mere possibility of a naïve approach to data. James wrote: “No one ever had a simple sensation by itself. Consciousness… is of a teeming multiplicity of objects and relations.” A white swan looks red in a red light; to understand the colour of swans, we also have to understand the properties of light. Facts always live in a context, and hacking them into discrete data points renders them meaningless and incomplete. Later, Martin Heidegger and the phenomenological tradition built on James’ statement, arguing for a view of experience – not to mention people – as inextricable worlds, in which we cannot separate mind from body, person from environment. The phenomenologists did not aim to dismantle the scientific method as a tool for understanding physics or science – rather they claimed that Bacon’s method simply falls short in terms of making sense of people. Our embedded experience in a world presents great challenges for “thin data” or abstracted data sets: for example, if data about human experience changes as our worlds change, how can we harness that data to paint a picture of who we are?

An interdependent world

Our enthusiasm for big data threatens to obscure what James and Heidegger taught us long ago: that the world is not a sum of discrete facts, and is instead always characterized by its interconnectedness. James’ statement is a repudiation of a naïve view of data (the dominant view of big data) and it forms an argument for having a lens or frame through which to make sense of its “teeming multiplicity”. This phrase anticipates the present-day situation in which the world is a true global ecology: increasingly complex, interconnected and difficult to organize as an assembly of facts.

Big data has arrived into a business environment in which understanding context is more important than ever. In Duke Corporate Education’s study of CEOs and their perspective on the contemporary business environment, CEOs acknowledged two things: the interdependent nature of the world, and the necessity for leaders to understand new and unfamiliar contexts – political, technological, cultural – to make sense of that interdependence.

The philosophical argument between Bacon (and his close intellectual cousin, mind-body dualist René Descartes) and the likes of James and Heidegger serves as a parable for modern leaders, instructing us how to act on big data. It informs us how to harness big data in a productive way, rather than misguidedly falling prey to its seductive power.

Navigating in a sea of big data

So, what does it mean to lead in the era of big data? To begin with, it entails taking a more active role in the interpretation of the vast sea of data at your disposal. Rather than letting the objective data guide you, it is the role of the leader to make sense of and take a stance on the data – to decide how you want to interpret the facts you are given and develop a strategy accordingly. Using big data well means treating it as a source of information in need of someone to wield it intelligently and interpretively. This should happen in two main areas:

Collection of big data: first, navigating big data well means taking the step of selecting a context in which to collect data. The mere task of “collecting data” is meaningless in the abstract. What data do we collect? What for? How? It is impossible to study the world without some sort of paradigm for thinking about what you want to study. If you are building a big data offering for your business, you need to start by thinking about what paradigm you are operating in. As the scientific and humanistic methods show us, you need to have a hypothesis or a field of inquiry; data itself is not enough. Does your business invest in Hulu or Amazon Video ads, using their viewing metrics to get the most valuable impressions? How do you know that the person watching the show is the same person that viewed the previous shows?

Interpretation of big data: the second way to navigate big data well is to have a perspective on how data fits together as an expressive portrait. Leaders must find people who can help them use data to piece together a richly textured view of the world, in which resulting interpretations can add up to something greater than the data collected. Department store Target identified a teenage girl’s pregnancy through her purchases – then sent her some coupons for baby products, causing her father to storm into the store. The algorithms correctly diagnosed her objective needs (baby products for a mother-to-be), but knew nothing of her subjective reality (family life, perceptions and social context). They misread how she lives in her world, causing a dramatic scene. Big data’s ability to capture huge quantities of data might seem to be synonymous with painting a complex picture of the world, but without the ability to interpret that complexity, the data risks becoming more noise, or even offensive to consumers, and leading the organization astray.

Understanding humanities and science

Big data offers a lot: it can measure that which is above the threshold of awareness; it measures quantities and iterations; it measures choices. But it cannot offer a better grasp on the interdependent world and how to navigate the resulting cultural landscape. Because of its weakness in delivering what global leaders need, big data requires the complement of an understanding of human behaviour, something the human sciences are able to grant us. The social sciences and the humanities offer a body of theory and methods for capturing all that lies below the level of awareness, specifically the study of human behaviour through ethnography.

Many successful organizations, particularly from the world of technology, are already aware of what the social sciences can do for them. Xerox PARC, Intel and IBM have long hired enormous numbers of trained social scientists, particularly anthropologists, to study human behaviour and to let that behaviour inform their business strategy and product development. Facebook, despite the vast quantities of data on user behaviour at its disposal, has recently undertaken several large qualitative research projects: ethnographic projects in foreign countries as well as a panel of users with which to run qualitative surveys. Quantitative analysis will never be a sufficient stand-in for truly observing and then reflecting upon the behaviour of users and consumers.

Science and humanities

CP Snow lamented the gulf between the sciences and the humanities in his lecture “The Two Cultures” and Steven Pinker (2013) reminds us that science and the humanities need not be enemies. Our era offers big data and qualitative methods for understanding human behaviour and, together, they offer potential for leaders to avail themselves of intensely rich portraits of the human condition. Some of the 1.5 million data workers entering the workforce should be trained to interpret the data, not just sort it.

Learning from the historical dialogue among philosophers provides practical direction for contemporary leaders. That dialogue makes clear the necessity of a contextualized view of data and seeing big data as one useful tool among a palette of tools and methods. It shows that understanding the human condition through thick data is necessary to wring the most use out of the thin data that big data provides. It dismantles the aura surrounding big data right now: the sense that these abstracted data streams can deliver silver bullet insights without any interpretative lens from the human sciences. It is time to put data back where it belongs: the world.

Sarah Hetherington is a consultant and Christian Madsbjerg is a senior partner at ReD Associates.

Further Reading

The Age of Big Data, New York Times, Steve Lohr (2012)

Science is not your enemy: An impassioned plea to neglected novelists, embattled professors and tenure-less historians, New Republic, Steven Pinker (2013)

The End of Theory: Data Deluge Makes the Scientific Method Obsolete, Wired, Chris Anderson (2008)