
Pitfalls of Data-Driven

Link: http://rvsoapbox.blogspot.com/2022/09/pitfalls-of-data-driven.html

@Jon_Ayre questions whether an organization’s being data-driven drives the right behaviours. He identifies a number of pitfalls.

  • It’s all too easy to interpret data through a biased viewpoint
  • Data is used to justify a decision that has already been made
  • Data only tells you what happens in the existing environment, so may have limited value in predicting the consequences of making changes to this environment

In a comment below Jon’s post, Matt Ballentine suggests that this is about evidence-based decision making, and notes the prevalence of confirmation bias, which can generate a couple of additional pitfalls.

  • Data is used selectively – data that supports one’s position is emphasized, while conflicting data is ignored.
  • Data is collected specifically to provide evidence for the chosen position – thus resulting in policy-based evidence instead of evidence-based policy.

A related pitfall is availability bias – using data that is easily available, or satisfies some quality threshold, and overlooking the possibility that other data (so-called dark data) might reveal a different pattern. In science and medicine, this can take the form of publication bias. In the commercial world, this might mean analysing successful sales and ignoring interrupted or abandoned transactions.
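To make the commercial example concrete, here is a minimal sketch in Python. The checkout sessions are invented purely for illustration (they are not real data), and the names `sessions` and `mean_basket` are hypothetical. It compares what you see when you look only at completed sales with what you see when abandoned transactions are included.

```python
from statistics import mean

# Hypothetical checkout sessions, invented for illustration only.
# Each tuple is (basket_value, completed).
sessions = [
    (20, True), (25, True), (30, True), (35, True),
    (80, False), (90, False), (40, True), (120, False),
]

def mean_basket(rows):
    """Average basket value across the given sessions."""
    return mean(value for value, _ in rows)

completed = [s for s in sessions if s[1]]
abandoned = [s for s in sessions if not s[1]]

# Share of sessions that ended in a sale.
completion_rate = len(completed) / len(sessions)

print("completed only:", mean_basket(completed))
print("abandoned only:", mean_basket(abandoned))
print("all sessions:  ", mean_basket(sessions))
print("completion rate:", completion_rate)
```

In this made-up sample, the completed sales suggest that customers have small baskets, while the abandoned sessions (the dark data) suggest that the larger baskets are the ones being lost. Neither view is visible if only the successful transactions are analysed.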

It’s not difficult to find examples of these pitfalls, both in the corporate world and in public affairs. See my analysis of Mrs May’s Immigration Targets. See also Jonathan Wilson’s piece on the limits of a data-driven approach in football, in which he notes low sample size, the selective nature of the data, and an absence of nuance.

One of the false assumptions that leads to these pitfalls is the idea that the data speaks for itself. (This idea was asserted by the editor of Wired Magazine in 2008, and has been widely criticized since. See my post Big Data and Organizational Intelligence.) If that were true, being data-driven would simply mean following the data.

During the COVID pandemic, there was much talk about following the data, or perhaps following the science. But given that there was often disagreement about which data, or which science, some people adopted an ultra-sceptical position, reluctant to accept any data or any science. Or they felt empowered to do their own research. (Francesca Tripodi sees parallels between the idea that one should research a topic oneself rather than relying on experts, and the Protestant ethic of bible study and scriptural inference. See my post Thinking with the majority – a new twist.)

But I don’t think being data-driven entails simply blindly following some data. There should be space for critical evaluation and sense-making: questioning the strength and relevance of the data, staying open to alternative interpretations of the data, and always being hungry for new sources of data that might provide new insight or a different perspective. That includes running experiments and tests.

Jon talks about Amazon running experiments instead of relying on historical data alone. And in my post Rhyme or Reason I talked about the key importance of A/B testing at Netflix. If Amazon and Netflix don’t count as data-driven organizations, I don’t know what does.
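For readers unfamiliar with A/B testing, the following is a rough sketch of one common way to compare two variants: a two-proportion z-test. This is a generic textbook technique, not a description of how Amazon or Netflix actually run their experiments, and the function name `two_proportion_z_test` and the visitor counts are assumptions made up for the example.

```python
from math import erf, sqrt

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    conv_a / n_a: conversions and visitors for variant A (hypothetical inputs).
    conv_b / n_b: conversions and visitors for variant B (hypothetical inputs).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF expressed via the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value

# Invented numbers: 100 of 1000 visitors convert on A, 130 of 1000 on B.
z, p = two_proportion_z_test(100, 1000, 130, 1000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

Even a test like this only answers a narrow question about the two variants that were tried. Choosing what to test, and deciding what a difference means, still calls for the kind of judgement discussed here.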

So Matt asks if we should be talking about “experiment-driven” instead. I agree that experiment is important and useful, but I wouldn’t put it in the driving seat. I think we need multiple tools for situation awareness (making sense of what is going on and where it might be going) and action judgement (thinking through the available action paths), and experimentation is just one of these tools.


Jonathan Wilson, Football tacticians bowled over by quick-fix data risk being knocked for six (Guardian, 17 September 2022)

Related posts: From Dodgy Data to Dodgy Policy – Mrs May’s Immigration Targets (March 2017), Rhyme or Reason (June 2017), Big Data and Organizational Intelligence (November 2018), Dark Data (February 2020), Business Science and its Enemies (November 2020), Thinking with the majority – a new twist (May 2021), Data-Driven Reasoning (COVID) (April 2022)

My new book on Data Strategy is now available on LeanPub: How To Do Things With Data.