4 November 2018
With all the hype surrounding “Big Data”, I feel like taking stock from time to time. What can data do for us, and what can’t it? As William Kent wrote as early as 1978 in Data and Reality, data is but an abstraction of reality. At best, data is a pale reflection of what actually happened out there. Often it is stale, too: as soon as you record a data element, it has already become an observation of the past.
“Data-driven” has become a holy grail, to the point that some might forget that you always and constantly need to triangulate the values in your database against events in the real world. This is brilliantly depicted in the movie “The Big Short” (2015): the main character, Michael Burry (played by Christian Bale), realized that financial markets were relying on data that did not square with the actual number of foreclosures. Burry also discovered that many real estate development projects weren’t nearly as appealing as their brochures suggested. And even after he and his associates gathered first-hand evidence by visiting some of these developments and talking to the people who lived there, this new information was exceedingly hard to get across. Nobody wanted to believe them. When this bizarre disconnect became apparent to him, Burry began shorting the real estate and mortgage market. In the process he almost blew up his own firm, before the market “finally” collapsed and his bets began paying off in spades.
There are a lot of lessons you can draw from this movie. It underscores how general managers suffer from innumeracy (see also a previous blog post), roughly in proportion to its incidence in the general population. Even managers of quant divisions can be remarkably susceptible to financial illusions, especially when new information puts their existing investment positions at grave risk. We’re all familiar with the sagas of Barings Bank (Nick Leeson), Société Générale (Jérôme Kerviel), UBS (Kweku Adoboli), and various others. If you (still) buy into the notion that these were “merely” fraudulent and greedy rogue traders, you’d probably do well to take Governance 101 again…
Authors like Tversky & Kahneman and Taleb (Fooled by Randomness) have written about the psychological dynamics that perpetuate innumeracy. Some leading authorities are still concerned about the risks that linger after the sudden collapse of Lehman Brothers, the largest bankruptcy filing in US history; Bear Stearns comes to mind, too. In our contemporary financial markets, the increasing reliance on “data” that are often poorly understood and infrequently triangulated (as Burry and his associates did in the movie) poses systemic risks. I would argue these risks are themselves (very) poorly understood. When a “new” observation flies in the face of what your data have been saying, it can be exceedingly hard to embrace it and change perspective.
Data are never objective. Not ever. A person or system collected those data with a purpose, and it is exactly that purpose that introduces unavoidable bias. Some observations may be eliminated because they don’t serve the purpose of the data collection; other signals may be censored or attenuated. The decisions that constitute collection bias are a systemic element of the application used to record the data. So not only are data subjective, the mechanisms for recording them perpetuate bias, too.
Our contemporary philosophy of science doesn’t postulate any “objective reality.” What we refer to as “objective” is a shared view, a consensus among a group of people about what they claim to see. Note that this refers to a perception, with no claim whatsoever as to what is actually out there in the real world. Note also that, perforce, a cultural and societal bias is embedded in that shared view. Different people can have different perspectives, each with their own distinct reality. When there is sufficient consensus, we promote that shared view to “reality.” Obviously, as we gather new evidence, what we refer to as reality evolves.
All of this is testimony to why it is so important to clarify the dynamics of data science models, as I have written about before. Data processing, the transformation required for analytical purposes, adds meaning, too. A famous anecdote is how NASA for many years gathered satellite data on atmospheric ozone and habitually discarded the anomalously low readings over the South Pole as outliers. Only later did they discover those measurements were actually valid, which contributed to the discovery of the hole in the ozone layer. I would encourage you to step back from your work occasionally and pause to consider how unconscious biases might be affecting some of your recommendations…
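The NASA anecdote is a concrete instance of a general pattern: an automated plausibility filter, written with a purpose in mind, silently discards the very observations that would change the picture. Here is a minimal sketch in Python; all numbers, names, and thresholds are hypothetical and chosen purely for illustration, not taken from any real pipeline:

```python
def filter_readings(readings, lower=200.0, upper=500.0):
    """Keep only readings inside a hard-coded 'plausibility' window.

    Values outside the window are silently routed to a discard pile,
    exactly the kind of collection bias discussed above.
    """
    kept = [r for r in readings if lower <= r <= upper]
    dropped = [r for r in readings if r < lower or r > upper]
    return kept, dropped

# Hypothetical ozone-like readings: most sit near a "normal" level,
# while two genuinely anomalous low values fall below the plausibility floor.
readings = [310.0, 295.0, 302.0, 150.0, 148.0, 305.0]
kept, dropped = filter_readings(readings)

# The real signal (the anomalously low values) ends up in `dropped`,
# invisible to anyone who only ever inspects `kept`.
print(kept)     # the "clean" dataset the analyst sees
print(dropped)  # the discarded anomaly nobody looks at
```

The point of the sketch is not the arithmetic but the design choice: a filter that discards quietly, without logging or reviewing what it removed, turns an assumption about plausibility into an invisible editorial decision about reality.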