17 April 2018
There is something seductive about magic. You get whatever you desire, at zero cost. No effort. As if by divine intervention. Interestingly, I sometimes get the impression that all the hype around “Big Data” has nurtured that notion, too. Data science involves taking various types of data as input, juggling them through an analytical algorithm (a magic formula), and out comes an astounding result! I associate that with alchemy, where masters of a dark past managed to turn lead into gold. When I typed “alchemist” into Google, I got an uncanny result:
It’s the second part of this definition that some people appear to equate with data science. This worries me, because there is nothing “magical” about data science at all. In fact, I would argue the opposite. Lack of understanding of how data scientists achieve their results gets in the way of broader acceptance and adoption. And personally, I can relate to that: I’d hate to put my fate in magic, too! As Arthur C. Clarke famously said: “Any sufficiently advanced technology is indistinguishable from magic.”
The most effective applications of data science move beyond the model, to an embedded application. In another post I referred to those as Artificial Intelligence (AI). When machine learning applications are seamlessly integrated into primary work processes, it becomes even more important to understand the dynamics of data science models – how else can you govern them?
Most, if not all, of the complaints I hear about misuse of data science technology seem to stem from unawareness of what models can and cannot do for you. I would argue the financial meltdown of 2008 (nicely documented in the movie “The Big Short”) was caused by exactly this: financial markets relied on the accuracy of the models behind CDOs (collateralized debt obligations), and when those models were proven wrong, our global economy blew up!
Many authors have pointed to the risks of “blind” reliance on models, such as Cathy O’Neil in “Weapons of Math Destruction” (2016). As we move into an era of autonomous vehicles, recent incidents like the one in Arizona with Uber’s experimental car remind us why everybody needs some understanding of what the consequences of relying on AI (positive as well as negative) might be. Algorithms are all around us already, and will continue to play an ever-growing role in our lives.
Data scientists have a role to play in educating the general population on the possibilities and limitations of our profession. Ethics has to be an integral part of the data science curriculum. As the scandals with Cambridge Analytica and Facebook have made clear, we can no longer roll back time. And “simply” disabling your Facebook account won’t let you escape the threat of implicit surveillance. Privacy matters. A lot. But safeguarding it is anything but obvious.
If your data science efforts seem like “alchemy”, something is going terribly wrong: I have never seen responsible governance for magic. The global economy has suffered some costly blows when we allowed ourselves to be lulled into believing magic might just be possible. Our privacy and safety are at stake, and even the stability of democratic systems.
Data science applications have proven their commercial value, and their widespread adoption means no one can avoid being influenced by this technology – for good or for bad. Not even by hiding under a rock. Public awareness needs to grow and evolve, nurtured by genuine data science emancipation, similar to Europe’s privacy evolution with the GDPR and its insistence on the “right to be forgotten.”
This evolution will not happen overnight. General awareness takes time to grow. In fact, it might take an entire generation to trickle through society. I’m reminded of some “tile wisdom” at home: “the impossible we deliver promptly, miracles take a little longer.”