Trust in Big Data

Tom Breur

23 July 2017

Where I come from (the Netherlands), we have an expression that translates into something like: “Trust comes on foot, but leaves on horseback.” Jerry (Gerald M.) Weinberg puts it this way in his brilliant “Secrets of Consulting”: “Trust takes years to win, moments to lose.”

Applications of Big Data are growing, and investments in AI applications seem to be skyrocketing. Occasionally, “accidents” (bloopers) with Big Data applications will emerge. That doesn’t mean there isn’t value in Big Data – on the contrary. I would argue it shows that the cost of misclassification can have both financial and PR implications. About five years ago, Target had a PR glitch with their “pregnancy algorithm”, although it isn’t even certain whether this actually happened. What that shows me, above all, is that consumer sensitivity around Big Data usage is real, and should be a cause for concern.

Interestingly, one company’s problem is another one’s opportunity. Topics like trust, credibility, safety and brand image are themselves topics of Machine Learning research, and they get packaged as service offerings, “products” sold to companies that spend considerable money on online advertising. Last year, digital ad spending surpassed TV ad spending, at a whopping $72 billion. Clearly there is a market there. Household brands with tremendous brand equity are spending a lot of money online and cannot afford to have their reputation compromised by fraudulent and rogue schemes that display inappropriate content to their customers.

One of the things we learned from the 2016 US presidential election and its aftermath is that detecting and identifying “fake news” is genuinely hard. The problem is compounded by the fact that internet giants need traffic, so everyone is vying for eyeballs. And just like Willie Sutton, internet fraudsters go where the money is… In part, I believe the general public needs to be better informed, “educated” if you like, about the opportunities and risks of using Big Data.

There are a few problems commonly associated with models that capture “exception behavior”, by which I mean datasets in which the target (minority) class is very rare. For example, detecting “fraud” would be a whole lot easier if it happened more often! Because your training data contain precious few examples of the target class, such models are harder to train, monitor and update. As a result, population drift tends to outpace your ability to update and improve the model.
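To make the monitoring problem concrete, here is a minimal Python sketch with made-up numbers (10,000 hypothetical transactions, 50 of them fraudulent). It shows the well-known “accuracy trap”: a “model” that never flags anything still scores 99.5% accuracy while catching no fraud at all. The helper names `accuracy` and `recall` are my own, not from any particular library.

```python
# Made-up numbers for illustration: 10,000 transactions, 50 of them fraud (0.5%).
N_TOTAL = 10_000
N_FRAUD = 50

# 1 = fraud (the rare target class), 0 = legitimate.
labels = [1] * N_FRAUD + [0] * (N_TOTAL - N_FRAUD)

# A trivial "model" that never flags anything as fraud.
predictions = [0] * N_TOTAL


def accuracy(y_true, y_pred):
    """Share of all cases where the prediction equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Share of actual positives (fraud cases) that the model caught."""
    caught = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual = sum(t == positive for t in y_true)
    return caught / actual if actual else 0.0


print(f"accuracy: {accuracy(labels, predictions):.3f}")  # accuracy: 0.995
print(f"recall:   {recall(labels, predictions):.3f}")    # recall:   0.000
```

With only 50 positives, a single caught or missed case moves recall by two percentage points, which is one reason why evaluating and monitoring such a model is noisy.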

In problem domains with a scarce minority class, Deep Learning approaches tend to do well; experience has shown that they often outperform alternative algorithms, sometimes by a considerable margin. Unfortunately, not every domain “can” use a black-box model like a Neural Network: regulatory or compliance requirements may preclude these approaches.

And last, but certainly not least, in many instances where “trust” is at stake, it gets compromised by criminal parties with malicious intent. I compare this with the war on performance-enhancing drugs: as soon as a test is found for one substance, cheaters are already trying their hand with new products. By the same token, as soon as a new fraud scheme gets detected (consistently), the crooks will be thinking up something new. So “population drift” is an inherent consequence of market dynamics, rather than “organic” change in the world out there. This must be the most sustainable “job security” for Machine Learning professionals 🙂
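One common way to keep an eye on population drift, popular in credit scoring for instance, is the Population Stability Index (PSI), which compares the distribution of model scores at training time with the distribution on recent data. The sketch below is a minimal Python version under my own assumptions: quantile bins taken from the baseline, a small floor (`eps`) to avoid log(0) on empty bins, and synthetic normally distributed “scores”. The function name and parameters are illustrative, not taken from any specific library.

```python
import bisect
import math
import random


def population_stability_index(expected, actual, bins=10, eps=1e-6):
    """PSI between a baseline sample (e.g. scores at training time) and a recent one.

    Bin edges are quantiles of the baseline; eps floors empty bins to avoid log(0).
    """
    baseline = sorted(expected)
    # bins - 1 interior cut points at the baseline's quantiles.
    edges = [baseline[len(baseline) * k // bins] for k in range(1, bins)]

    def shares(sample):
        counts = [0] * bins
        for x in sample:
            counts[bisect.bisect_right(edges, x)] += 1  # bin index 0 .. bins-1
        return [max(c / len(sample), eps) for c in counts]

    e_share, a_share = shares(expected), shares(actual)
    # Sum over bins of (actual% - expected%) * ln(actual% / expected%).
    return sum((a - e) * math.log(a / e) for e, a in zip(e_share, a_share))


# Synthetic "model scores" (hypothetical data, for illustration only).
random.seed(0)
train_scores = [random.gauss(0.0, 1.0) for _ in range(5000)]
same_population = [random.gauss(0.0, 1.0) for _ in range(5000)]
shifted_population = [random.gauss(0.5, 1.0) for _ in range(5000)]

print(population_stability_index(train_scores, same_population))     # small, close to 0
print(population_stability_index(train_scores, shifted_population))  # clearly larger
```

Commonly cited rules of thumb treat a PSI below 0.1 as stable and above 0.25 as a significant shift, but those cut-offs are conventions rather than laws. When the drift is driven by adversaries adapting their schemes, a rising PSI is one early signal that the model may need attention.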
