11 December 2016
Data mining used to have a bad reputation. “Torture the data until they confess” was a credo that came from people outside of data mining, because those ‘in the know’ have always been acutely aware of overfitting and its accompanying risks. Yes, indeed, if you’re willing to twist reality far enough, you can (almost) always bend a data set to prove just about any point. So what? That merely shows that ignorant people are capable of doing stupid things. I knew that already… And I surely didn’t need any data mining or statistics for that.
At the moment, I see a similar trend happening with Data Science, and quite frankly I often struggle to articulate how it differs from data mining. When I read a great book like “Data Science for Business”, so much of its content reminds me of what data mining has always been about (FWIW, the authors are respected authorities in the data mining space). I’m still not entirely clear where to draw the boundaries, but that is beside the point I’d like to make.
When I read an article about “blind spots” that might result from using “Big Data”, it makes me wonder: what has Big Data got to do with blind spots?!? Critical thinking skills can be in short supply, yes, and the same even appears to hold for common sense, sometimes… As Jerry Weinberg once pointed out to me, “common sense” isn’t always so common :-)
One day, my son and I were riding our bicycles in the countryside in the Netherlands (Noord-Brabant). When we passed a tiny, tiny ‘village’ consisting of no more than a half dozen houses, I joked to him: “Son, did you know that research has shown that the people in Vijf Eiken don’t need a train station, because no one has ever been spotted waiting for a train?” To which he replied: “But dad, that’s because they don’t have a train station!” Me: “Son, that is brilliant!! One day you will grow up, and be a much better researcher than your father ever was!” Of course it is easy to lie with statistics, but heck, it’s a lot easier to lie without them!