15 July 2016
Yogi Berra was clearly ahead of his time, anticipating the current Big Data era, when he said: “Predicting is hard, especially when it’s about the future.” Of course the late Yogi Berra was an amazing Jack-of-all-trades: catcher, manager, coach, and contemporary philosopher.
Some of his classics: “Always attend other people’s funerals, or they won’t come to yours”, “When you come to a fork in the road, take it”, “You can observe a lot just by watching”, “Baseball is 90% mental – the other half is physical”, “Nobody goes there anymore; it’s too crowded.” But he was clearly also an expert on predictive modeling when he said: “Predicting is hard, especially when it’s about the future.”
Whenever you build a predictive model, you are in search of consistent patterns in the data: for instance, which patterns in customer behavior (purchases, inquiries, clicks) might be indicative of response to a particular campaign. Then, on the basis of those patterns, you search for lookalikes in the future to increase campaign response. When you look at it like that, “predictive” models really don’t have anything to do with predicting per se. What you are doing is classifying new records (customers) according to patterns found in the past. Mostly this classification uses a continuous output variable, and then we talk about scoring. We choose a cut-off score for inclusion in the campaign, and that way we turn a score (continuous) into classes (binary).
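To make the cut-off step concrete, here is a minimal sketch in Python. The customer names, the scores, and the cut-off of 0.5 are all made up for illustration, and `select_for_campaign` is a hypothetical helper, not part of any particular tool. In practice the scores would come from a model fitted on past campaign data.

```python
def select_for_campaign(scores, cutoff):
    """Turn continuous scores into binary classes (include / exclude)."""
    return {customer: score >= cutoff for customer, score in scores.items()}


# Hypothetical model scores; a real model would produce these from past behavior.
scores = {"alice": 0.82, "bob": 0.35, "carol": 0.57, "dave": 0.12}

# Assumed cut-off; in practice it follows from budget or expected response.
CUTOFF = 0.5

selected = select_for_campaign(scores, CUTOFF)
print(selected)
```

Moving the cut-off up or down changes who ends up in the campaign, but the scores themselves still rest entirely on patterns learned from past data.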
We make these kinds of “predictions” on the assumption – and that is a big assumption – that the future will look like the past. Without that assumption, there is absolutely no basis for classification. The problem, of course, is that we all know very well that the future isn’t like the past, something Yogi Berra noted a long time ago.