Big Data need small data

Tom Breur

8 July 2016

Regardless of whether you consider Big Data “just” another hype or not, the growth in data volumes is not about to slow down any time soon. It is coming in like a tsunami: swelling and unstoppable. If I am reading the tea leaves correctly, our analytic capabilities are growing alongside it: more and more data science professionals, ever more proficient, equipped with ever better technologies and tools.

What is Big Data, really? For the time being, I’ll use a crude definition: Big Data are data whose volume in and of itself poses technical problems, over and above the challenge of extracting value from them. Obviously, there is no commercially viable way to store all these data in your existing (traditional) data warehouse solution. Yet your data warehouse has already shown its value, and is not going to be replaced, at least not just yet. All the reports and datasets that come out of it I will refer to as “small data”, and they guide the organization in essential ways to hold its course.

One thing has become clear through this Big Data hype: Big Data alone won’t get you very far. On their own, they are “just” another cumbersome silo that requires even more preparation than we as analysts are already used to. It is commonly estimated that about 90% of data analysts’ time is spent merely preparing the data, before any value is had from it. That is often even worse with the new-ish, semi-structured Big Data solutions.

If you want to extract value from Big Data, it absolutely has to be in conjunction with traditional, “small” data. It’s the context (!) of traditional reporting and key metrics that imbues Big Data with meaning that is useful for pursuing corporate objectives. These business targets can be manifold, like cross-selling more to existing customers, driving down attrition, lowering costs of fulfillment, etc.

When you do sentiment analysis, for instance, your findings will be much more relevant if you can specify how emotions (as derived from social media data) are distributed across various pre-existing customer segments. Does your segment of “cost-control customers” have the same opinion as your segment of “big spenders”?
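To make that concrete, here is a minimal sketch in Python of what tying sentiment to segments could look like. Everything in it is made up for illustration: the customer IDs, the segment lookup (standing in for an extract from the data warehouse), the sentiment scores, and the helper `mean_sentiment_by_segment`. It also assumes that social media authors have already been matched to customer IDs, which in practice is often the hard part.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical "small data": customer segments as they might be
# extracted from the traditional data warehouse.
customer_segments = {
    "c1": "cost-control",
    "c2": "cost-control",
    "c3": "big spender",
    "c4": "big spender",
}

# Hypothetical "Big Data" output: one (customer_id, sentiment) pair per
# social media post, with sentiment scored between -1 and +1.
scored_posts = [
    ("c1", -0.5),
    ("c2", -0.25),
    ("c3", 0.5),
    ("c4", 0.75),
    ("c3", 0.25),
    ("c9", 0.0),  # author could not be matched to a known customer
]


def mean_sentiment_by_segment(posts, segments):
    """Average the sentiment scores within each customer segment."""
    scores = defaultdict(list)
    for customer_id, score in posts:
        # Posts from unknown authors go into an explicit bucket instead
        # of being silently dropped.
        segment = segments.get(customer_id, "unmatched")
        scores[segment].append(score)
    return {segment: mean(values) for segment, values in scores.items()}


result = mean_sentiment_by_segment(scored_posts, customer_segments)
for segment, average in sorted(result.items()):
    print(f"{segment}: {average:+.3f}")
```

Keeping an explicit “unmatched” bucket is a deliberate choice here: the share of posts that cannot be tied back to a known customer is itself something stakeholders will want to know.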

Let’s suppose you want to analyze smart meter (utility) data. Your findings become more relevant when you can tie them to the geographical areas the organization already uses to divide up its operations and customer base. Data are typically stratified into rural versus urban neighborhoods, white-collar versus blue-collar neighborhoods, etc. These categories (strata) “live” in your traditional data warehouse. Likewise, analysis of cellphone usage is much more relevant when it is related to the existing segments that classify the handset type (the latest iPhone 6?) customers are using.

You get the idea. The old-fashioned “small data” provide the context that business stakeholders are used to when they think about their business. Without that link between Big Data and traditional sources, results will probably lack essential (business) reference points.

 
