The Data Lake Fallacy

Tom Breur

12 August 2016

Data driven is the nouveau du jour. Of course companies have always been data driven, but after some business management best-sellers, companies now appreciate more than ever that their digital assets leave a valuable footprint of customer behavior. That digital trace, often captured in log files, can be mined for commercial gain. Leveraging such data assets has proven instrumental for organizational performance and sustainable competitive advantage (see also my blog on “data based competitive advantage”).

Organizations that recognize their data are potentially valuable have begun storing them en masse. There may not be a clear business case yet, but surely these data must hold some value in them somewhere, no? Against this background, the notion of “data lakes” has been promoted: a place where you store data cheaply, “just in case” you want to reuse them later for analytical purposes. It makes so much intuitive sense now that storage and computing power have gotten so much cheaper. But there are some problems down the road, and many data lakes don’t materialize into real bottom-line revenues. At times, the awakening has been rude, painful, costly, and disruptive. Replacing the responsible executive doesn’t solve the core governance problem: becoming data driven is a company-wide, cultural transition, and the technical challenges of storing and leveraging data are but a small part of the puzzle.

Several corporate powerhouses such as Google, Amazon, and Facebook have built their fortunes on Big Data. Analyst surveys invariably show that more and more companies are embracing this new technology, often starting off in some exploratory fashion. That’s when they stumble upon elusive “data issues”, and struggle to make business line decision makers embrace results they don’t understand: either because of black-box algorithms, or because the connection with “small data” reporting isn’t obvious. Talent is hard to come by, both for hands-on programming skills and for ambassadors who can lead business partners to new meadows, based on radically new insights or transformative business model change. So we’re back to data governance…

As intuitive as the idea of a data lake may sound, it isn’t really an architectural concept. If there is no plan, no preconceived notion of how to turn this effort into business value, you are merely deferring the hard question of how to turn data into dollars. Also, you stock your value pipeline at some cost (no matter how “cheap” big data storage may be), and you lengthen the timeline between investment and value creation. Astute business people immediately recognize that as a risk: making investments without a clear pay-off, and without proper assessment of risks along the way. The last thing you want is for your data lake to turn into a data landfill as a result of pushing out the hard questions: how do we model the data, is it suitable for our business objectives, and last but not least, how is data going to transform the way we do business with our customers?




  1. […] One of the recurring themes I hear when people talk about a data lake is that they prefer storing data in its native, original format, rather than having to transform it. This dramatically lowers the cost to store, because you don’t have to do the upfront work of modeling and ETL. Also, this lowers the barrier when you are unsure about the business case for storage. You can store far more data “just in case” you need it later, and only incur storage cost, rather than storage + ETL; the latter can be a considerable expense. This, I feel, is a genuine benefit, and I pointed to it here. […]

