28 September 2016
There is this notion that the rate of change is ever increasing. I am not sure I buy into that, although it may often seem that way. What I do see happening, though, is that data consumers are becoming ever more emancipated and knowledgeable. I don’t foresee a time when business stakeholders will write MapReduce jobs, but the reality is that BI tools are becoming ever more user-friendly. As data become more widely available and more commonly used, the user base will naturally grow its capabilities.
The way traditional BI projects have been delivered has often been unsatisfactory. They always take too long, and they cost too much. Yet there are lots and lots of data-hungry information consumers who are dying to get their hands on more data. I believe this is where the notion of “self-service BI” originated. It is an interesting idea, although the classic problems that triggered the need for a “single version of the truth” (even though that turned out not to exist, see my post “Context is King”) seem squarely at odds with decentralized people putting their own data sets together.
These compounding forces have resulted in ever-growing pressure on BI to make more data available, and to do it sooner. Oh, and make sure the quality is impeccable, too. I’m skeptical of “data lakes” filling this gap. I have never seen a compelling description of their architectural foundation, and the concept has always sounded to me like the “old” notion of a persistent staging area. Except that our staging area was deliberately kept out of reach of information consumers, because they might interpret the data the wrong way. To quote Yogi Berra: “It’s déjà vu all over again.”
I recently commented that I buy into the notion of a staging area with evolutionary growth of a downstream (!) virtual data warehouse, sometimes also referred to as a “logical data warehouse.” As much as I support getting data into the hands of end-users as quickly as possible, I dread scenarios with unfettered access to poorly modeled data (at least for analytical purposes), often accompanied by incomplete and cumbersome data dictionaries. That is just bound to lead to more data chaos. Not good for anybody. Withholding data from end-users who want access is inexcusable, too, so I reckon it comes down to trust, again. Trust that you are indeed working with everybody’s best interests in mind, and doing the best you can to serve as many people as possible, as quickly as you can. No wonder we feel straddled!