7 September 2016
Agile has been around for about two decades now, but BI professionals were slow to embrace it. I have always felt we lagged about 10 years behind mainstream development. In the early stages of “Agile BI”, it was construed as a superior project management approach: largely top-down management in shorter cycles. Kind of like a turbo-charged Waterfall approach.
Then awareness grew that to work in a more Agile fashion, you should consider what and how you build, rather than looking at the project management process alone. This is when appreciation grew for data modeling approaches that could truly support incremental development. Contrast this with, for instance, the reigning Kimball bus matrix architecture, which requires the entire model to be built before you start loading; otherwise you incur the additional cost of reloading and reindexing your fact tables every time you want to add another attribute to a dimension somewhere. I wrote about that before, together with my esteemed colleague Badarinath Boyina.
Besides the emphasis on an architecture better suited to incremental delivery (which obviously supports early delivery better…), we started to think about the way teams need to work together, about software craftsmanship, and about the value of more maintainable code. BI in particular is prone to changing requirements, simply because the act of delivering some information is likely to trigger new questions, and hence an update to the requirements.
Then we started moving to the cloud, embracing data virtualization, and all the while the cost of storage and processing kept dropping. This dynamic changed many constraints. For one thing, the risk of not keeping data now looms larger than the cost of storing data you wind up never (or hardly ever) using. Think “data lakes”, although I strongly feel that term has become tainted by marketing and sales hype. Still, as storage becomes cheaper, it does make sense to be more liberal about keeping data “just in case.”
I feel strongly that the new serverless paradigm will give another boost to truly Agile BI architecture. We now have the option of wholesale denormalization of datasets (the very thing that gave Star Schemas their superior performance all along), and of redeploying (expanding, restructuring) these datasets as the model changes. Serverless architecture on AWS Lambda is (still) fairly new in BI; time will tell whether and when we begin to embrace it.
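To make the denormalization idea concrete, here is a minimal sketch of the flattening step: joining a fact table against its dimensions into one wide dataset that can simply be rebuilt whenever the model changes. All table, column, and key names are hypothetical, and a real pipeline would of course run against cloud storage rather than in-memory lists.

```python
# Hypothetical sketch: flatten fact rows against their dimension tables.
# In a serverless setup this would run inside a short-lived function
# (e.g. triggered by new data arriving), and the whole wide dataset
# would be regenerated when the model changes.

def denormalize(facts, dimensions):
    """Join each fact row with its dimensions, keyed by '<dim>_key' columns."""
    wide_rows = []
    for fact in facts:
        row = dict(fact)
        for dim_name, dim_table in dimensions.items():
            key = row.pop(f"{dim_name}_key")
            # Prefix dimension attributes so column names stay unique.
            row.update({f"{dim_name}_{k}": v for k, v in dim_table[key].items()})
        wide_rows.append(row)
    return wide_rows

# Illustrative data only.
customers = {1: {"name": "Acme", "region": "EMEA"}}
sales = [{"customer_key": 1, "amount": 250.0}]
flat = denormalize(sales, {"customer": customers})
# flat → [{"amount": 250.0, "customer_name": "Acme", "customer_region": "EMEA"}]
```

The point of the sketch is the economics, not the code: when rebuilding the wide dataset is cheap, adding an attribute to a dimension stops being a painful schema migration and becomes a redeploy.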
One reason I feel the serverless paradigm is a boon for BI is that it “automatically” pulls BI development teams towards software craftsmanship and a way of working closely aligned with DevOps, which makes it even more attractive for driving down the cost and timelines of deployment. This “indirectly” contributes to agility, and to continuous integration and test-driven development, the new frontiers for Agile BI professionals.
Traditional, IT-owned data warehouse legacy has not put BI in a good light, I would say. The call for “self-service BI”, unfortunately, is often driven by unacceptable delivery timelines for mainstream (corporate-sanctioned) BI. However, unfettered, poorly governed distribution of datasets will, imho, lead to chaos and set us back. To meet the tremendous information hunger the Big Data era has triggered, we as BI professionals need to do better: work and “be” more Agile and responsive, and lead the organization towards a more data-driven future. For me, personally, that almost always means some flavor of Agile: empowered, self-organizing teams equipped with the right tools and technology. The future is bright!