31 March 2018
For years, I joked that BI professionals mostly leave testing to their end users. Data needs keep growing, volumes are exploding, and demand for lower-latency access keeps increasing, so the available window to actually do any loading (if data gets moved physically) keeps shrinking. As a result, most BI teams ‘just’ load their data as is. “We’ll let our end users test it”, I would always say, sarcastically…
Several years ago, I came across a Kent Beck keynote on YouTube, “Software G Forces: The Effects of Acceleration”, and despite it being 1.5 hours long, I watched it several times. Processes and testing fundamentally go through step changes as the pace of delivery increases. Agile and Lean have certainly contributed to progress, if only because we now think in weeks instead of months. But the next wave of continuous integration and continuous delivery (very different things!) is already rolling in. And so it should. I have to give the DevOps movement credit for that.
If we think about “progress” and “acceleration” like Kent Beck, and apply that thinking to DataOps, I think we need to move beyond the “old school” testing of data streams as a time series to be monitored (as I wrote about in 2009). Why? Because that approach is still backwards: you try to isolate inconsistencies after you have been stung.
A more forward-looking approach, I believe, comes from the “Specification by Example” movement. Unfortunately, that thinking has rarely, if ever, reached the Business Intelligence crowd… A shining exception is my friend and thought leader Stephan Deblois, who has been championing this approach (and several other radical innovations) for many years already.
Whether you call it Test-Driven Development, Specification by Example, or you feel the need to introduce yet another buzzword: unless your customers’ needs (how they expect their data to behave) are built into your system as tests (or verifications, as exploratory testers might call them) that prevent regression and automate all future testing, your pace will slow down as your BI system grows and evolves. And if it doesn’t slow you down yet, system complexity will certainly prevent you from going faster.
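To make that concrete: here is a minimal sketch (all names and rules are illustrative, not from any particular project) of what capturing a customer’s expectation about data behaviour as an executable check might look like. The “specification” is the example itself: revenue per region is never negative, and no region appears twice in a load.

```python
# Hypothetical specification-by-example check for a BI load.
# The customer's stated expectations are encoded as an executable
# verification that runs on every build, so a regression is caught
# by the pipeline instead of by an end user.

def check_revenue_expectations(rows):
    """Each row is a dict like {"region": str, "revenue": float}.
    Returns a list of human-readable violations (empty = all good)."""
    problems = []
    seen = set()
    for row in rows:
        if row["revenue"] < 0:
            problems.append(f"negative revenue for {row['region']}")
        if row["region"] in seen:
            problems.append(f"duplicate region {row['region']}")
        seen.add(row["region"])
    return problems

# A batch with one upstream bug: the violation is flagged,
# not silently loaded into the warehouse.
loaded = [
    {"region": "EMEA", "revenue": 120_000.0},
    {"region": "APAC", "revenue": -500.0},  # bug upstream
]
print(check_revenue_expectations(loaded))
```

Run under any test runner (or a plain assert in CI), a non-empty result fails the build, which is exactly the point: the expectation lives in the system, not in someone’s head.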
BI systems are living systems: fulfilling information needs invariably triggers new information needs (see this paper, for instance). If you haven’t seen that happen yet, you probably haven’t been in the business long enough. For those reasons, I have come to conclude that encapsulating expectations about incoming data as part of your build seems to me the only way to ensure development can continue forever at a sustainable pace (note the 8th principle). If that seems like a burden or overhead, as I like to say: you have to go slow to go fast.
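As a sketch of what “expectations about incoming data as part of your build” could mean in practice (column names and types here are purely illustrative), a batch that violates the declared expectations fails fast in the pipeline rather than surfacing later as a puzzled end user:

```python
# Illustrative sketch: declare what incoming records must look like,
# and make the load job (or CI step) refuse a batch that breaks it.

EXPECTED_COLUMNS = {"order_id": int, "amount": float, "country": str}

def validate_batch(batch):
    """Raise ValueError on the first record that breaks expectations."""
    for i, record in enumerate(batch):
        for column, col_type in EXPECTED_COLUMNS.items():
            if column not in record:
                raise ValueError(f"record {i}: missing column {column!r}")
            if not isinstance(record[column], col_type):
                raise ValueError(
                    f"record {i}: {column!r} is not {col_type.__name__}"
                )
    return True

# Wired into the build, a good batch passes silently...
good = [{"order_id": 1, "amount": 9.99, "country": "BE"}]
assert validate_batch(good)
# ...and a bad one stops the pipeline with a precise error,
# e.g. validate_batch([{"order_id": "x", ...}]) raises ValueError.
```

The design choice is deliberate: failing the build on the first violation keeps the feedback loop short, which is what makes the “go slow to go fast” trade-off pay off.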