31 August 2016
42. Should be clear, no? Well, I guess that depends on your context… If you’re a doctor, your patient has a critically high fever. Maybe you’re a marathon runner, and you know there are only 195 meters left to the finish line. Or maybe you have read The Hitchhiker’s Guide to the Galaxy, too, and you consider 42 the answer to the ultimate question of life, the universe, and everything.
When we gather and integrate data through formalized corporate reporting, these data are sanctioned. This is mostly a silent, implicit process, one that most people are never aware of. Data only gain meaning in the context of a unified definition, an agreed-upon set of rules we use to aggregate and transform raw source data into tabular or visual form. These definitions, sometimes called business rules, determine which records are to be excluded, how they are to be grouped, and what the rows and columns in reports consist of.
When people were chasing a “single version of the truth” in the 1990s, the assumption was that there was one “truth”, and if we tried long and hard enough, we’d get there. Nowadays, we know better. There is no such thing as a single version of the truth. At best, there may exist some centralized repository, accessible to all, that contains a single version of the facts. Not at all the same thing.
I can understand that most non-technical data users do not want to spend the time to familiarize themselves with the nitty-gritty details of the ETL (Extract, Transform, Load) processes that determine which data will be sanctioned in the corporate data warehouse. Like it or not, the enterprise data warehouse holds the data that the company has agreed to be “true.” It would be nice if data governance made this process transparent, but the reality is often different.
As to a “single” version of the truth, and why this is a pipe dream, that would require a separate post. In short: depending on your perspective, and your data needs, different data are “real.” The classic and hackneyed example is the different definitions of a customer that are in use across the company: marketing refers to prospects as customers, while finance only counts those that receive invoices. Operations needs to process and service all individuals, whether or not others in the company consider them customers.
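To make the customer example a bit more tangible, here is a minimal Python sketch. Everything in it is hypothetical: the records, the field names (`is_prospect`, `invoiced`), and the three rule functions are invented for illustration, and I am assuming that marketing counts invoiced customers alongside prospects. Real business rules live in your ETL layer and tend to be far messier.

```python
# Hypothetical records; ids and field names are made up for illustration.
people = [
    {"id": 1, "is_prospect": True,  "invoiced": False},
    {"id": 2, "is_prospect": False, "invoiced": True},
    {"id": 3, "is_prospect": False, "invoiced": False},
    {"id": 4, "is_prospect": True,  "invoiced": True},
]

def marketing_customers(records):
    # Assumption: marketing counts prospects as well as invoiced customers.
    return [r for r in records if r["is_prospect"] or r["invoiced"]]

def finance_customers(records):
    # Finance only counts those that receive invoices.
    return [r for r in records if r["invoiced"]]

def operations_customers(records):
    # Operations has to process and service every individual.
    return list(records)

# Same source data, three different "true" customer counts.
customer_counts = {
    "marketing": len(marketing_customers(people)),
    "finance": len(finance_customers(people)),
    "operations": len(operations_customers(people)),
}
print(customer_counts)
```

The three counts differ even though the source records are identical and none of them is wrong. Each number simply answers a different question.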
Your context, the perspective that you own, determines what your “reality” is. It can be tempting to consider your “truth” a universal reality, even if you are aware of different business rules across different stakeholders. I have often seen business-unit-specific definitions as a sign of data maturity. In finance it is quite common to add adjectives to line items like “revenue”, “sales”, etc. I would argue that a similar awareness is probably needed for (almost) all data entities. A finance customer and an operations customer can be different things, and if you allow that added complexity, a lot of confusion can be resolved. The alternative is dismissing these differences as “data quality issues.”
Awareness of your business partner’s context will make discussions more meaningful and fruitful. Instead of getting frustrated about mismatched, misaligned numbers, you can start a more useful exploration into the cracks of your corporate value chain. In my experience, in almost every case where the numbers of customers across business units didn’t line up, there were losses: leads that didn’t get followed up, overdue collections, late shipments, and so on. Data quality can be the high road to value creation, rather than merely a pain in the rear end…
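If you want to go looking for those cracks yourself, a simple set comparison is often enough to get started. The sketch below is only an illustration: the customer IDs are made up, and `reconcile` is a hypothetical helper, not part of any particular tool.

```python
# Hypothetical customer IDs as two business units see them (made-up data).
marketing_ids = {101, 102, 103, 104}
finance_ids = {102, 103, 105}

def reconcile(left, right):
    """Split two ID sets into what is shared and what only one side knows."""
    return {
        "only_left": sorted(left - right),
        "only_right": sorted(right - left),
        "both": sorted(left & right),
    }

gaps = reconcile(marketing_ids, finance_ids)

# IDs known only to marketing might be leads that never got followed up;
# IDs known only to finance deserve a question, too.
print(gaps)
```

The mismatched IDs are not errors to be scrubbed away. They are the list of questions to bring to your colleagues in the other business unit.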