14 May 2017
Becoming a data-driven company: what’s not to like about it? Many organizations are embarking on a Big Data journey, and many of them are finding that it’s not so obvious how to create awareness of “data as an asset” throughout the organization. Embarking on a Data Governance plan implies driving change, which is often one of the hardest parts of these programs. How can we, as data professionals, make our lives easier?
First of all, and as ‘everybody’ notes, you need senior management sponsorship. Unless upper management endorses these efforts, you might as well stay home. Of course it is fairly easy to say you support a program. At the end of the day, people involved in the day-to-day work on Data Governance will (still) be expected to do most of this hard work.
Leading change in Data Governance programs, especially in the early stages, is best done through “information”: a compelling story, examples of what poor data quality costs, convincing examples of how doing things in some “new” way will benefit the organization, etc. If you need to rely on senior management’s “carrot and stick”, you may be able to invoke some change, but it probably won’t last. There has to be a better way, and there is.
Data Governance programs can have many objectives. One of them typically is to instill a sense of ownership and decision rights with regard to data assets. Once you begin to treat data like other valuable (albeit tangible) assets the company owns, many of the same governance principles can be applied.
For example, when it comes to ownership, you want to make explicit who (person or committee) has the ultimate right to decide about forthcoming changes in IT systems. Your budget holder is a likely candidate system owner, but there can be others, too. Often, teams or committees have this responsibility, and then it helps to be explicit about this. Exactly who gets to make these decisions?
Besides ownership, you also want to make public who should be consulted about planned changes (who has a right to provide input), and who should be informed after changes have been put in place. Data gets used and reused downstream from primary processes, and it is exactly this secondary usage that creates so much value. Of course secondary data usage can also be a cause for concern, and serious headaches… For these reasons, the entire value chain that data flows through needs to be considered when making changes.
IT systems exist upstream, largely independently of data usage, yet they can be the source for a wide variety of analytical applications and reporting. When business owners make changes to these source systems, downstream data usage will be affected. Are stakeholders aware of all the different ways data are being reused downstream? Probably not. Do they know how much it costs to curate inaccurate source data? Doubtful. Maybe even more important: what are the costs when downstream business decisions are compromised? Quantifying these consequences goes a long way towards creating awareness about managing “data as an asset.”
I once worked for a credit card company where errors in the application processing data trickled down into suboptimal assessment of credit risks. The way this works is that historical data about credit card applicants (along with their eventual credit history) are used to predict “likelihood to default” for future applications. You gather data about thousands of applications, and track their subsequent payment behavior. For every record you append a flag “defaulted on loan yes/no”, and then use that file to create a model that predicts likelihood to default.
The quality of these data has a material impact on the accuracy of the models you build. When a past applicant gets flagged as a home owner, but in reality he was renting (associated with higher credit risk), then those input data will render credit scoring models less accurate. But by how much? Is that really such a big deal? You won’t know, until you measure…
Here is an example from my personal experience. At a credit card company where I worked, we engaged in a wholesale scrubbing of such an application scoring database. Many thousands of applications were manually re-entered, so that the “correct” value as submitted on the initial application form was captured. This allowed me to compare how well the scorecard made predictions based on the “old” (dirty) data and on the cleaned version of the database. Next, I ran all processed applications through these two scoring models to see how many applications would get assigned to a different category (accept/reject). In the industry, a cross-tabulation of this kind is aptly named a “confusion matrix.”
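The comparison of the two scoring runs can be sketched as a small cross-tabulation. This is only an illustration, not the original analysis: the function name decision_crosstab and the six toy decisions are made up, and the sketch assumes each scorecard has already produced an accept/reject decision for the same applications in the same order.

```python
from collections import Counter

LABELS = ("accept", "reject")


def decision_crosstab(old_decisions, new_decisions):
    """Count applications by (decision on dirty data, decision on cleaned data)."""
    if len(old_decisions) != len(new_decisions):
        raise ValueError("both lists must cover the same applications")
    counts = Counter(zip(old_decisions, new_decisions))
    # Fill in zero counts so all four cells are always present.
    return {(o, n): counts.get((o, n), 0) for o in LABELS for n in LABELS}


# Toy, made-up decisions for six applications (hypothetical data).
old = ["accept", "accept", "reject", "reject", "accept", "reject"]
new = ["accept", "reject", "reject", "accept", "accept", "reject"]

table = decision_crosstab(old, new)

# Off-diagonal cells: applications whose category changed after cleaning.
changed = table[("accept", "reject")] + table[("reject", "accept")]

for (o, n), count in table.items():
    print(f"dirty={o:6s} cleaned={n:6s} count={count}")
print("applications assigned to a different category:", changed)
```

The two off-diagonal cells are the ones that matter for the business case: they are the applications that the dirty data sent to the wrong category.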
In the credit card business you can estimate how much it “costs” to reject an application that really should have been approved: it equals the marketing expense made to attract this applicant. In the case of direct marketing these costs can be readily calculated. The more expensive misclassified category consists of applicants who were accepted, when they really should have been rejected. A defaulting credit card customer costs on average a few thousand dollars in write-off and collection costs.
In this manner I created a well-founded estimate of the business benefits of cleaning the database. Since this customer was reluctant to share their default rates (which have a linear impact on the total dollar value I calculated), I calculated three scenarios. I provided a “best” (fair) guess, and a pessimistic as well as an optimistic dollar value based on industry standard average default rates (reasonable lower and upper bounds for default rates). My calculation came to a whopping $7-11M in benefits for the upcoming (first) year.
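A three-scenario estimate of this kind can be sketched in a few lines. Everything specific here is an assumption for illustration: the function cleaning_benefit, the formula (wrongly rejected applicants cost their marketing expense, wrongly accepted applicants cost the default rate times the loss per default), and every number are hypothetical and are not the figures from the actual engagement.

```python
def cleaning_benefit(wrong_rejects, wrong_accepts, acquisition_cost,
                     default_rate, loss_per_default):
    """Rough yearly benefit of removing misclassifications (assumed formula)."""
    # Good applicants turned away: the marketing spend to attract them is lost.
    lost_marketing = wrong_rejects * acquisition_cost
    # Bad applicants let in: expected write-off and collection costs.
    expected_losses = wrong_accepts * default_rate * loss_per_default
    return lost_marketing + expected_losses


# Hypothetical default rates: lower bound, fair guess, upper bound.
scenarios = {"pessimistic": 0.02, "best guess": 0.04, "optimistic": 0.06}

estimates = {
    name: cleaning_benefit(
        wrong_rejects=1000,      # made-up count
        wrong_accepts=2000,      # made-up count
        acquisition_cost=100,    # made-up dollars per applicant
        default_rate=rate,
        loss_per_default=3000,   # made-up dollars per defaulting customer
    )
    for name, rate in scenarios.items()
}

for name, value in estimates.items():
    print(f"{name}: ${value:,.0f}")
```

Because the default rate only scales the expected-loss term, the gap between the pessimistic and optimistic values shows directly how sensitive the business case is to the one parameter that was not shared.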
Since the numbers I came up with sounded unrealistic (too high), they had their internal credit risk expert redo my calculations (without telling me so, which is fine, btw). He, of course, did have access to some of the key parameters, and came to a staggering $10-15M. Not surprisingly, a plan to improve the accuracy of data entry was immediately put into place. From then on, every middle manager who came to work in the back-office was handed this report on his first day, to ensure the importance of accurate data entry stayed on everybody’s mind…
There’s a lesson in this case, maybe two. As the business owner later told me: “I was always aware that data quality issues were costly, and they are often a lot more costly than you might think.” More important, I think, is another lesson I have learned over the years: if there is one language that every manager, in every business knows and understands, it is “dollars.” Or Euros, Pounds, Rupees, Whoopies, or Goopies, or what have you. Those numbers “stick.” People remember them forever, and they seem to be the most powerful lever to drive change in awareness of “data as an asset.”