Defaulting To Data Quality

Tom Breur

17 December 2017

In the new era of “Big Data”, companies are waking up to the need to govern all their data assets as carefully and meticulously as any other corporate asset. Given that data are intangible, this is often a new skill. Especially for companies that are relatively new to the analytics space, these change trajectories often lead through uncharted waters.

The way these journeys have evolved for companies I have worked with over the past few decades has led me to apply a familiar framework, customized to this particular change process. This framework describes stages that I have seen companies go through when they embark on data quality improvement efforts. But by all means, read “data quality” in a generic sense: extracting as much value as possible from all corporate information sources. As Jerry Weinberg has pointed out: quality is value to some person, in this case professionals throughout the corporation. My proposed framework consists of a classic and well-known 2 × 2 matrix tabulating awareness versus competence.

It turns out, there is a natural progression: moving from the bottom left quadrant via the top left corner to the top right, and then on to the bottom right. This also happens to be a remarkably linear succession of states that organizations transition through. In fact, by deliberately following these sequential steps, you drive sustainable improvement. Along the way, in successive phases, different actions are required as indicated by the arrows 1, 2, and 3 in Figure 1.


Many companies have data quality issues that they are largely unaware of. The financial losses as a result of these problems remain “unconscious.” What you tend to see is that most people within the organization are unfamiliar with their data quality issues, and with the (downstream) costs they are incurring as a result of poor quality data. These issues could be inaccuracies in data, unavailable data, trust issues with existing data (that constantly require checking, double checking, and maybe curating the data), or poorly documented metadata that lead to misunderstandings, errors, delays, and avoidable rework. This is our bottom left quadrant. The way to get “out” of this place is through information (Arrow 1 in Figure 1).

The action to take here is to tell as many people as possible about the data quality issues you experience. Preferably also quantify what the costs are associated with the current status quo (English, 1999). Money talks. By providing this information and spreading it throughout the organization, you move an organization from the “unaware” to the “aware” state.

Senior management is certainly one of the prime target groups that should be informed about your data quality problems. Translating the consequences of data non-quality into the one dimension that every manager in every industry understands so well, namely dollars, is a great way to draw their attention. And to make your point “stick.”

Monetizing a data quality business case requires hard work and making assumptions to finalize the cost calculations. Don’t forget to include the model you used to calculate costs as a result of poor data quality, along with the assumptions being made. Be as transparent and clear as you can be.

Maybe you want to present your findings with a certain confidence interval, to acknowledge the uncertainty in your calculations. But make sure “a” number gets calculated! Financials have a tendency to “stick”, to be remembered quite well, as if they were the “ultimate” summary of the existing status quo.
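A cost model with explicit assumptions can be as small as the sketch below. It is an illustration only: the `Range` type, the function name, and all input numbers are hypothetical, and the low/high figures are a simple best-case/worst-case range rather than a statistical confidence interval.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """An assumption stated as a low / expected / high estimate."""
    low: float
    expected: float
    high: float


def annual_cost_of_poor_data(records_per_year: int,
                             error_rate: Range,
                             cost_per_error: Range) -> Range:
    """Cost = records * share of records in error * cost to handle one error."""
    return Range(
        low=records_per_year * error_rate.low * cost_per_error.low,
        expected=records_per_year * error_rate.expected * cost_per_error.expected,
        high=records_per_year * error_rate.high * cost_per_error.high,
    )


# Hypothetical inputs, for illustration only. State them alongside the result.
estimate = annual_cost_of_poor_data(
    records_per_year=200_000,
    error_rate=Range(0.02, 0.04, 0.06),      # assumed share of faulty records
    cost_per_error=Range(5.0, 10.0, 20.0),   # assumed dollars of rework per error
)
print(f"Estimated annual cost: ${estimate.expected:,.0f} "
      f"(range ${estimate.low:,.0f} to ${estimate.high:,.0f})")
```

Keeping the assumptions as named inputs makes it easy for a reader to challenge one of them and rerun the calculation, which supports the transparency argued for above.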

After you feel “awareness raising” has been sufficiently addressed, the second transition you will want to go through is moving from awareness/lack of competence to awareness/competence (Arrow 2 in Figure 1). There are two distinct ways to go about this, and they may well be employed in parallel.

One way you get there is through education. Another way is by enhancing your capabilities through improvement of your “infrastructure” in support of data quality. The latter means enabling and empowering staff by providing them tools and technology in support of data quality assessment and reporting.

With education (or training), your objective is to train people in practices that will prevent poor data from entering your systems. You want to grow their knowledge to help them better diagnose and solve existing data quality problems. These could be preventive, as well as contingent measures.

By improving capabilities, you move from the left top to the right top quadrant by providing tools and infrastructure in support of data quality. Tools could obviously be dedicated data quality software, but more generically this could be any technical resources that improve your capabilities. A data quality scorecard (as described in Maydanchik’s 2007 book) can flag “suspect” records that appear to violate business rules, which would obviously belong here, too.
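The idea of flagging “suspect” records against business rules can be sketched as follows. This is not Maydanchik's scorecard design, only a minimal illustration of the principle; the rule names, field names, and sample records are hypothetical.

```python
from typing import Callable

Record = dict[str, object]
Rule = Callable[[Record], bool]

# Hypothetical business rules: each returns True when the record passes.
RULES: dict[str, Rule] = {
    "email_present": lambda r: bool(r.get("email")),
    "age_plausible": lambda r: isinstance(r.get("age"), int) and 0 <= r["age"] <= 120,
}


def scorecard(records: list[Record], rules: dict[str, Rule]):
    """Count violations per rule and list the records that fail any rule."""
    violations = {name: 0 for name in rules}
    suspects = []  # (record index, names of the rules it failed)
    for index, record in enumerate(records):
        failed = [name for name, rule in rules.items() if not rule(record)]
        for name in failed:
            violations[name] += 1
        if failed:
            suspects.append((index, failed))
    return violations, suspects


# Hypothetical sample data.
records = [
    {"email": "a@example.com", "age": 34},
    {"email": "", "age": 41},
    {"email": "c@example.com", "age": 250},
]
violations, suspects = scorecard(records, RULES)
print(violations)
print(suspects)
```

Reporting violations per rule, rather than a single overall score, shows which business rule is broken most often and therefore where to look first.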

Upstream, strengthening your capabilities could be through training data-entry staff in best practices, like always having “four eyes” check manual entry. Maybe you want to reconfirm the importance and cost associated with poor quality (that was raised to their attention in the previous phase), in particular as a standard practice in the induction program for new trainees. But it could also mean simply designing better user interfaces that enforce and facilitate higher quality data-entry.

Downstream, strengthening your capabilities could mean showing how data warehouse staff can track and establish data quality, preventing poor quality data from entering your corporate reporting environment. This might be, for instance, putting deduplication technology in place that leads to fewer duplicate records from the ETL (Extract – Transform – Load) process. Or, systematically reporting the number of errors occurring in your audit dimensions (Kimball & Caserta, 2004). A Novelty Detector might be another tool you want to have in place, etc.
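As a rough illustration of deduplication during the transform step, the sketch below drops records whose normalized name and e-mail address have already been seen, and reports how many were dropped so the count can be logged. It assumes exact matching after simple normalization; commercial deduplication tools typically use fuzzy matching, which is not shown here. The field names and sample rows are hypothetical.

```python
def normalize(record: dict[str, str]) -> tuple[str, str]:
    """Build a match key: lowercase, trimmed, with repeated spaces collapsed."""
    name = " ".join(record["name"].lower().split())
    email = record["email"].strip().lower()
    return (name, email)


def deduplicate(records: list[dict[str, str]]):
    """Keep the first record per match key; return kept records and drop count."""
    seen = set()
    kept = []
    dropped = 0
    for record in records:
        key = normalize(record)
        if key in seen:
            dropped += 1  # candidate figure for an audit or error report
            continue
        seen.add(key)
        kept.append(record)
    return kept, dropped


# Hypothetical staging rows.
staged = [
    {"name": "Ann  Lee", "email": "ANN@example.com"},
    {"name": "ann lee", "email": "ann@example.com "},
    {"name": "Bob Ray", "email": "bob@example.com"},
]
kept, dropped = deduplicate(staged)
print(f"{len(kept)} records loaded, {dropped} duplicate(s) dropped")
```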

After strengthening capabilities, both your “people” and “infrastructure” have come to fruition. The third stage in this voyage (top right to bottom right transition) is meant to solidify and perpetuate new and improved working practices. This phase should ensure that the habit of building quality in becomes ingrained in the organization. Producing high quality data should become the norm (Crosby, 1980). It’s as if you’re reprogramming corporate DNA. You accomplish this by restructuring accountabilities (Arrow 3 in Figure 1).

Issues you will be facing in this third phase are things like organizational alignment and performance targets. Fundamentally, your objective here is to alter systemic causes that have allowed mediocre data quality to persist. You do this by changing the structure of the organization.

One of the classic and hackneyed examples is that if you were rewarding data entry staff for speed, you will now need to add performance objectives that also reward staff for accuracy. Without this explicit shift in performance objectives, you put them in an unfair quandary. When they are overloaded with work, and the only way to deliver sufficient quality would be by missing their productivity targets, the performance objectives contribute to poor data quality. Clearly undesirable.

Misalignment between departments occurs when the people suffering from lack of data quality are unable to influence resource allocation where data quality should be produced or guarded (upstream). For instance, marketing often incurs the costs as a result of sloppy processing by data-entry staff. They rely on lists for marketing purposes that were produced by operations or back-office staff.

Rational allocation of resources that transcends business units is the hallmark of a more “mature” organization. But very often, people are not aware of these departmental misalignments. And even if people are aware of the problems, they might not be (or feel) empowered to do something about them.

The problem holder is the person “suffering” from poor data quality, in this case marketing. The problem owner controls the resources needed to resolve the problem, in this case a manager of the department that is responsible for data entry. Organizational alignment is the result of bringing problem holder and problem owner as close together as possible. Any time they drift apart, you’re at risk. Organizational friction invariably leads to waste across the value chain. This loss often manifests itself as a “data quality” issue, but that is typically “merely” a symptom of some other “people” (organizational) problem.

After you have gone full circle, new and improved levels of data quality will have become the norm. At this stage, there are controls in place, and awareness about the importance of quality operations continues to grow across the organization. Over time, the process of data quality improvement becomes increasingly familiar, and you may attempt to run several efforts in parallel. Be aware that each phase builds on top of the previous ones, in an iterative manner, so it is practically impossible to “skip” any steps.



In summary, some companies know they have costly data quality issues, and some may not. However, everybody prefers good quality data. Hard to argue with that. The question for many organizations is how to reach their data quality happyland. Every company is different. To provide guidance on this journey, the framework I have laid out here helps identify where you are, and how to define the corresponding steps that are appropriate on your journey. The order in these steps may not be set in stone, but the underlying dependencies help determine what to do next and how to assess progress.

There is no point in trying to grow capabilities and provide extensive training before awareness about the need to improve quality levels is in place. This requires a candid and thorough assessment of your current levels of quality, akin to Deming’s “inspect and adapt” mantra that has become a cornerstone of Lean and Agile methods. Organizations typically begin with awareness creation, in an attempt to get attention for the data quality problems at hand.

After awareness has been raised, the next step is developing skills and capabilities. This can be training staff, ranging from front-line data-entry to backend data warehouse ETL specialists. But this phase also includes improving user interfaces to enable better data-entry, or supporting data warehouse staff with specialist technology (data cleansing tools, etc.). Any effort that raises the organization’s capabilities can fit in here.

Finally, to make change sustainable, the root causes of data quality problems need to be considered. These typically lie in poorly aligned objectives. The quality literature offers several approaches for root cause analysis, e.g. Dettmer (2007). What is essential for this step is that some structural (“systemic”) change in how the work gets organized leads to improved business alignment. That usually involves changing accountabilities, and almost always implies short-circuiting some information flows.

Another type of organizational change we have seen here is appointing data quality stewards, and equipping them with the tools needed to do an effective job. These tools can be Statistical Process Control (SPC) software, and/or analytic and data visualization capabilities to persistently delve into root causes and frequency of occurrence of different kinds of glitches (which inevitably have been happening, albeit at progressively lower rates).
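To give a flavor of what SPC on data quality can look like, the sketch below computes three-sigma control limits for the error rate per batch (a p-chart, one common SPC technique) and flags batches that fall outside them. It assumes equally sized batches; the function names and all counts are hypothetical.

```python
import math


def p_chart_limits(error_counts: list[int], batch_size: int):
    """Return (lower limit, center line, upper limit) for the error rate."""
    p_bar = sum(error_counts) / (len(error_counts) * batch_size)
    sigma = math.sqrt(p_bar * (1 - p_bar) / batch_size)
    return max(0.0, p_bar - 3 * sigma), p_bar, min(1.0, p_bar + 3 * sigma)


def out_of_control(error_counts: list[int], batch_size: int, limits):
    """Indexes of batches whose error rate lies outside the control limits."""
    lower, _, upper = limits
    return [
        index
        for index, count in enumerate(error_counts)
        if not lower <= count / batch_size <= upper
    ]


# Hypothetical baseline: errors found in five batches of 1,000 records each.
baseline = [18, 22, 20, 19, 21]
limits = p_chart_limits(baseline, batch_size=1000)

# Hypothetical new batches to monitor against the baseline limits.
flagged = out_of_control([20, 45, 17], batch_size=1000, limits=limits)
print(f"center line {limits[1]:.3f}, flagged batches: {flagged}")
```

A batch outside the limits is a signal to investigate a root cause, not proof of one; batches inside the limits reflect the ordinary variation of the current process.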

In line with this move, companies should develop explicit data strategies that turn a vision like “data (quality) is important” into a set of actions and milestones. “Data is important” is simply too ambiguous to inspire people to act differently. They need to know which data are particularly valuable (get reused the most) and, as with any strategic initiative, they need to know what this means for them personally.

“Data are a strategic asset” and “Data is the new gold” are taglines that signify a new, or changed, strategy. But strategy should not be something “out there” that belongs to other people (senior management). The essence of Data Governance implementation is translating a department (or corporate) strategy into concrete steps to take at the individual level. That is where the rubber meets the road.

Establishing a good working relationship between data quality stewards and the work (departments) they oversee sends a clear message to the organization. You also need to make explicit what their role is relative to a data governance board.

This way you communicate (with acts rather than words) that poor quality is a) no longer acceptable, and b) although day-to-day work may be delegated, senior management takes responsibility and ownership for restructuring the organization in such a way as to reflect this new vision on data quality.

This transformation loop can be summarized as inform-educate-transform, where each phase builds upon the previous. After data non-quality has been driven out of one process, the organization will learn to signal similar opportunities in other processes. Management “owns” this practice and, by leading by example, guides the quest to make data quality the norm (rather than an exception). That is how you make sure that going forward data quality will be the default state, rather than an ambition to pursue.

