What is data quality, really?

Tom Breur

1 July 2016

The investigation of data quality as a formal, separate discipline is now about 20 years old. Classic books like Redman’s (1997) “Data Quality for the Information Age” and English’s (1999) “Data Warehouse and Business Information Quality” opened up discussions in many companies and settings about whether data quality merits special and separate attention. Since then, a few dozen or so books have been written specifically about data quality. One of the main problems, though, seems to be that few people agree on what data quality really is. Depending on whom you ask, you are likely to get a wide variety of answers and definitions.

When you ask an ETL programmer what data quality is, he will point to the number of conflicts recorded in your audit dimensions when he merges disparate data sources. Front-end BI tool users might refer to the number of fields available, and how richly they qualify and describe some unit of research interest, say a customer, an order, or a shipment. Other analysts will refer to the predictive power that certain attributes hold for producing models with great lift. And still others will talk about sparsely populated fields with too many n/a or missing values.
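To make the contrast concrete, here is a minimal sketch of two of these views, the share of missing values in a field and the number of conflicts when merging two sources, computed on the same records. The data, the field names, the `MISSING` markers, and both helper functions are hypothetical and only serve as illustration; real audit dimensions and profiling tools will define these measures in their own ways.

```python
# Hypothetical records from two source systems, keyed by customer id.
crm = {
    1: {"email": "a@example.com", "city": "Utrecht"},
    2: {"email": None, "city": "n/a"},
}
billing = {
    1: {"email": "a@example.com", "city": "Amsterdam"},
    2: {"email": "b@example.com", "city": None},
}

# Assumption: these markers all count as "missing".
MISSING = {None, "", "n/a"}


def missing_rate(records, field):
    """Fraction of records whose value for `field` is missing."""
    values = [rec.get(field) for rec in records.values()]
    return sum(v in MISSING for v in values) / len(values)


def count_conflicts(left, right):
    """Count fields where both sources have a value and the values differ."""
    conflicts = 0
    for key in left.keys() & right.keys():
        for field in left[key].keys() & right[key].keys():
            a, b = left[key][field], right[key][field]
            if a not in MISSING and b not in MISSING and a != b:
                conflicts += 1
    return conflicts


print(missing_rate(crm, "email"))     # the "sparsely populated fields" view
print(count_conflicts(crm, billing))  # the "merge conflicts" view
```

The two numbers answer different questions about the same data, which is exactly why a conversation about "data quality" can go sideways when nobody says which one they mean.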

What appears to be missing at this stage of maturity in our profession is an overarching framework to specify which aspect of data quality we are talking about, and where in the BI value stream it applies. Until we resolve this confusion, most conversations about data quality are likely to remain, well, pretty low quality… :-)

 
