Managing Data Science

Tom Breur

9 September 2018

Hal Varian, Chief Economist at Google, is said to have proclaimed that statistician would become the sexiest job in the 21st century. Harvard Business Review published an article by Thomas Davenport and DJ Patil, who ‘ran’ with that quote and rebranded it: data scientist became the sexiest job of the 21st century. Judging by the number of Google searches for “data science” over the last 10 years, the popularity of the term has clearly been on the rise:

[Figure: Google Trends, searches for “data science” over the last 10 years]

Several people, myself included, have pointed out that there is considerable ambiguity around the term “data science”: nobody seems able to clearly delineate its boundaries. This has a few side effects. For one thing, adjacent fields lay claim to the term, further clouding the issue. What I would like to draw attention to is that diluting the meaning of “data science” causes its boundaries to grow and grow, until any data-related topic might be included. In an earlier post I noted how “data science” often seems little more than ad hoc ETL by so-called citizen data scientists. Not very glamorous, but fulfilling an immediate need, and at the forefront of data discovery.

My personal perspective on data science is that it ought to encompass more than ‘merely’ pulling together disparate data sources (as noble as those efforts are). My friend and colleague, Dutch thought leader Ronald Damhof, is known as the inventor of a 4-quadrant data management model. He also created the following remarkably informative diagram, which brings out some interesting characteristics when you map the range of activities that are loosely referred to as “data science”:

[Figure: Ronald Damhof’s diagram of activities loosely referred to as “data science”]

There are several dimensions worth noting in this diagram. On the top we tend to see “simple” queries, getting increasingly “complex” towards the bottom. On the top, workloads usually fall in the purview of IT, whereas towards the bottom business stakeholders tend to drive development. On top you tend to see static workloads, while at the bottom work tends to be more interactive. At the top of the inverted pyramid are usually far greater numbers of users, relative to the bottom.

For the purpose of this article, my focus is on the last vertical arrow: efficiency versus effectiveness. This relates to a continuum from focus on process improvement (“efficiency”) to focus on innovation (“effectiveness”). When you leverage data science to drive innovation, this is an excellent example of pursuing effectiveness, rather than efficiency. When you operate inside existing processes, you can work on efficiency. When you go outside native business processes, your aim is to improve effectiveness by innovating existing value streams.

Not all data science work needs to be aimed at innovation, though. For example, when you use Machine Learning (ML) methods to screen credit card applications for their likelihood of default, the nature of the business is identical regardless of whether ML or human underwriters decide who will be granted credit. Hence that would be a data science application geared to improving efficiency.

Improving efficiency implies doing more or less the same as what you were already doing, albeit in a more cost-effective manner. In Lean parlance it would be referred to as driving out waste, or Muda. Improving efficiency can be, and usually is, done through stepwise, piecemeal improvement.

Improving effectiveness, on the other hand, has a high R&D component to it: high risk and high reward. Not all your attempts will be successful, but when they are, you win big. The high risk involved in pursuit of “effectiveness” implies that you need to scrutinize your investments. As resources diverge, you periodically have to kill off the ‘promising’ attempts that won’t bear fruit – at least not soon enough. That is where the pendulum swings between “trying a promising avenue” and “shedding dead-end street efforts.” Striking that balance is really, really hard.

Data science may well be the sexiest job in the 21st century. But what is much less widely broadcast is that managing data science projects is probably the most important, as well as the least understood, job of the 21st century.
