20 September 2016
Data Scientist is a term that DJ Patil coined in 2008, and the role is now called “the sexiest job of the 21st century.” Apparently, when Hal Varian (Chief Economist at Google) called “statistician” the sexiest job of the 21st century, that wasn’t good enough. I never thought “statistician” sounded too dusty, but if you look at the number of job ads on LinkedIn, “data scientist” is clearly the winner nowadays. Since so many companies are seeking to gain competitive advantage by leveraging their data better than others, hiring “the right” data scientists is clearly a key success factor.
So what makes a good data scientist? There are two components, data and science, and I’ll give my take on both. Clearly you need to know how to handle “data”, which is easier said than done. And then there is the “science” part, which implies applying scientific methods in a business context. Since the “data” part has been evolving so quickly, the skills you are looking for depend, at least in part, on the technology you are using. That may be a legacy system, SQL, NoSQL, or maybe you’ve transitioned to a serverless architecture already. “Science” refers to valid and objective approaches for deducing information from data. Besides knowledge of statistics, a firm grounding in experimental methodology is key.
How important are programming skills? The answer to that question depends on the “maturity” of the infrastructure. I put “maturity” in quotation marks because it depends on how far along you have come with regard to data modeling, as well as the kind of platform(s) you are leveraging. Ideally, in a SQL-based, dimensional model (a so-called “mature” environment), programming requirements are limited. It’s a matter of pulling the right data set and getting on with the analytics, often some kind of descriptive or predictive modeling exercise. The reality is, of course, that many companies still have a long way to go when it comes to the data modeling part…
I refer to NoSQL solutions as “less mature”, aware of course that some of the internet/corporate powerhouses have been using that technology for a while, and are very successful with it. The majority of implementations, however, have few people with ample hands-on experience; they use this less expensive technology to keep up with dramatic change and speed of development. The flipside is that their data scientists will have less documentation available to them, and less fleshed-out dimensional models. That implies a lot of programming, so for those companies programming skills are clearly a key to success.
As to the “science” skills, there are still many companies that wrongly associate “scientific” with pejorative adjectives like “academic” and “not-so-practical”, as if there were a dichotomy between practical and scientific. “There is nothing as practical as a good theory” has been attributed to Kurt Lewin, and indeed it is a cornerstone of Deming’s approach. Hard to argue with the business success that application of the scientific method has brought the Japanese! Data science is very much a cyclical Plan-Do-Check-Act activity. Analytics may give insights, but turning those insights into business value requires acting on them, and hence testing your hypotheses.
In the “new” world of Big Data, where data are plentiful, you need a firm understanding of experimental methodology. It is not just about data, it is (also) about finding and designing efficient ways to test hypotheses, and you will often be doing this with field tests. Many smart websites are renowned for testing pricing and communication variations for business advantage. That is how you home in on the optimal value proposition, and set your prices to be as competitive as possible, yet high enough for a healthy profit. The fine art of experimentation is undervalued, imho, yet it is critical in the Check-Act parts of Deming’s cycle. Data scientists need to “own” this, and it’s a skill I have often found wanting in aspiring data scientists.
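To make the “Check” step a bit more concrete, here is a minimal sketch of how the outcome of a simple two-arm field test (say, two price points or two message variants) might be evaluated. It uses a standard pooled two-proportion z-test, which is just one common choice and not necessarily what any particular website uses. The function name and the visitor and conversion counts are made up for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates between two arms.

    conv_a, conv_b: number of conversions in arm A and arm B
    n_a, n_b:       number of visitors exposed to arm A and arm B
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_a, p_b, z, p_value


# Hypothetical counts: 5,000 visitors per arm, 200 vs. 260 conversions
p_a, p_b, z, p_value = two_proportion_z_test(200, 5000, 260, 5000)
print(f"A: {p_a:.1%}  B: {p_b:.1%}  z = {z:.2f}  p = {p_value:.4f}")
```

A small p-value only says the difference is unlikely to be noise. Whether the test was valid in the first place (random assignment, a sample size fixed in advance, no peeking) is the experimental-methodology part, and no formula does that for you.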
Business acumen and communication skills, lastly, are known to be crucial for success. Unless you can work effectively in teams and explain advanced analytical concepts to business stakeholders in a persuasive manner (!), analysis remains, well, analysis. Deploying models, and working with business line owners to put your field tests into action, is where the rubber meets the road, and where genuine business value gets delivered. There are few things more compelling to senior business people than empirical findings from valid field experiments that show what customers are willing to spend their money on, and what not!