7 October 2018
IT analyst firm Gartner points to the rise of “citizen data scientists”: a new generation of professionals with minimal technical background who rely mostly on contemporary GUI-based tools to do their job, with limited or no support from IT. This kind of work appears to require very little technical expertise. From Gartner’s perspective, that is exactly what makes them “citizen users”, a qualification I abhor, but that’s a topic for a different blog.
Because so-called citizen data scientists often work (almost) exclusively with GUI-based tools that require no coding at all, people have asked me more than once: “Can you become a data scientist without any coding experience?”
Most data science professionals feel that at least a modicum of programming skill is required to be versatile and flexible enough. Our daily tasks require us to deal with a wide array of potential data extraction scenarios. Once the data come to you clean and crisp (typically after some 80-90% of the work has been done…), you can get quite far with GUI-based tools alone. But depending on that would be a horribly limiting condition.
The reality is that data munging, web scraping, and various other “upstream” activities can be much more efficient and robust if you can code at least parts of your data pipeline: more efficient because you get your results quicker, and more robust because repetitive point-and-click tasks tend to be error prone. ‘Fixing’ data, and structuring or preparing data for analysis, is bound to surface plenty of such repetitive tasks. And that is where coding skills become almost (!) essential.
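To make that concrete, here is a minimal sketch in Python of the kind of repetitive fix that is tedious by hand but trivial to script. The column names, the sample rows, and the helper names `clean_row` and `clean` are all made up for illustration; nothing here comes from a real dataset or a particular tool.

```python
import csv
import io

# Hypothetical raw export: inconsistent spacing, casing, and number formats.
RAW = """customer,country,revenue
 Acme Corp ,nl,"1,200.50"
acme corp,NL,300
Globex , us ,n/a
"""


def clean_row(row):
    """Apply the same fixes to every row, so no record is treated differently."""
    # Collapse stray whitespace and normalize casing of the customer name.
    name = " ".join(row["customer"].split()).title()
    country = row["country"].strip().upper()
    # Drop thousands separators before converting to a number.
    raw_revenue = row["revenue"].replace(",", "").strip()
    try:
        revenue = float(raw_revenue)
    except ValueError:
        revenue = None  # keep unparseable values visible instead of guessing
    return {"customer": name, "country": country, "revenue": revenue}


def clean(text):
    """Parse CSV text and return a list of cleaned records."""
    return [clean_row(row) for row in csv.DictReader(io.StringIO(text))]


if __name__ == "__main__":
    for record in clean(RAW):
        print(record)
```

The point is not these particular rules but that they are written down once and applied identically to every row, whether there are three of them or three million.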
Data, the “ore” for data mining, comes from many possible sources. Much has been said about data lakes, but if nothing else, they are meant to be a sort of ‘holistic’ source of data, which typically suits the needs of data scientists. If you are lucky, these data have been catalogued, and maybe even partially cleaned. Still, a fair amount of work is likely to remain. And for those of us who have access to an EDW, let me remind you that it stands for Enterprise Data Warehouse. The E is short for “Enterprise”; it emphatically does not mean the warehouse contains “Everything.” Consequently, some of the more interesting and possibly innovative work will spring from blending in data not (yet) contained in the EDW. In almost all of those scenarios, munging data will greatly benefit from at least some coding skills.
Now let’s return to my central question: “How much technical skill or programming expertise is required to become a data scientist?” The answer to that question, ultimately, depends on your ambition. It also depends on the scope of the value chain you have in mind for a “data scientist.” As I have written earlier, I believe that the job title data scientist should be reserved for those who apply the scientific method and have the capacity to surface findings that transcend the current level of thinking: insights that hold potential to innovate current business practices. There is a place for data analysts who ‘just’ report on the situation “as is”, and I propose distinguishing that from data science.
How much you can get done without any coding skills is obviously a function of at least two things: how clean and structured your supply of “ore” is, and how easily you can tap into IT/technical support when your task at hand calls for it. Now that many business people have high hopes of “self-service BI”, there may be less awareness of how much ‘plumbing’ needs to happen to create data pipelines for business applications.
That being said, not being able to code confines citizen data scientists to what they can see and do within the realm of GUI-based tools. I don’t like splitting hairs, but my conclusion is that you can become an outstanding data analyst even if you have never written a line of code in your life. But to become a data scientist, in the perspective I hold, at least a modicum of coding skill is a “must have” to navigate the amorphous boundaries of our profession.