Is the Data Scientist role another buzzword to increase software sales (think the 1990s and ‘data mining’) or is it real? There appear to be three drivers for the data scientist role: (1) increased complexity & ambiguity of data structures, (2) increased velocity with which data enters organizations, and (3) an increased need for faster, more targeted, actionable analytics. An increase in data volumes is not a driver. Data volumes have increased steadily for decades, and hardware vendors have never had problems meeting demand. The analytics themselves don’t seem fundamentally different, though they may be more complex because of the drivers listed above. What about the skill sets of this ‘data scientist’? Here’s a look at attribute combinations (proficiencies & skills) that might separate a ‘data scientist’ from more traditional roles, and a discussion of what might be gained by combining attributes or risked by lacking certain combinations.
Fundamentally, a data scientist is an inquisitive personality who thrives on asking questions, solving puzzles and finding hidden relationships: someone who is natively and enthusiastically curious. A big plus is an uncanny ability to capture serendipity when it magically arrives. Of course, enthusiastic curiosity and an ability to embrace serendipity don’t by themselves make a data scientist. But combine enthusiastic curiosity with a vigorous set of disciplines, skills and proficiencies, and magic happens! Below are traditional skills and proficiencies that, when combined with a highly curious personality, result in The Data Scientist.
KNOWLEDGE OF ANALYTICS (A) – if there is any hype to the role of data scientist, it’s here. To be clear, a statistician is not necessarily a data scientist, and a data scientist need not be a statistician. Analytics are critical to organizations, but analytic requirements differ dramatically by industry, company and culture, as well as by economic conditions. Knowledge of analytics implies the ability to 1) perform analytics, 2) interpret analytic results and 3) make appropriate business recommendations based on those results. Common disciplines for analytics are Mathematics, Statistics and Econometrics. Common tool sets are SAS and SPSS. Other disciplines and tool sets also fall into the mix.
BUSINESS COMMUNICATION SKILLS (B) – an ability to understand others and make oneself understood. Everyone gets this; few are good at it. This includes listening, reading, writing and speaking. Following instructions and presenting results are at the top of the list, with the emphasis on listening first.
DATA MANAGEMENT SKILLS (C) – the core skill set of an individual often referred to as a ‘data miner’ or a ‘developer’. The core competency is the ability to manage data using query tools or scripting languages; developers will have additional skills in more structured programming languages. Analysts typically spend about 70% of their time preparing data for analytics, i.e., creating analytical data sets, before any analytics are applied. Long ago this process was called ‘data prep’ and was done with a tool like Base SAS. In the mid-1990s came the buzzword ‘data mining’. Less than a decade later, analysts began leveraging relational databases, running data management routines inside the database using SQL, which is especially handy for heavy lifting such as joins & aggregations. In all cases some data management language is required, each with a lexicon, syntax, grammar, structure and all sorts of unforgiving rules. Just as poorly written emails create confusion or worse, a poorly structured query can produce bad results and even dim the lights in the data center.
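To make the in-database approach concrete, here is a minimal sketch in Python that uses the standard-library `sqlite3` module as a stand-in for a relational warehouse. The `customers` and `orders` tables, their columns and both helper functions are hypothetical. The point is that the join and the aggregation run inside the database, and only the finished analytical data set (one row per customer) comes back.

```python
import sqlite3


def make_demo_connection() -> sqlite3.Connection:
    """Create an in-memory database with two small, hypothetical source tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE orders (
            order_id    INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customers(customer_id),
            amount      REAL
        );
        INSERT INTO customers VALUES (1, 'East'), (2, 'West'), (3, 'East');
        INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 15.0), (12, 2, 40.0);
    """)
    return conn


def build_analytical_data_set(conn: sqlite3.Connection) -> list[tuple]:
    """Join and aggregate inside the database; return one row per customer."""
    query = """
        SELECT c.customer_id,
               c.region,
               COUNT(o.order_id)            AS order_count,  -- 0 when no orders
               COALESCE(SUM(o.amount), 0.0) AS total_spend   -- SUM over no rows is NULL
        FROM customers AS c
        LEFT JOIN orders AS o ON o.customer_id = c.customer_id  -- keep customers without orders
        GROUP BY c.customer_id, c.region
        ORDER BY c.customer_id
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in build_analytical_data_set(make_demo_connection()):
        print(row)
```

Run as a script, it prints `(1, 'East', 2, 40.0)`, `(2, 'West', 1, 40.0)` and `(3, 'East', 0, 0.0)`. Dropping the join condition would pair every customer with every order (a Cartesian product), which is exactly the kind of poorly structured query that quietly produces bad results.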
KNOWLEDGE OF DATA ARCHITECTURES (D) – rarely mentioned in the context of ‘data scientist’. Just as statisticians are not necessarily data scientists and vice versa, so too a data architect is not a data scientist, and a data scientist need not be a data architect. But knowledge of data architectures is a big plus for a data scientist. Such knowledge includes at least an awareness of physical and logical data models, metadata, data dictionaries, data integration, schemas, semantic layers, ETL processes, etc. Knowledge should also extend to governance, policies, technology plans, service level agreements, etc.
Having one, or a combination of any two, of these attributes does not make a data scientist. In fact, any two only describe established roles such as business analyst, data architect, data miner or developer, roles that have existed for years. A data scientist, after all, is someone with skills beyond those of these established roles; otherwise ‘data scientist’ is just hype.
A combination of three attributes, depending also on organizational requirements, might very well describe the role of Data Scientist. Consider the following examples.
- Someone with great business communication and data management skills along with deep analytic knowledge (A, B, C) is more than a business analyst or data miner. This person can design and build analytical data sets, run analytics, communicate well with business and IT, present results and be involved in the strategy of analytic design. This person belongs on the business side and in front of the customer.
- Someone with great business communication and data management skills along with deep knowledge of data architectures (C, D, B). This person might, wrongly, not be considered a data scientist because of a lack of analytics knowledge. This person will not only complement the individual in the previous example (A, B, C) but will be a great independent resource as well. This person can design and build not only analytical data sets but entire in-database analytical semantic layers. This person can optimize code and algorithms for speed, efficiency and accuracy, and can design and build data schemas optimized for particular analytics. This individual can write and implement User Defined Functions, custom triggers and machine learning algorithms, and can also be involved in the strategic & tactical discussions surrounding analytics. Put this person in IT and in front of the customer.
- Someone with deep knowledge of data architectures and analytics, and strong data management skills (C, D, A). This person is probably a ‘loner coder’ and happiest in the back room. Put this person either on the business side or in IT, depending on his/her data management skills: a SAS and SQL data miner belongs on the business side; a Java, C++ and SQL developer belongs in IT. This person might not be involved in strategic discussions or spend a lot of time in front of the customer. However, their knowledge of architectures and analytics and their data management skills make them ideal for a number of tasks: developing data structures, analytical data sets and analytic semantic layers; generating optimized code, triggers, UDFs and machine learning algorithms; sourcing disparate data and putting structure to data; designing and running analytics and interpreting results. Bottom line, this person can do a lot! It wouldn’t be unusual to have more than one, placed in different areas of the organization.
- Someone with deep knowledge of data architectures and analytics, and strong communication skills (B, D, A). This is a doubtful combination and raises some red flags. First, how does an individual gain deep knowledge of both data architecture and analytics without the data management skills to build analytical data sets? That doesn’t seem realistic. Second, with no data management skills this analyst needs to be spoon-fed data, i.e., someone needs to provide the analyst with data dumps, or the analyst uses a reporting tool to generate them. In either case, this analyst will only vaguely understand what transformations, joins or aggregations have been performed and will be clueless as to the quality and integrity of the data.
If a combination of any three attributes makes a data scientist, then having all four raises the level again, to a scientist and artist, perhaps even a guru. Here’s what might separate this scientist/artist.
Not bound by a single analytical discipline or analytical tool. If an analytic methodology exceeds the limits of one tool, the scientist/artist adapts easily to another tool that meets the requirements. If arithmetic is more appropriate than statistics, the scientist/artist won’t favor more complex methodologies and knows never to overcomplicate analytics for the sake of ‘sexy’ analytics.
Able to communicate at all levels of the organization: vertically from C-level to line staff and horizontally from marketing to IT. The scientist/artist takes direction from the organization’s visionaries and, in return, presents results and defends positions with clarity, tact and authority. He/she can discuss strategies and methodologies in a way that is appropriate to the audience.
Not bound by any single data source or data management tool; does not need to be ‘spoon-fed’ data. Regardless of data volumes, data structures or data source, the scientist/artist can incorporate new data streams into analytical data sets using SQL, scripting languages, text editors, whatever works. He/she does not depend on IT (or reporting tools) to generate data dumps, a major source of errors.
Able to incorporate analytical and data management processes directly into the production data environment; this might include machine learning algorithms, UDFs, triggers, etc. For example, an analyst might create the same analytical data set and perform the same analytics week after week, whereas the scientist/artist would create the analytical data set inside the production warehouse as a view, once, and perform the analytics at will without ever having to refresh or rebuild the analytical data set. He/she might create not only analytical data sets but entire analytical semantic layers available to many business analysts.
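The view-based approach can be sketched the same way, again with Python’s `sqlite3` standing in for a production warehouse. The table names, the `customer_spend` view and the `spend_band` rule (with its 50.0 threshold) are all hypothetical. The view is defined once; later queries see newly loaded rows with no rebuild step, and a small Python function registered as a user-defined function shows custom logic running inside the database.

```python
import sqlite3


def spend_band(total: float) -> str:
    """Hypothetical business rule, exposed to SQL as a user-defined function."""
    return "high" if total >= 50.0 else "low"


def make_demo_connection() -> sqlite3.Connection:
    """In-memory database with small, hypothetical source tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (customer_id INTEGER PRIMARY KEY);
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
        INSERT INTO customers VALUES (1), (2);
        INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 15.0), (12, 2, 40.0);
    """)
    return conn


def create_analytical_view(conn: sqlite3.Connection) -> None:
    """Register the UDF and define the analytical data set once, as a view."""
    # sqlite3 UDFs are attached to the connection, so each connection must register them.
    conn.create_function("spend_band", 1, spend_band)
    conn.execute("""
        CREATE VIEW IF NOT EXISTS customer_spend AS
        SELECT c.customer_id,
               COALESCE(SUM(o.amount), 0.0)             AS total_spend,
               spend_band(COALESCE(SUM(o.amount), 0.0)) AS band
        FROM customers AS c
        LEFT JOIN orders AS o ON o.customer_id = c.customer_id
        GROUP BY c.customer_id
    """)


def snapshot(conn: sqlite3.Connection) -> list[tuple]:
    """Query the view; it is evaluated against the current table contents."""
    return conn.execute(
        "SELECT customer_id, total_spend, band FROM customer_spend ORDER BY customer_id"
    ).fetchall()


if __name__ == "__main__":
    conn = make_demo_connection()
    create_analytical_view(conn)
    print(snapshot(conn))  # week 1
    conn.execute("INSERT INTO orders VALUES (13, 2, 30.0)")  # new data arrives
    print(snapshot(conn))  # week 2: same view, nothing rebuilt
```

The first snapshot is `[(1, 40.0, 'low'), (2, 40.0, 'low')]`; after the new order it is `[(1, 40.0, 'low'), (2, 70.0, 'high')]`. One caveat of this particular stand-in: because the UDF is registered on the connection rather than stored in the database file, any other connection querying `customer_spend` must register `spend_band` first, whereas many warehouse platforms store UDFs in the database catalog.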
Whatever an organization’s current analytical requirements, the scientist/artist is able to take them to the next level. The forward-looking scientist/artist will understand corporate visions, strategies, cultures, tactics, etc. This individual not only understands marketing, financial and operational reporting and analytical requirements, but can also formulate analytic strategies. Mix great skills with enthusiastic curiosity, and the analytical tasks become rewarding challenges and, indeed, magic happens.