The Ecosystem of Data Jobs – Making sense of the Data Job Market

The landscape of data-related jobs has been thriving in recent years. The dynamics of innovation have driven fragmentation and specialization in the data science job industry. Like a healthy jungle, its lushness and fertility come from the interaction of different organisms that grow without rigidity, taking advantage of their interconnectedness and creating networks of cooperation.

While this flexibility has been, and still is, very productive, the data science job market is now mature enough to allow clear definitions of the required skills and job descriptions of the different data jobs. This clarity is necessary to align the expectations of data job candidates and their employers. Navigating this jungle is not easy, since terms like “Data Science” have come to refer to virtually any quantitative job. However, our experience in understanding what employers need and finding the right mix of skills in candidates gives us the domain knowledge necessary to guide you.

Because we want to dive deep into the topic, we created this as a two-part series (you are now reading part 1). The journey begins with an exploration of the causes of the boom in the data field. Then, we explore the fragmentation of the data science landscape into different profiles: Data Scientist, Data Analyst, Business Analyst, Data Engineer, and Data Architect.

In the second part, we zoom in on one particular data job profile: the “Data Scientist”. We develop an in-depth definition of Data Science and explore the different job profiles it has evolved into. The advances and success in this field have encouraged specialization into new job titles such as Deep-Learning Engineer, Natural Language Processing (NLP) Engineer, and Computer Vision Engineer.

If you are ready to take the leap into the data job industry, here you can find our available positions.

Data jobs – What has driven their explosion?

The possibility of applying models to derive information from data has been tied, from the very beginning, to the development of computers. As a historical anecdote, the first general-purpose computer, the ENIAC (Electronic Numerical Integrator and Computer), or “Giant Brain” as it was later called by the press, was developed in the heat of WWII. President Roosevelt ordered the creation of the Manhattan Project. The race to develop the first atomic bomb was on, and some of the greatest minds of the time were on board: Robert Oppenheimer, John von Neumann, and Enrico Fermi, to name just a few. The scientists struggled with the calculations needed to predict the complex chain reactions inside the bomb. What if these chain reactions were not calculated analytically, but instead simulated by describing their probability distributions? Such simulations required far more calculations than humans could perform by hand. The newly unveiled Giant Brain needed only 30 seconds for what took a human 20 hours to calculate. The first real application of Monte Carlo methods was tested and applied on the very first general-purpose computer ever developed. This opened a new chapter in history.

Here is a very interesting account of the beginning of Monte Carlo methods and the role of ENIAC during the intense times of the Manhattan Project.
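To make the idea of "simulating instead of calculating" concrete, here is a minimal Monte Carlo sketch in Python. It is a toy model of our own, not the historical ENIAC computation: we assume that each particle triggers two new particles with a made-up probability p_split, and we estimate how often such a chain dies out. The function names and all parameters are hypothetical.

```python
import random


def chain_dies_out(p_split, rng, max_particles=200):
    """Simulate one toy chain reaction; return True if it dies out.

    Toy assumption: every particle independently triggers two new
    particles with probability p_split, and none otherwise.
    """
    particles = 1
    # Stop when the chain is extinct or has clearly taken off.
    while 0 < particles < max_particles:
        particles = sum(2 for _ in range(particles) if rng.random() < p_split)
    return particles == 0


def estimate_extinction(p_split, trials=5000, seed=0):
    """Monte Carlo estimate of the probability that the chain dies out."""
    rng = random.Random(seed)
    died = sum(chain_dies_out(p_split, rng) for _ in range(trials))
    return died / trials


if __name__ == "__main__":
    p = 0.6
    print("simulated:", estimate_extinction(p))
    # For this toy model with p > 0.5, the exact answer is (1 - p) / p.
    print("analytic: ", (1 - p) / p)
```

For this simple model an exact formula exists, so the simulation can be checked against it. The appeal of the Monte Carlo approach is that it keeps working when the model becomes too complex for a formula.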

Since the 90s, computers have entered a new stage of development marked by democratization and mass adoption. Without this expansion, the growth in data jobs would have been impossible. To give you an idea, the World Bank estimates that by the end of 2018, almost half of the global population (49.7%) had access to the internet. In the United States, in line with other developed countries, 87% had internet access by that time.

Computing power has been doubling roughly every two years for the past decades, a phenomenon predicted by Moore in the sixties. Technological innovation, together with the development of data analysis capabilities, has been linked to this progress. However, we are reaching the limits of how small transistors can be. Does this mean stagnation? A new possible source of immense growth in computing power appears on the horizon. Just recently, a history-changing announcement was made: Google has achieved “quantum supremacy”. This means that, for the very first time, a quantum computer was able to perform a calculation faster than the most advanced classical computer in the world. The new prediction for how fast computing power is expected to grow with quantum computers is a doubly exponential rate: Neven’s Law.
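To get a feeling for the difference between the two growth rates, the following Python sketch compares plain exponential growth (doubling every two years) with doubly exponential growth. It is only an illustration: the function names are ours, and treating one "step" of doubly exponential growth as comparable to one doubling period is an assumption made for the comparison, not part of either law.

```python
def moore_growth(years, doubling_period=2.0):
    """Exponential growth: power doubles once per doubling period."""
    return 2 ** (years / doubling_period)


def doubly_exponential(steps):
    """Doubly exponential growth: the exponent itself doubles each step."""
    return 2 ** (2 ** steps)


if __name__ == "__main__":
    # Assumption for illustration: one step corresponds to one doubling period.
    for step in range(1, 6):
        years = 2 * step
        print(step, moore_growth(years), doubly_exponential(step))
```

After only a handful of steps, the doubly exponential curve has left the exponential one far behind, which is why this prediction attracts so much attention.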

When quantum computing achieves an analogous state of democratization, meaning that industries and the general population have access to this technology, the landscape of data jobs will be revolutionized. For the data job industry, this means that old systems will be rendered useless and new solutions will be needed very urgently. With this type of power, Data Scientists will be able to expand their boundaries and implement new models that are currently impossible due to limited computing power.

One more factor has played a key role in the boom of data jobs. The digitalization of our lives has brought a dramatic increase in data, the valued raw material of any successful model. Not only has the amount of data increased, but so have its diversity and specificity. The most exciting innovations in the data jobs industry come from exploiting data in the form of text, images, audio, and video. This was unthinkable some years ago.

On the human front, these rapid developments have been accompanied by an analogous expansion of the data science skillset. Companies demand increasingly complex knowledge, which has led to a strategy of fragmentation in the data job industry. It is more efficient to specialize in particular ways of extracting value from data and then team up. For example, one profile will specialize in finding the most efficient and accurate ways of extracting data, another will design an optimal data warehouse, another will specialize in analyzing historic data, and yet another will do predictive analysis. These functions are overlapping and inherently dynamic. The benefit of this approach is that, by incentivizing cooperation and exchange, an ecosystem is created where the whole is more than the sum of its parts.

In a nutshell, the main causes of the explosion of data science jobs are:

Growth in the availability of computers and in their computing power has extended human capabilities and therefore allowed the development of very sophisticated applications of data analysis. This, in turn, has incentivized the creation of cooperative systems in which the specialization of data skills and data-related jobs is an advantage.
Given its proven efficacy, the demand for Data Science skills has exploded. The advantages that Data Science provides are becoming industry standards.
These advancements have been fed by a dramatic increase in the amount and diversity of information available. This, in turn, has allowed models to achieve previously unthinkable levels of precision.

What does a Data Scientist do within this data job ecosystem?

A Data Scientist’s main focus is to develop models that accurately predict some outcome. To do this, they must have very strong analytical skills, deep knowledge of Statistics (hypothesis testing, properties of distributions, regression techniques), and advanced programming skills. Their models have proved to have very widespread applications. Some examples: customer experience (personalization and Recommender Systems), prediction of financial behavior (credit scoring), image recognition, speech recognition, information retrieval, customer support (creation of bots or personal assistants), and much more. Depending on their focus, they use different types of data as raw material for their models. The most innovative applications come from the possibility of extracting value from data formats like text, audio, and video.

In the second part of this series, we describe in depth how the Data Science profile has evolved and some of the most important directions it is taking.

Data Science skills

Math, in particular Probability and Statistics
Machine learning techniques (supervised & unsupervised)
Knowledge of SQL & NoSQL Databases
Programming languages: Python (libraries like TensorFlow, Pandas, Matplotlib and NumPy); R (packages like tidyverse, ggplot2, caret)
A big plus for Data Scientists is having the skills to work with Big Data (Scala, Hadoop, Spark, Cassandra)

Data Science Job Description & Roles

Performing exploratory analysis
Identifying patterns in data and making predictions
Applying state-of-the-art machine learning techniques to optimize the field of application
Model creation (deciding which model is best, feature-engineering)
Model deployment (applying the model in a real-life application)
Model optimization (when new information comes in, optimize the model to make the predictions more accurate)
Continuous communication with stakeholders as well as Data Engineers, IT and Software-Engineers
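
The steps of model creation, prediction, and model optimization listed above can be illustrated with a deliberately tiny Python sketch. It fits a straight line by ordinary least squares and refits it when new observations arrive. The data, the variable names, and the choice of a simple linear model are assumptions for illustration; real projects typically rely on libraries such as those mentioned in the skills list.

```python
def fit_line(xs, ys):
    """Model creation: fit y = slope * x + intercept by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Prediction: apply the fitted model to a new input."""
    slope, intercept = model
    return slope * x + intercept


if __name__ == "__main__":
    # Hypothetical historic data, e.g. marketing spend vs. sales.
    spend = [1.0, 2.0, 3.0, 4.0]
    sales = [3.0, 5.0, 7.0, 9.0]
    model = fit_line(spend, sales)
    print("prediction for spend=5:", predict(model, 5.0))

    # Model optimization: refit once new observations come in.
    spend.append(5.0)
    sales.append(10.5)
    model = fit_line(spend, sales)
    print("updated prediction for spend=6:", predict(model, 6.0))
```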

Data Analysis vs Data Science – What does a data analyst do?

The difference between a Data Analyst and a Data Scientist creates a lot of confusion. This is not surprising, since both data jobs have overlapping responsibilities and domains. The main differences between a Data Scientist and a Data Analyst are the focus and the methods that each one uses to extract value from data. The skills of a Data Analyst focus on looking backward, at historical information, to create business strategies that secure an advantage. They use their tools to understand the status quo. Data Scientists, on the other hand, tend to look forward. Their skillset is focused on applying models to hypothesize and predict what is going to happen. Another big difference between Data Scientists and Data Analysts is the type of data they use. The methods used by Data Analysts tend to be restricted to structured data.

Data Analyst Skills

Proficiency in Excel (R or SAS are a big plus)
Experience with BI-Tools
Communication of findings in a clear manner
Data visualization: Tableau or R’s packages ggplot2 and Shiny
Strong domain knowledge
Depending on the focus of the job: Customer analysis (customer segmentation, conjoint analysis), Social Media analysis, Google Analytics

Data Analyst Job Description & Roles

Performing exploratory analysis, mining and wrangling data to find patterns and valuable insights
Understanding the stories that historic data tell and using them to shape business strategies
Understanding all aspects of the customer (characteristics, lifetime value, segments, satisfaction, preferences)
Creating ways to communicate insights most clearly and succinctly (data visualization)
Continuous communication with various stakeholders
Developing reports and Key Performance Indicators (KPIs)
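
As a small illustration of KPI reporting on historic data, the Python sketch below computes two common customer KPIs from a list of past orders. The order records, the customer names, and the choice of KPIs are made up for this example; in practice, such figures usually come from a BI tool or a database query.

```python
from collections import Counter

# Hypothetical historic orders as (customer, order amount) pairs.
ORDERS = [
    ("alice", 40.0),
    ("bob", 25.0),
    ("alice", 35.0),
    ("carol", 60.0),
    ("bob", 20.0),
    ("dave", 20.0),
]


def average_order_value(orders):
    """KPI: total revenue divided by the number of orders."""
    return sum(amount for _, amount in orders) / len(orders)


def repeat_customer_rate(orders):
    """KPI: share of customers who ordered more than once."""
    counts = Counter(customer for customer, _ in orders)
    repeat_customers = sum(1 for n in counts.values() if n > 1)
    return repeat_customers / len(counts)


if __name__ == "__main__":
    print("average order value:", round(average_order_value(ORDERS), 2))
    print("repeat customer rate:", repeat_customer_rate(ORDERS))
```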

What does a business analyst do?

Business Analysts are very important figures within a company because they use data-driven analysis to steer the direction and the optimization of internal processes. For the role of a Business Analyst, it is fundamental to know the specifics of the domain. This, together with strong data skills, should give a company the edge. Many responsibilities and areas fall under the Business Analyst’s umbrella: 1) Optimization of workflows within the company (for example, through standardization); 2) Assessment and development of business models that identify fields of opportunity, new market approaches, and company policies; 3) Together with IT, identification of technical requirements; 4) Together with Finance, budgeting and forecasting, and development of reports and KPIs. According to its priorities and internal structure, each company has a different way of approaching and dividing the roles of a Business Analyst.

Business Analyst Skills

Deep knowledge in Management, Business, IT and Finance
Domain expertise is fundamental
Proficiency in Excel and in BI-Tools (not just being able to use them but also to implement them)

Business Analyst Job Description & Roles

Creation of business models (identifying opportunities for growth, improvement and potential or actual problems)
Guiding a company’s integration with technology
Analyzing the state of a company and proposing data-driven recommendations to optimize processes
Implementation of systems for Business Intelligence (BI)
Development of KPIs and reports, and assistance with financial modeling and budgeting
Continuous communication with IT and Finance Department

What does a Data Engineer do?

As the value of data has become more evident and the Data Science job market has become more mature, companies have realized the importance of data quality and good data management. Even the most sophisticated methods suffer from poor-quality data. More and more diverse data is produced and made available, so the methods to extract and handle it have developed as well. These phenomena have created a niche in the data job industry: the Data Engineer. This data job profile takes skills and knowledge from Software Engineering and applies them to data processes.

The difference between a Data Engineer and a Data Scientist is that the former implements the framework and processes that extract, transform, and deliver data to where it is needed. Data Scientists, on the other hand, take this data and use it as input for their models. Since companies deal with large quantities of data, one of the most important skills of a Data Engineer is the capacity to analyze these data processes in order to optimize them and find possible sources of problems or inefficiencies. Data Engineers are also in charge of ensuring the quality and reliability of the data they deliver. With this raw material, Data Scientists and Data Analysts, as well as other departments, can perform high-quality analysis.

Data Engineer Skills

Deep knowledge of database models and ETL
Tools that allow the handling of Big Data (Spark, Hadoop)
Knowledge of SQL & NoSQL Databases
Advanced programming skills in relevant languages like Scala, Java, Python, and Lua
Deep understanding and experience in handling different types of data (in particular structured and unstructured data)
Experience applying root cause analysis to optimize both internal and external data processes

Data Engineer Job Description & Roles

Develop and build data architectures (data sets and large-scale processing systems) as well as data pipelines.
Maintenance and management of data processes that will ensure optimal extraction, delivery, and transformation of data.
Feed data scientists and data analysts with high-quality data that allows them to apply state-of-the-art techniques.
Look for new opportunities to acquire relevant data.
Ensure the security of data (and in the case of international companies, develop the data warehouses that adhere to data regulations).
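
The extract, transform, and deliver steps described above can be sketched as a miniature ETL pipeline in Python. This is a simplified illustration, not a production design: the CSV content, the table name orders, the function names, and the quality rule (dropping rows without an amount) are all assumptions, and an in-memory SQLite database stands in for a real data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export with typical quality problems:
# stray whitespace, a missing amount, inconsistent country codes.
RAW_CSV = """customer,amount,country
alice, 40.5 ,de
bob,,us
carol,60,DE
"""


def extract(text):
    """Extract: parse the raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: clean values and drop rows that fail the quality rule."""
    clean = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # quality rule (assumed): skip rows without an amount
        clean.append(
            (row["customer"].strip(), float(amount), row["country"].strip().upper())
        )
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (customer TEXT, amount REAL, country TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(text):
    """Run extract, transform, and load against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    load(transform(extract(text)), conn)
    return conn


if __name__ == "__main__":
    connection = run_pipeline(RAW_CSV)
    for record in connection.execute("SELECT * FROM orders ORDER BY customer"):
        print(record)
```

Keeping the three steps as separate functions makes it easier to test each one in isolation and to trace a data quality problem back to the step that caused it.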

What is a Data Architect?

After reading the job profile of the Data Engineer, it is very reasonable to ask what the difference is between a Data Engineer and a Data Architect. As with all of these profiles, there is some overlap, but the main difference is that the Data Architect is much more focused on designing the company’s data management framework. Think of it like the difference between an Architect and a Civil Engineer. The Data Architect designs a blueprint that the Data Engineer is in charge of implementing, maintaining, and optimizing. Of course, in smaller companies, these roles are often merged into a single position.

Data Architect Skills

Deep knowledge of database models and ETL
Tools that allow the handling of Big Data (Spark, Hadoop)
Knowledge of SQL & NoSQL Databases
Some programming skills in relevant languages like Scala, Java, Python, and Lua
Deep understanding and experience in handling different types of data (in particular structured and unstructured data)

Data Architect Job Description & Roles

Define the strategy, vision, and principles of the data management framework as well as the guidelines for data governance and data stewardship
Analyze and weigh trade-offs between different data management solutions
Look for new opportunities to acquire relevant data

Conclusion

In this first part, we guided you through the Data Job Ecosystem. We explored the main causes of the explosion of the data job market, as well as the different specializations that have developed organically from the increasing need to deal with data in more specific ways. Extracting value from data now requires building a team in which different profiles combine different data skills.

Go to the second part of this series, where we zoom in on the most prominent data job profile: the Data Scientist.

Feeling inspired? Here you can find our available positions.

And if you have any comments or questions, we will be happy to help you.