what is a data engineer?
More than at any time in history, data around the world is being generated at a mind-numbing pace. The volume of business, personal and other data is expected to reach 175 zettabytes (175,000,000,000,000,000,000,000 bytes) by 2025, up from 45 ZB in 2019. The creation of so much data is driven by the growth in communication, the acceleration of digital business as a result of the pandemic and the development of connected devices.
With so many 1s and 0s stored on servers and storage devices everywhere, how can society and markets best make use of all this information? And who can help make sense of all this raw knowledge?
As organisations everywhere engage in more digital workflows, there is a growing demand for professionals who help to ensure information is gathered and processed, cleansed and formatted, and prepared for data scientists to use. In fact, both data scientists and engineers are essential members of the same team needed to convert those 1s and 0s into useful information that business leaders need for intelligent decision making.
Demand for these highly skilled workers will remain high as businesses accelerate digitalisation during the pandemic crisis. A search conducted at the beginning of Q4 2020 revealed more than 150,000 data engineering positions on LinkedIn.
average salary of a data engineer
According to LinkedIn's 2020 Emerging Jobs Report, data engineering is growing at 33% a year and is ranked 8th among the top 15 emerging jobs in the U.S. The average salary for a data engineer in New Zealand with a few years of experience is $120,000.
data engineer salary
Data engineer salaries range from $120,000 to $200,000 across New Zealand. Due to high demand in the market and the scarcity of skills, salaries have increased significantly in a short time. Commercial experience is the most important factor influencing a data engineer's salary.
what is a data engineer?
According to CIO, data engineers collect, manage and administer data. They are a critical part of any data operation: they create the architecture for acquiring and processing raw data, then prepare that data so data scientists can analyse it and draw insights from it. Data engineers identify trends in data sets and develop algorithms as part of the prep work. Like many IT roles, data engineering calls for deep and specific technical skills, such as SQL database design, multiple programming languages and cloud services.
Beyond their technical skills, data engineers are part of a team that must deliver the critical insights business leaders need to guide day-to-day decisions and long-term strategic goals. By enabling these executives to quickly understand and react to immediate and emerging trends, analytics teams play an important role in facilitating outcomes for their organisations.
From one day to the next, data engineers work with business and IT colleagues to develop architecture and create interfaces (APIs) that improve the usability of data. Whether they are preparing the information for use in a dashboard, to be imported into a database or extracted for other purposes, the engineer is responsible for ensuring the integrity of the data and pipelines. Other regular tasks include combining different data sets, determining how to store the information and working with data scientists and analysts to acquire the needed insights.
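As a rough illustration of what "ensuring the integrity of the data" can involve, the sketch below checks a small data set for duplicate keys, missing values and references to records that do not exist. The field names (`order_id`, `customer_id`, `amount`) and the sample records are hypothetical and chosen only for this example; real pipelines typically run checks like these with dedicated tooling.

```python
def check_integrity(orders, customers):
    """Return a list of human-readable problems found in the orders data set."""
    problems = []
    known_customers = {c["customer_id"] for c in customers}
    seen_ids = set()
    for row in orders:
        order_id = row.get("order_id")
        # Each order should appear only once.
        if order_id in seen_ids:
            problems.append(f"duplicate order_id {order_id}")
        seen_ids.add(order_id)
        # Required values must be present.
        if row.get("amount") is None:
            problems.append(f"order {order_id} has no amount")
        # Every order must point at a customer that exists.
        if row.get("customer_id") not in known_customers:
            problems.append(
                f"order {order_id} references unknown customer {row.get('customer_id')}"
            )
    return problems


# Hypothetical sample data, for illustration only.
customers = [
    {"customer_id": 1, "name": "Alice"},
    {"customer_id": 2, "name": "Bob"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 25.5},
    {"order_id": 11, "customer_id": 3, "amount": 12.0},
    {"order_id": 11, "customer_id": 2, "amount": None},
]

if __name__ == "__main__":
    for problem in check_integrity(orders, customers):
        print(problem)
```

Returning a list of problems, rather than stopping at the first one, lets the engineer report every issue in a batch before deciding whether the data is fit to pass downstream.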
data engineers typically fall into one of three types:
- generalists (oversee all data tasks within an organisation, including analytics)
- pipeline-centric (manage all the data flowing into the company)
- database-centric (work with multiple databases)
The size of the organisation often dictates the type of data engineer employed since smaller ones may be limited to a small team or even just one individual managing the data. Companies with more resources may be able to deploy more engineers to support a higher volume and broader analytical needs.
working as a data engineer
duties & responsibilities
As a data engineer, you will require the following skills:
- Communication skills (data). You know about the need to translate technical concepts into non-technical language and understand what communication is required for internal and external stakeholders. (Relevant skill level: awareness)
- Data analysis and synthesis. You know how to undertake data profiling and source system analysis and can present clear insights to colleagues to support the end use of the data. (Relevant skill level: working)
- Data development process. You can design, build and test data products based on feeds from multiple systems using a range of different storage technologies and/or access methods. You know how to create repeatable and reusable products. (Relevant skill level: working)
- Data innovation. You know about opportunities for innovation with new tools and the use of data. (Relevant skill level: awareness)
- Data integration design. You can deliver data solutions in accordance with agreed organisational standards that ensure services are resilient, scalable and future-proof. (Relevant skill level: working)
- Data modelling. You understand the concepts and principles of data modelling and can produce, maintain and update relevant data models for specific business needs. You know how to reverse-engineer data models from a live system. (Relevant skill level: working)
- Metadata management. You can work with metadata repositories to complete complex tasks such as data and systems integration impact analysis. You know how to maintain a repository to ensure information remains accurate and up to date. (Relevant skill level: working)
- Problem resolution (data). You know about the types of problems in databases, data processes, data products and services. (Relevant skill level: awareness)
- Programming and build (data engineering). You can design, code, test, correct and document simple programs or scripts under the direction of others. (Relevant skill level: working)
- Technical understanding (data engineering). You understand core technical concepts related to the role and can apply them with guidance. (Relevant skill level: working)
- Testing. You can execute test scripts under supervision. You understand the role of testing and how it works. (Relevant skill level: awareness)
The daily tasks involved in achieving these goals are varied. These include:
- Extracting data and preparing it as part of the ETL (extract, transform and load) process
- Combining data sets
- Evaluating, parsing, and cleaning data sets
- Coding and executing
- Creating data stores and utilising these for analysis
- Using frameworks to serve data
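The tasks above can be sketched as a very small ETL pipeline. This is a minimal illustration under stated assumptions, not a production design: the CSV content, the column names and the `orders` table are all hypothetical, and an in-memory SQLite database stands in for a real data store. It uses only the Python standard library.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this would come from a file, API or queue.
RAW_CSV = """order_id,customer,amount
1001, Alice ,25.50
1002,bob,
1003,Carol,40.00
1003,Carol,40.00
"""


def extract(text):
    """Extract: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim whitespace, drop rows with no amount, skip repeated order IDs."""
    seen = set()
    cleaned = []
    for row in rows:
        order_id = row["order_id"].strip()
        amount = row["amount"].strip()
        if not amount or order_id in seen:
            continue
        seen.add(order_id)
        cleaned.append((int(order_id), row["customer"].strip().title(), float(amount)))
    return cleaned


def load(records, conn):
    """Load: write the cleaned records into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


def run_pipeline(text=RAW_CSV):
    """Run extract, transform and load against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    load(transform(extract(text)), conn)
    return conn


if __name__ == "__main__":
    conn = run_pipeline()
    # Use the data store for a simple analysis: row count and total order value.
    print(conn.execute("SELECT COUNT(*), SUM(amount) FROM orders").fetchone())
```

Keeping extract, transform and load as separate functions makes each step easy to test on its own and to swap out, for example replacing the SQLite load with a write to a cloud data warehouse.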
It is the data engineer’s main responsibility to ensure the information made available to scientists and other stakeholders is accurate and usable. This also requires close collaboration with other team members including application developers, data scientists and database administrators.
With so many companies generating massive amounts of data and accelerating their digital operations, the need for business insights has never been greater. This is putting tremendous pressure on data teams to collect, extract and process information more quickly.
For data engineers, this can mean long days behind the desk as they face more projects. Generalists who work at small and mid-sized companies may be asked to work long hours to meet growing demands. The hours are dictated by a number of factors, including company culture, type of business, staff size and growth trajectory.
Increasingly, companies are deploying data engineers on a contingent or contract basis to meet their growing data needs. This allows some workers to take on various projects and gain valuable experience in different technologies to meet a variety of business needs. These arrangements also allow contract data engineers to move from one client to another to gain more exposure to new challenges and opportunities.
While they do work within a team, data engineers can perform their jobs on-site or remotely. The tools and datasets utilised for the job are all digital so there are no limitations to where they physically sit as long as they have secure access to their servers. Only company culture and policies dictate whether the work is performed on-site or virtually, but considering the current broad adoption of working from home, many data engineers are likely to continue to perform their duties remotely.
advantages of working with randstad as a data engineer
As the largest HR services business in the world, Randstad works with some of the most experienced and talented data engineers and the leading companies that employ them. As a talent provider to small, medium and large companies across New Zealand, we give our candidates access to the most admired businesses in their field, including leading IT&C companies, life sciences, financial services, manufacturing and others.
Randstad’s experienced recruitment teams around the world leverage the latest talent technologies to create strong matches of candidates to available job openings. Our recruiters also spend significant one-on-one time with job seekers to understand their professional desires and connect them with the right employers.
“Data engineering is not flashy and user-facing like other engineering positions are, but there is still a lot of creativity involved in integrating complex data sets to deliver a solution. It's also gratifying to come up with architecture that advances other people's data needs and see your own product empower insights and data-driven decisions.”
education & skills
education & qualifications
To pursue a career in data engineering, key skills involve programming, mathematics, software development, data mining, database management, IT and cybersecurity. Having a strong technical background is required of all types of data engineers, whether the role is a generalist, a pipeline-centric engineer or a database-centric expert. Most organisations hiring data engineers look for candidates with the following degrees:
A bachelor’s, master’s or PhD in:
- information technology
- computer science
- software engineering
In addition to university education, employers may look for certification in one of several key technology areas. According to CIO, the following are the most sought-after certifications for data engineers and architects:
- Amazon Web Services (AWS) Certified Data Analytics – Specialty
- Cloudera Certified Associate (CCA) Spark and Hadoop Developer
- Cloudera Certified Professional (CCP): Data Engineer
- Data Science Council of America (DASCA) Associate Big Data Engineer
- Data Science Council of America (DASCA) Senior Big Data Engineer
- Google Professional Data Engineer
- IBM Certified Data Architect – Big Data
- IBM Certified Data Engineer – Big Data
- SAS Certified Big Data Professional
skills & competencies
Data engineers need to be highly skilled in data architecture and in database design and maintenance. To perform their jobs competently, they need strong knowledge of a variety of technologies and languages, as many as 10 to 30, so they can choose the best tools for the projects they work on. Many organisations deploy a single suite of cloud services from one vendor, so a deep understanding of one platform, whether that’s AWS or Azure, is often necessary.
Some of the skills a data engineer needs include:
- Apache Spark
- Extract/Transform/Load (ETL)
- Amazon Web Services
- Shell scripting
- Distributed ML Platforms: MLlib (Spark)
- Parallel Computing for Deep Learning (TensorFlow, GPU Programming)
- Development in Containers (Docker, rkt)
- Programming in Notebooks (Zeppelin, Jupyter)
- Java, C++, and/or Go and functional languages (Scala, Clojure, Elixir)
- Azure (building out pipelines)
Beyond technical skills, career advancement also requires many soft skills typically possessed by managers in any function: strong communication, team-oriented collaboration, good project management and efficient use of time. Because data engineers are typically asked to fulfil a business need, they must be able to work with a number of data colleagues and operational leaders to determine the objective of any project or initiative.
Here are the most frequently asked questions about working as a data engineer
Do data engineers perform work similar to that of data scientists?
Not exactly. Engineers focus on making sure the information that will be used to create business insight is accurate, clean and ready for use by data scientists. These two roles may work closely together to ensure the analytical work results in information that business leaders can understand and use to achieve their business goals.
Will I be able to get a job right away as a data engineer after graduating?
Many employers look for candidates with at least a few years of work experience in the field, but with a shortage of data engineers right now, some are recruiting graduates who have strong programming and technology knowledge and problem-solving skills. The best way to get work as a data engineer is to acquire many of the base skills and build on them through additional certification and by working on data projects.
Aren’t data engineers simply a subset of computer coders?
Coding is an essential skill that data engineers must possess, but their work is far more complex than just programming. An understanding of data architecture, databases and distributed systems is required. They must be able to identify issues with data sets, develop solutions to address them and integrate the data into the systems that will be used to analyse the numbers.
Is a master’s degree in data engineering required to advance in this field?
Not all companies require their data engineers and scientists to have a master’s, but to acquire a management-level role, it is strongly recommended. There are many strong data experts who work in the field without a postgraduate degree, leveraging their work experience and technology expertise to get ahead in their field. However, a master’s or PhD offers a greater understanding of theories and problem-solving.
Additionally, certification in various tools and technologies can help advance a career in this field.