As the modern data value chain evolves, organizations are fundamentally shifting how they think about data. At the same time, they need to shift how they build teams around data.
In partnership with Soda, a Belgian-born software company that is establishing itself as the data observability platform to test, monitor and manage data quality, we’ve embarked on a series exploring the ‘Data Dream Team’.
We think the topic of the data team is really important. That is not only because we believe data is a team sport, but also because the seismic shift toward putting data front and centre in so many organizations calls for a new approach to how the organization, its teams, and its people are structured around data. That means new roles, shifted accountability, broken silos, and new channels of collaboration. We’ll be breaking it down, and building it up, to understand what “good” looks like.
To get the Data Dream Team series started, we’ve chosen football as our analogy, for a few good reasons (many of which I am sure you will not guess). The first reason is that Belgium is currently ranked by FIFA (Fédération Internationale de Football Association) as the number one football team in the world. The second reason is that 2021 was the summer of football in Europe: UEFA Euro 2020 took place in June and July, and although Belgium unfortunately did not win the championship (congratulations, Italy), the importance of football on this continent could not be missed. The third reason is Michael Cox’s book Zonal Marking, which explores how team formations, game strategies, and culture shaped modern European football.
Analogy complete and link to Belgium justified.
The Three Teams for Data
In my book, Data Teams, I often talk about the importance of getting a team’s balance right across people and skills, experience and education, and then getting everyone to collaborate. From my research into successful and failing teams, I have concluded that to manage data as a product, you need three overarching teams, each doing something very specific. Each team brings different skills, strengths, and approaches that are needed to create value from data.
Let me introduce you to each team briefly. The data engineering team creates the data products and makes sure they’re delivered through reliable data pipelines. The data science team uses these data products and combines them with advanced analytics to create new business value. The operations team keeps everything running smoothly so that data consumers are not relying on something that isn’t dependable.
So where do you begin, and where do you invest first? Start by understanding the skills and strengths your team already has, and identify the gaps in experience.
So let’s dive in.
The Operations Team
In football, this is the defence. The world’s top-ranked team, Belgium, is packed with attacking creativity, yet for the best part of a decade it has also had one of the most trustworthy and dependable backlines in world football.
The operations team is responsible for putting off-the-shelf software and custom software built in-house into production and making sure it runs smoothly. The skills needed on an operations team are wide-ranging: hardware, software and operating systems, distributed systems, troubleshooting, security, data structures and formats, scripting and programming, operationalization best practices, monitoring and instrumentation, and disaster recovery.
The operations team has to be familiar with the data being used: the expected amount or size of the data, the type of data being sent, and the correct data format. This information helps the team plan resource allocation.
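As a minimal sketch of what that familiarity might look like in practice, the Python below records an expected profile for a hypothetical daily feed (a row-count range, column types, and a date format) and checks an incoming batch against it. `ExpectedProfile`, `check_batch`, and the example columns are illustrative assumptions, not part of Soda or any other particular tool.

```python
from dataclasses import dataclass
from datetime import datetime


# Hypothetical profile of what the operations team expects from one feed.
@dataclass
class ExpectedProfile:
    min_rows: int
    max_rows: int
    column_types: dict[str, type]  # column name -> expected Python type
    date_column: str
    date_format: str  # strptime format, e.g. "%Y-%m-%d"


def check_batch(rows: list[dict], profile: ExpectedProfile) -> list[str]:
    """Return human-readable problems; an empty list means the batch looks as expected."""
    problems = []
    # Amount: is the batch roughly the size resources were planned for?
    if not profile.min_rows <= len(rows) <= profile.max_rows:
        problems.append(
            f"row count {len(rows)} outside [{profile.min_rows}, {profile.max_rows}]"
        )
    for i, row in enumerate(rows):
        # Type: every expected column is present with the expected type.
        for col, expected in profile.column_types.items():
            if col not in row:
                problems.append(f"row {i}: missing column '{col}'")
            elif not isinstance(row[col], expected):
                problems.append(
                    f"row {i}: '{col}' is {type(row[col]).__name__}, "
                    f"expected {expected.__name__}"
                )
        # Format: string dates must parse with the agreed format.
        value = row.get(profile.date_column)
        if isinstance(value, str):
            try:
                datetime.strptime(value, profile.date_format)
            except ValueError:
                problems.append(
                    f"row {i}: '{profile.date_column}' value {value!r} "
                    f"does not match {profile.date_format}"
                )
    return problems


profile = ExpectedProfile(
    min_rows=1,
    max_rows=3,
    column_types={"order_id": int, "order_date": str},
    date_column="order_date",
    date_format="%Y-%m-%d",
)
batch = [
    {"order_id": 1, "order_date": "2021-07-11"},
    {"order_id": "2", "order_date": "11/07/2021"},  # wrong type and wrong format
]
for problem in check_batch(batch, profile):
    print(problem)
```

A batch that regularly falls outside the expected row-count range is also a hint that the resource plan built on those expectations may need revisiting.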
The Data Engineering Team
On the football field, this is the midfield. For Belgium, it is the strongest area of the team because they can call on an array of supporting talent, all of whom understand the role they need to play. It sets them apart: the balance of skills they use to nullify an opponent or build an attack is unrivalled.
The data engineering team gets the data into the right place and ensures it is usable. It is responsible for creating data products and the architecture used to build them, namely data pipelines. To accomplish this, the team must choose the right technologies for both the data and the use cases.
Data engineers write the code that executes the use case on the big data framework, so they need to be skilled programmers. Because writing and testing that code is part of the job, a software engineering background is highly recommended. Data engineers are also responsible for continuous integration, unit tests, and engineering processes. These needs are often misunderstood and left unfilled on teams. Data engineers also have an interest in data itself. Some would go so far as to say they have a love of data, and I would agree.
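To make the unit-testing point concrete, here is a minimal sketch of a tested pipeline step. It assumes Python and pytest-style test functions; the `latest_per_order` step and its `order_id`, `updated_at`, and `status` fields are hypothetical examples rather than part of any specific pipeline.

```python
# Hypothetical pipeline step: keep only the most recent record for each order_id.
def latest_per_order(records: list[dict]) -> list[dict]:
    latest: dict[str, dict] = {}
    for rec in records:
        current = latest.get(rec["order_id"])
        # ISO-8601 timestamps in one fixed format and timezone compare correctly as strings.
        if current is None or rec["updated_at"] > current["updated_at"]:
            latest[rec["order_id"]] = rec
    # Sort by order_id so the output order is deterministic.
    return sorted(latest.values(), key=lambda r: r["order_id"])


# pytest-style unit tests that a CI job could run on every change.
def test_keeps_latest_version():
    records = [
        {"order_id": "A1", "updated_at": "2021-07-01T10:00:00", "status": "placed"},
        {"order_id": "A1", "updated_at": "2021-07-02T09:00:00", "status": "shipped"},
        {"order_id": "B2", "updated_at": "2021-07-01T12:00:00", "status": "placed"},
    ]
    result = latest_per_order(records)
    assert [r["status"] for r in result] == ["shipped", "placed"]


def test_empty_input():
    assert latest_per_order([]) == []
```

Saved in a file whose name starts with `test_`, both functions would be discovered by running `pytest`; wiring that command into a CI job is what turns these checks into a safeguard on every commit.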
The Data Science Team
On the football field, these are the forwards. Belgium’s forwards operate at the sharp end, using the solid foundations built by the rest of the team to do the most important thing in football: score goals.
The goal of the data science team is to create advanced analytics. At the high end of the spectrum, these analytics may be produced with artificial intelligence (AI), of which machine learning (ML) is currently the most common technique. At the low end, they are built from advanced statistics and mathematics. Other data team members may know statistics and mathematics too, but not at the sophisticated level data science requires.
Thus, a data scientist combines an advanced mathematics and statistics background with programming, domain knowledge, and communication skills to analyze data, create applied mathematical models, and present results in a form useful to the organization.
The data science team shouldn’t be asked to create the data infrastructure or software architecture they rely on to create data products, do discovery, train models, or deploy those models. Those infrastructure tasks belong in the domain of the data engineering team. Data scientists are also rare assets with plenty to do besides building infrastructure, so the data science team should depend on the data engineering team’s data infrastructure and software architecture.
The Manager
In football, the manager isn’t on the field. You can see them pacing the sidelines, hoping their (data-first) strategy and training will pay off. They’re responsible for the big-picture decisions. It’s their job to make sure the right people are in the right positions and no skill is missing.
In data teams, who fills this executive role varies by company. Some companies have two or three different executives. For example, data science might sit under the Chief Financial Officer while data engineering and operations sit under the Chief Technology Officer. Two different executives bring different goals, which can make it hard for the teams to align. With one executive, such as a Chief Analytics Officer or Chief Data Officer, there can be much better alignment around objectives and responsibilities. It’s my opinion and experience that data teams flourish under a single executive, just as a football team has only one manager.
Conclusion
Now that we’ve seen the various teams and their football counterparts, we can understand a data team better. No single part of the team is more important than the others. So what happens if one of those parts is missing?
From my research, I’ve found that many organizations are missing at least one of the three teams. Depending on which team is missing, they either concede all kinds of goals (no operations), can’t move the ball into a position to score (no data engineering), or can’t score a goal (no data science). Perhaps the most common problem I’ve seen is hiring a data team made up solely of data scientists and then wondering why they can’t score any goals.
An effective team has to cover just about all the skills in the computer field, including a few that are relatively rare. So it is essential not only to assign clear roles to each team but also to differentiate roles within each team to get the range of expertise you need. That said, there is some overlap between the roles of the different teams, which is why soft skills, especially communication, are needed to round out the jobs. It’s only after extensive teamwork that we can score the goal!
As part of the Data Dream Team, there is a series of podcasts that delve into growing and nurturing your team and their skills, with guests including Paco Nathan, Jordan Morrow, Zhamak Dehghani, Holden Karau, and Caroline Carruthers and Peter Jackson. I encourage you to join the conversation and hear how data leaders and practitioners are putting their data dream teams into play.