Both data quality management and data governance are critical data management activities. After all, there’s not much point managing your data if you aren’t sure if it is the right data, or if it is of good enough quality to use. But, which should come first, data quality or data governance? Learn what you need to start with and how to overcome the obstacles.
In collaboration with Soda, Nicola Askham, known as The Data Governance Coach, has authored a comprehensive three-part series that addresses critical topics including data governance, AI, and data quality. Nicola specializes in helping organizations enhance their data management practices. Over the past twenty years, she has assisted numerous corporations in reducing costs and inefficiencies through her dedicated coaching, consulting, and training initiatives.
Like the proverbial question of which came first, the chicken or the egg, many people in the data management community ponder which should come first: data quality or data governance?
It's not an easy question to answer. Most of us understand what data quality is – “data that is good enough to use” is a simple enough concept to grasp. Data governance, on the other hand, sounds like something that will prevent you from doing your work and, let’s be honest, it also sounds rather boring! But that’s not true; data governance is about understanding the data your organization has and having a structured framework of roles and responsibilities to manage it. This ensures that the right people can make consistent decisions about the quality and use of your data.
Both data quality management and data governance are critical data management activities. After all, there’s not much point managing your data if you aren’t sure if it is the right data, or if it is of good enough quality to use. Done well, data quality management and governance ensure that the right data of the right quality is available to the people and systems that need it. Add other data management activities to the mix and we can ensure that the data is secure, stored logically on systems, and is made available for analysis and insights.
But back to our original question and why it is not easy to answer. While in theory they are two separate data management disciplines, they are deeply interrelated and you should not be doing one without the other. I often describe them as symbiotic - they support each other and ideally, both activities will be undertaken by the same team. With that in mind, you can see that it is difficult to decide where to start.
In an ideal world, we would get our Data Governance Framework designed and implemented before focussing on data quality. Unfortunately, we do not live in an ideal world, and I have never come across an organization which has done data governance before commencing any data quality activities.
There’s a reason for this. There are always people in parts of your organization who understand when data is not good enough for them to do their job and they take action to fix it. This results in ad hoc data cleansing activities, and maybe some basic data quality reporting, with varying degrees of success. Sadly, without data governance in place these activities tend to be tactical at best and, in the case of data quality reporting, often short-lived.
I would never want to criticize someone who has taken the initiative to improve data quality, but doing so without the foundation of a solid Data Governance Framework leads to duplication of effort and can even make matters worse. Over the years, I have come across countless instances where multiple teams were manually cleansing or fixing the same data set, but not in exactly the same way. This resulted in a variety of different answers to basic questions like, “How many customers do we have?” or, “How many sales did we make last month?” Significant amounts of time then get wasted as the teams argue over why their answer is the right one!
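To make the "How many customers do we have?" problem concrete, here is a minimal sketch in Python. The records, the two cleansing rules, and the function names are all invented for illustration; they are not taken from any real organization or tool. It only aims to show how two reasonable-looking fixes to the same data set can produce two different answers.

```python
# Hypothetical customer records; every name and value here is invented.
records = [
    {"name": "Acme Ltd", "email": "sales@acme.example"},
    {"name": "ACME Ltd.", "email": "Sales@Acme.example"},
    {"name": "Acme Ltd", "email": "accounts@acme.example"},
    {"name": "Birch & Co", "email": "hello@birch.example"},
]


def count_by_email(rows):
    """Team A's assumed rule: a customer is a unique email address, ignoring case."""
    return len({row["email"].lower() for row in rows})


def count_by_name(rows):
    """Team B's assumed rule: a customer is a unique company name,
    ignoring case and a trailing full stop."""
    return len({row["name"].lower().rstrip(".") for row in rows})


team_a = count_by_email(records)
team_b = count_by_name(records)

print(f"Team A counts {team_a} customers, Team B counts {team_b}")
```

Neither team is wrong as such; they are simply working from different, undocumented definitions of "customer". That is the kind of disagreement an agreed definition and a single accountable owner are meant to settle once, at the source.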
Whilst data governance supports all the other data management disciplines, one of the primary reasons it is implemented is to support improved data quality. Having data governance in place enables a proactive approach to data quality issue resolution, fixing the source of the issue once. This stops the endless cycle of continuous data cleansing and endless debates about which data is truly correct.
Well, a key part of data governance is a data catalog. A catalog helps us understand and define the data we have. After all, how can we define the data quality rules if no-one can agree on what that data is in the first place? Everyone thinks that they know what the data is, but it isn’t until you start documenting data definitions that you realize that people have different views. These different views can cause a host of issues and inefficiencies and need to be exposed and resolved.
Having data governance in place first ensures that the right people are making decisions about the data quality rules. This is especially important for data used by multiple people across an organization. Without a data owner, many teams make conflicting decisions about what is acceptable for the quality of a data product.
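As a rough illustration of how a definition, an owner, and a data quality rule fit together, here is a small Python sketch. The `QualityRule` structure, the `pass_rate` helper, the field names, and the example rule are all assumptions made up for this example; real catalogs and data quality tools will model this differently.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class QualityRule:
    """A data quality rule tied to an agreed definition and a named owner."""
    data_item: str                # name of the data item as recorded in the catalog
    definition: str               # the agreed business definition
    owner: str                    # the accountable Data Owner
    check: Callable[[str], bool]  # the rule the owner has approved


def pass_rate(rule: QualityRule, values: Iterable[str]) -> float:
    """Return the share of values that satisfy the rule (1.0 if there are no values)."""
    values = list(values)
    if not values:
        return 1.0
    passed = sum(1 for value in values if rule.check(value))
    return passed / len(values)


# Hypothetical rule: the postcode must not be blank.
postcode_rule = QualityRule(
    data_item="customer_postcode",
    definition="Postcode of the customer's billing address",
    owner="Head of Customer Operations",
    check=lambda value: bool(value.strip()),
)

rate = pass_rate(postcode_rule, ["AB1 2CD", "", "EF3 4GH", "  "])
print(f"{postcode_rule.data_item}: {rate:.0%} pass (owner: {postcode_rule.owner})")
```

The point of the sketch is the shape rather than the code: the rule cannot exist without a definition and an owner, so every team that measures the quality of this data item is measuring it against the same decision.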
It's easy to say we should do data governance first, but it’s not always that simple.
If you work in a distributed organization, it may not be easy to agree on one data owner for widely-used data. It may seem easier to let everyone do their own thing, but it does not lead to positive, long-term results. It’s important to design a structure of Data Owners and Data Stewards that can work across organizational silos and encourage communication, so that any decisions about data quality account for everyone’s requirements.
Unfortunately, not many organizations have the luxury of being able to implement data governance before they commence data quality activities. The one exception is start-ups. I've spoken with people in several start-ups over the years and whilst they are starting from a clean slate, they are usually constrained by the need to be as cost-efficient as possible. This means that they are not always open to defining data before they start using it.
You need to tie the implementation of data governance and data quality activities to overall company objectives. Explain that if you implement solid data management practices now, you can avoid future issues and, more importantly, increase the organization’s chances of success.
Remember, data governance may sound like a compliance activity, but in reality, when done properly, it delivers real business value. If you are working at a start-up, get it done early; if you are not, get it implemented as soon as possible. Implementation benefits include enabling easier automation, reducing inefficiencies and costs, streamlining customer experiences, and improving risk management. Further, we must not forget that data governance and good data quality are essential for the successful adoption of AI, but more on that another time.
Whatever the size and age of your organization, and whether or not you have the budget for tools to support your efforts, you can, and should, implement data governance. With all the benefits to be had, it would be foolish not to!
Here are some examples of the pitfalls you should avoid during implementation:
If your organization has a Data Governance Framework and you have already begun monitoring and testing data quality, there are different pitfalls to avoid:
It can be difficult to convince your organization of the value of adopting both data governance and data quality practices. Adrian Smith, Head of Data at Clearspring, has experienced this challenge and shares his experience and advice.
“In every organization I've encountered, there seems to be a universal truth about data quality: it's often overlooked until a problem arises. Whether it's discrepancies in reports, processes failing, or data output resembling more gibberish than valuable information, the issue remains the same. Initially, in start-ups or scale-ups, the data pool is small enough that anomalies and outliers can be manually corrected. However, as the organization expands, the volume of data balloons, and the complexities of managing it multiply at a pace that outstrips both the implementation of Data Governance Frameworks and the expansion of data management resources.
This growth phase often sees resources stretched thin, as the focus is squarely on supporting the burgeoning business—why worry about tomorrow when today's challenges are pressing enough? Yet, this approach sows the seeds for future problems. Larger, more mature organizations have learned this lesson the hard way and have responded by establishing roles like data champions or data stewards, who assume ownership and responsibility for data, treating it as the critical asset it is. In contrast, in many start-ups, data is initially seen as a by-product of operational processes, not an asset in its own right. Ownership falls by default to the IT department, who are expected to 'fix' data quality issues with temporary solutions that don't address the underlying problems, leading to a cycle of recurring issues.
Here lies a common pitfall: neglecting the establishment of a robust Data Governance Framework until data quality issues become unmanageable. The lack of clear data ownership and a holistic, business-wide strategy for data management means that when problems arise, solutions are often reactive rather than proactive, short-term rather than strategic. Another pitfall is viewing data as an IT issue rather than an organizational asset, leading to decisions that fail to leverage data's full potential to drive business growth and success.
As data managers, we embody the spirit of the Roman God Janus, looking in two directions at once. We must manage the current data landscape, addressing immediate data quality issues, while also laying the groundwork for robust data governance that will prevent such issues in the future. This dual focus is critical in environments where resources are limited but ambitions are high. We are tasked with maximizing what can be done today while planning for a future where data governance not only supports but enhances business operations.
Adopting a proactive stance on data governance from the outset can transform data from a potential liability into a significant asset. It enables scalability, supports informed decision-making, and fosters an organizational culture that values data quality as a cornerstone of success. Thus, while it may be tempting to postpone data governance initiatives in favour of more immediate concerns, the most significant challenge—and opportunity—lies in recognizing that investing in data governance is not just about avoiding future problems; it's about enabling future successes. By embracing this challenge, we ensure that our organizations can grow and innovate without being held back by the burdens of technical debt and missed opportunities.”
You can listen to Adrian and me discuss this topic in detail on our podcast here.
The dilemma of prioritizing data quality or data governance will always cause friction. While it is preferable to establish data governance before addressing data quality, in practice, the opposite usually happens.
If you don’t have any data management practices in place yet, I would encourage you to think strategically and get started with data governance first to provide a strong foundation for your data quality activities.
If you are already testing for data quality but have not initiated any data governance, take a hard look at how data governance can add value and sustainability to your existing DQ activities.
And if you are lucky enough to have data governance in place already, make sure that your data quality activities are aligned with your Data Governance Framework and that you avoid the pitfalls listed above.
If you’re ready to dive in, access a checklist of steps to ensure alignment between your data governance and data quality activities.
Good luck!