Published
March 29, 2026
Shifting Data Quality Left with Data Contracts - Case Study with Make




Make is an AI automation platform that lets teams build and automate workflows across 3,000+ apps — from CRMs and databases to accounting systems and custom APIs. Trusted by over 400,000 customers, Make is used across marketing, finance, operations, and IT to connect tools and automate processes.
Internally, Make's data engineering team consolidates data from Salesforce, operational databases, and accounting systems into Snowflake, where it feeds the financial reports and dashboards that leadership depends on.
As with most fast-growing companies, data governance was initially deprioritized in favor of shipping. Over time, this created a familiar set of challenges: source systems changed without warning, ownership of datasets became unclear as teams evolved, and quality issues traveled all the way downstream before anyone caught them, surfacing as a discrepancy in a report rather than an alert in an engineering system.
By adopting Soda v4 and embedding data contracts at both the ingestion and transformation layers of their Airflow pipelines, Make shifted data testing upstream. Contracts now define what good data looks like before it enters the pipeline — catching schema breaks, null violations, duplicate keys, and business logic errors at the point of origin.
The result is dramatically fewer reactive interruptions, automated ARR reconciliation, and, for the first time, a shared understanding across teams of who owns what data and what downstream consumers depend on.
The challenge: data issues that only surface in reports
Three compounding problems made data quality a constant source of friction at Make.
Lack of visibility. Source system owners had no visibility into downstream dependencies. Salesforce admins would delete columns or migrate schemas without knowing that those fields were feeding dashboards. By the time the impact was visible, it was already in a KPI report.
Ownership was unclear. As teams evolved, institutional knowledge about dataset ownership disappeared. When a sync broke, the first two hours weren't spent fixing code — they were spent finding out who currently owned the data.
Fixes treated symptoms, not causes. When something broke, the response was a SQL patch in the transformation layer. The root cause at the source went unaddressed, so the same problems recurred.
"Before, we would be notified usually by our users — and that's a really reactive way to work on data quality." — Renata Hlavová, Data Engineer at Make
The most sensitive area was the monthly ARR reconciliation. Finance and Analytics consumed the same source data but regularly arrived at different numbers. Without automated checks embedded in the pipeline, errors traveled all the way downstream. Every fix started with a stakeholder notification, not a monitoring alert — pulling engineers off planned projects to investigate issues that had already made it into reports.
The solution: data contracts at every layer
Make's approach was to shift testing upstream: define expectations at the source, enforce them automatically, and stop bad data before it can propagate.
The team rebuilt their Soda implementation from the ground up when they adopted Soda v4 — becoming one of its earliest adopters. Data contracts formalized rules about what data should look like and became the mechanism for this shift-left strategy.
The architecture: Airflow + Soda, programmatically
Everything at Make runs through Airflow. So the team built a custom Airflow operator that triggers Soda contract evaluations in sync with each data model run. This means quality checks execute automatically when data moves through the pipeline.
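A schematic of this pattern, with the Airflow and Soda APIs stubbed out so only the shape of the operator is visible (class and function names are illustrative, not Make's code):

```python
# Sketch of a custom Airflow-style operator that runs a Soda contract
# verification right after the model it guards. The Airflow BaseOperator
# and the Soda client are stubbed here; in a real pipeline they would come
# from apache-airflow and the Soda package.

class BaseOperator:                       # stand-in for Airflow's BaseOperator
    def __init__(self, task_id: str):
        self.task_id = task_id


class ContractResult:
    def __init__(self, failures: list[str]):
        self.failures = failures

    def has_failures(self) -> bool:
        return bool(self.failures)


def verify_contract(contract_path: str) -> ContractResult:
    """Stand-in for a Soda contract evaluation against the warehouse."""
    return ContractResult(failures=[])


class SodaContractOperator(BaseOperator):
    """Runs a data contract in sync with the data model run it protects."""

    def __init__(self, task_id: str, contract_path: str, blocking: bool = True):
        super().__init__(task_id)
        self.contract_path = contract_path
        self.blocking = blocking          # hard stop vs. warn-only

    def execute(self, context: dict) -> None:
        result = verify_contract(self.contract_path)
        if result.has_failures() and self.blocking:
            # Failing the task halts all downstream tasks in the DAG.
            raise RuntimeError(
                f"Contract {self.contract_path} failed: {result.failures}"
            )
```

In a DAG, each model task would be followed by its contract task (`model_task >> SodaContractOperator(...)`), so a blocking failure stops everything downstream of that model.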
Soda is embedded in two distinct layers, each with a different testing focus.

Layer 1: Ingestion requirements
Working with Finance and the Business Processes & Automation (BPnS) team — who manage Salesforce operations — Make's data engineers established explicit expectations. When data arrives from external systems — Salesforce, accounting platforms, operational databases — a data contract validates the dataset before anything else runs.
These contracts enforce structural expectations:
Required columns must exist (catching schema changes before they break downstream models)
Primary keys can't contain duplicates
Critical fields can't be null
Values must fall within acceptable ranges or match a defined set
This is the shift-left principle in practice: instead of discovering a missing column when a transformation fails three steps later, Soda catches it at the point of ingestion.
Make's Salesforce Account ingestion contract attributes every check to the BPnS team, the team that manages Salesforce operations, and routes Slack notifications to #dq-bpns. The contract is a shared agreement between data engineering and the source system owners.
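As an illustration of the pattern (dataset, column names, and check keywords below are placeholders that approximate the Soda contract format, not Make's actual file):

```yaml
# Illustrative ingestion contract for the Salesforce Account table.
dataset: SALESFORCE_ACCOUNT
owner: bpns                        # every check attributed to the BPnS team
notifications:
  slack_channel: "#dq-bpns"        # alerts route to the owning team
columns:
  - name: ID
    checks:
      - type: no_missing_values    # primary key must be populated
      - type: no_duplicate_values  # and contain no duplicates
  - name: NAME
    checks:
      - type: no_missing_values    # required column; missing => schema break
  - name: TYPE
    checks:
      - type: no_invalid_values    # values must match a defined set
        valid_values: [Customer, Partner, Prospect]
```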
Make applies the same pattern to its Salesforce Opportunity table, but adds a business logic check at the source.
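A sketch of what such a source-side business rule can look like, expressed as a failed-rows condition (the syntax and names are illustrative, not the actual excerpt):

```yaml
dataset: SALESFORCE_OPPORTUNITY
checks:
  # Business rule: a "Closed Won" opportunity must carry consistent flags.
  # Any row matching this condition is a contract failure.
  - type: failed_rows
    name: closed_won_flags_consistent
    fail_condition: |
      STAGE_NAME = 'Closed Won'
      AND (IS_WON = FALSE OR IS_CLOSED = FALSE)
```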
This check ensures that every opportunity marked as "Closed Won" in Salesforce also has the correct IS_WON and IS_CLOSED flags. If the flags don't match the stage, something changed upstream that shouldn't have — and Soda catches it before the inconsistency reaches downstream models.
This is shift-left testing beyond schema: enforcing business rules at the point of origin.
Layer 2: Pre-distribution quality gates (testing business logic)
As data moves through modeling, the testing focus shifts from structural to semantic. The contracts here catch logical contradictions that pass schema checks but would make Finance question the numbers.
Make's serving layer includes cross-table referential integrity checks — ensuring that no organization ID appears in downstream tables without a corresponding entry in the master table. The contract also has column-level checks to enforce completeness, validate categorical values against allowed lists, and apply conditional rules to specific record subsets — each tagged with a severity that determines whether a failure warns or blocks the pipeline.
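A sketch of the serving-layer pattern (table names, check keywords, and severity values are illustrative approximations of the contract format):

```yaml
dataset: FCT_ORGANIZATION_USAGE
columns:
  - name: ORGANIZATION_ID
    checks:
      # Cross-table referential integrity: every organization ID must have
      # a corresponding entry in the master table, or the pipeline blocks.
      - type: reference
        dataset: DIM_ORGANIZATION
        column: ID
        severity: block
  - name: PLAN_TYPE
    checks:
      - type: no_missing_values      # completeness
        severity: warn
      - type: no_invalid_values      # categorical values vs. allowed list
        valid_values: [free, pro, enterprise]
        severity: warn
```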
The hard stop: a circuit breaker for critical data
For the ARR reconciliation — Make's highest-priority data flow — the team took the shift-left approach one step further: a pipeline hard stop.
If the row count drops by more than 100 compared to the previous snapshot — a sign of an incomplete sync or data loss — the pipeline stops. No transformation runs. No dashboard updates. The Finance team and the data engineering team are alerted simultaneously, and the conversation starts upstream.
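The decision itself is simple enough to sketch in plain Python. This is an illustration of the logic just described, not Make's implementation; the threshold and weekend handling follow the text, and the function names are invented:

```python
# Illustrative circuit-breaker decision: halt the pipeline when today's
# row count drops by more than 100 versus the previous business day.
from datetime import date, timedelta


def previous_business_day(today: date) -> date:
    """Monday compares against Friday's snapshot, not Sunday's."""
    prev = today - timedelta(days=1)
    while prev.weekday() >= 5:          # 5 = Saturday, 6 = Sunday
        prev -= timedelta(days=1)
    return prev


def should_halt_pipeline(counts: dict[date, int], today: date,
                         max_drop: int = 100) -> bool:
    """True when the drop versus the baseline snapshot exceeds max_drop."""
    baseline = counts.get(previous_business_day(today))
    if baseline is None:
        return False                    # no snapshot to compare against
    return counts[today] < baseline - max_drop
```

For example, a Monday snapshot of 9,850 rows against Friday's 10,000 is a drop of 150, so the pipeline halts; a drop of 50 would pass.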
The data contract check that powers the hard stop compares today's row count against yesterday's, with weekend logic to handle non-business days, and alerts both #dq-finance and #data-alerts at critical severity.
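A sketch of how such a check can be expressed in a contract (the keywords and structure below are illustrative approximations, not the actual check):

```yaml
dataset: ARR_SNAPSHOT
checks:
  # Hard stop: if today's row count drops by more than 100 versus the
  # previous business day's snapshot, fail at critical severity and halt
  # all downstream tasks.
  - type: row_count_change
    name: arr_snapshot_completeness
    max_decrease: 100
    compare_to: previous_business_day   # weekend-aware baseline
    severity: critical
notifications:
  slack_channels: ["#dq-finance", "#data-alerts"]
```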
Before this circuit breaker, incomplete syncs caused unexplained fluctuations in financial reports. Leadership would see sudden drops or spikes with no clear cause. Now, the pipeline catches the problem first, stops, alerts the relevant teams, and ensures the issue is resolved upstream.
The impact: from reactive patches to proactive testing
The shift from reactive fixing to proactive testing transformed both how Make's team works and how the broader organization relates to data.
Manual reconciliation became automated.
The monthly ARR reconciliation — once a cross-team, multi-day effort — now runs with automated contract checks and alerts. Incomplete or inconsistent data is caught and stopped before it reaches reports.
Reactive interruptions dropped significantly.
Engineers spend more time on planned projects and less time investigating issues that have already made it into dashboards.
Data ownership became explicit.
For the first time, there is a clear map of how data flows from sources to downstream consumers. The process of defining contracts — starting with "who owns this dataset?" — created the transparency that was missing.
Non-technical teams gained visibility.
While the checks run programmatically, Soda Cloud UI plays a key role for non-technical stakeholders. Data owners can log in and immediately see the health of their datasets: what's failing, what's at risk, and how critical each issue is. Slack integration routes alerts by severity, so teams only get notified about what requires immediate attention.

In the end, the most significant outcome of adopting data contracts wasn't technical, but organizational. Defining contracts forced conversations that had never happened before.
"Now the teams are starting proactively asking questions like: 'We are making this change — is this going to affect you?' I think we are becoming more mature here; it's not chaos with data scattered everywhere anymore." — Renata Hlavová, Data Engineer at Make
What's next: deeper contracts and self-service diagnostics
Make's shift-left journey is still expanding. The team is extending data contracts to more source systems and deeper into the transformation layers.
Metrics monitoring is being refined for anomaly detection, such as detecting when a category value suddenly disappears from a column, signaling a silent upstream change.

The next major milestone is a diagnostic warehouse in Sigma: failed records collected by Soda and sent to Snowflake, surfaced in a dashboard, filtered by team and severity. The vision is full self-service — any team will be able to inspect their own data quality issues without filing a ticket or waiting for a data engineer.
The longer-term goal isn't just better contracts. It's a company-wide shift toward data literacy, where every team that produces or consumes data understands how their changes affect the rest of the organization.
Get in touch
Schedule a demo with the Soda team or request a free account to see how data contracts can help your team shift data testing left, catching issues at the source before they reach your dashboards.








