Blog

Customer Story

Data Quality Is a Team Sport: How 2K Went From 0 to 2,000 DQ Checks

Published Jun 4, 2026

Sid Srivastava

AI Platforms, MLOps & Data Governance Leader at 2K Games

Fabiana Ferraz

Technical Writer at Soda

Table of Contents

We went from detection to prevention. Most data quality tooling will tell you something broke after it broke. Our goal is to catch it before that reaches the dashboards and datasets in the first place.

Sid Srivastava

AI Platforms, MLOps & Data Governance Leader

2K Games

TL;DR

In under a year, 2K Games went from zero automated data quality checks to roughly 2,000 active checks across about 1,000 datasets, and it now sustains a 95% data quality SLA with alerts in minutes. The change wasn't a new tool. It was a new operating model: a named owner on every dataset, data contracts between producers and consumers, and per-studio scorecards.

2K Games is one of the world's leading publishers of interactive entertainment, the studio behind franchises like NBA 2K, WWE, and Borderlands under Take-Two Interactive. At that scale, telemetry runs to billions of records per second and swells to petabyte volumes during a launch. Keeping that data trustworthy across the entire footprint is the challenge.

A year ago, 2K had zero automated data quality checks in production. Today, it runs close to 2,000 across roughly 1,000 datasets, against a 95% data quality SLA. At the Data Innovation Summit 2026, Sid Srivastava, who leads data architecture and ML/AI platforms at 2K, explained how the team got there. The short version: the fix was organizational before it was technical. Here is his account.

The problem: we were reactive, and it showed

Twelve months ago, we had zero automated quality checks in production. We had bolted a few things together, but they produced a lot of false positives and false negatives.

Most issues surfaced the worst way possible: an analyst noticed something odd in a dashboard, or an executive asked about a number in a meeting. Our mean time to resolution was measured in days, not hours.

And every launch came with a war room. Shipping a title meant waiting to see what broke when the pipelines reached peak load for the first time. War rooms were just standard for us.

Why our data is different from a typical enterprise

Our data is different from a typical enterprise in four ways.

First is the scale. We process billions of records per second in telemetry, the kind of volume that breaks standard tooling. NBA 2K alone can reach around a million concurrent players at peak, and every one of them generates events we have to land, validate, and trust in near real time.

Second is the multi-studio reality. Each studio runs its own schemas and platforms, so there is no single source of truth to validate against.

Third is launch day. When a title goes live, our pipelines take peak load for the first time ever. There is no catching it next sprint. It is all the data, all at once.

Fourth is live ops. Patches and season updates constantly reshape the data, so our schemas never really stabilize. They keep evolving.

Hand-drawn diagram of four reasons gaming data differs from enterprise data: scale, multi-studio, launch day, live ops

What changed: we made data quality a team sport

The biggest unlock was a mindset shift.

Data quality is not each team's job. Across the board, it's the whole org's job.

We run six teams in one ecosystem: telemetry, data engineering, ML ops, data science, BI and reporting, and data governance and quality. Telemetry has to send clean events. Data engineering has to land them faithfully. Governance defines the rules.

Once every team owned its part, quality stopped being something a central team chased after the fact.

We made that real with three things.

First, shared ownership with no anonymous queues. Every dataset has a named owner, and that name shows up on every alert. Nothing disappears into a nameless backlog.

Second, data contracts: formal agreements between the teams that produce data and the teams that consume it. With lineage, we can see exactly who is affected when something changes, so a schema change is no longer a surprise.

Third, per-studio scorecards. We measure data quality per studio, because what gets measured is what gets fixed. Those scorecards are still rolling out, and the goal is to turn trust into a number a studio head can own.

How we scaled to nearly 2,000 DQ checks

Picking a tool was never the hard part. The hard part was understanding what every team around us does, then making quality everyone's job. Once the ownership was in place, the tooling could scale it.

We put data quality upstream in the architecture instead of bolting it on at the end. Our managed layer follows a medallion pattern: raw telemetry first, then a curated managed layer with notebooks, self-serve data marts, and near real-time feeds, then the business dashboards on top.

Hand-drawn medallion data architecture from raw telemetry to managed layer to business dashboards, with quality built in upstream

Because quality sits upstream, we catch issues before they reach anyone who depends on them.

Automation did the scaling. Schema breaks are caught in CI/CD, inside GitHub, before a merge lands. Analysts set up their own checks on custom datasets through self-service, so they never have to queue behind us.

Three use cases run on that foundation:

launch quality, where we are the first line of defense when a title ships;
CI/CD integration, with checks running in our pipelines on both Snowflake and Databricks;
and self-serve monitoring.

The checks cover freshness, volume, schema, and row counts, with alerts in Slack and email.

Soda is the engine underneath all three. It runs the same checks across Snowflake and Databricks, gates schema changes in CI/CD, and continuously monitors freshness and volume on the managed layer.

To be clear about scope, this catches structural failures, not a confusing revenue drop or broken dashboard logic. But it gave us the shift that mattered most: from detection to prevention.

The results

At the end of the day, we have the same teams, same studios, but a different operating model. Today, we run close to 2,000 active checks across about 1,000 datasets, against a 95% data quality SLA, and we are hitting it. Schema breaks are caught in CI/CD before they merge. Alerts land in Slack in minutes, not days.

From reactive to proactive: a 12-month transformation.

Then (~12 months ago)	Now
0 automated data quality checks in production	1,945 active checks running across the platform
Issues surfaced by analysts in dashboards, or worse, by execs in meetings	984 datasets under continuous monitoring
MTTR for data incidents measured in days	95% data quality SLA, and hitting it
No formal SLAs on freshness, accuracy, or completeness	Slack alerts in minutes, not days
Launch-day "war rooms" were standard operating procedure	Schema breaks caught in CI/CD before merge
Studios couldn't confidently answer, "is this data trustworthy?"	Per-studio DQ scorecards rolling out

The result that compounds is cultural. Quality is visible and owned, and with per-studio scorecards rolling out, trust is becoming a number a studio head can point to instead of argue about.

What 2K's story means for your team

Buying a tool first is backwards; a tool amplifies an operating model, it doesn't create one. 2K's results came from a cultural shift:

Give every critical dataset a named owner, and put that name on every alert.
Turn producer-consumer dependencies into data contracts before automating.
Move checks left, gating schema changes in CI/CD before they merge.
Commit to an SLA and publish a per-team scorecard.
Turn on self-service so domain teams extend coverage without a central bottleneck.

Soda is built to make each step automatic at scale: data contracts your engineers write in code, observability on your managed layer, and anomaly detection tuned for around 70% fewer false positives.

Book a demo to see how Soda turns a data quality operating model into automated checks.

Frequently asked questions

Who is responsible for data quality?

Everyone who touches the data, with a named owner per dataset. The key practice is removing anonymous queues, so every dataset has an owner whose name appears in every alert.

How do you scale data quality across many teams without a central bottleneck?

Through data contracts and self-service. Contracts make each producer-consumer dependency explicit, so changes can't silently break downstream teams, and self-service checks let analysts add their own coverage without waiting on a central team.

What is a data quality SLA, and is 95% realistic?

A data quality SLA is a committed target for how reliably your data meets its quality checks. 2K committed to and sustained a 95% SLA across roughly 1,000 datasets, so it's achievable at real scale, but only once ownership and automated checks are in place to make the number meaningful. The honest follow-up every leader should ask is: 95% of what? An SLA is only as meaningful as its denominator, whether that's checks passing, datasets meeting threshold, or records within tolerance over a defined window. Pick the unit that maps to how the business actually consumes the data, and the number becomes a commitment you can defend.

How do data contracts prevent bad data instead of just detecting it?

A data contract defines the schema, freshness, and volume that consumers expect, then enforces it in CI/CD. When a change would violate the contract, the pipeline halts before the merge lands, so the bad change never reaches production data. That's the shift from detection to prevention.

Product

Solutions

Pricing

Templates

Blog

Book a demo

Product

Solutions

Templates

Pricing

Blog

Book a demo

Case studies

Trusted by the world’s leading enterprises

Real stories from companies using Soda to keep their data reliable, accurate, and ready for action.

At the end of the day, we don’t want to be in there managing the checks, updating the checks, adding the checks. We just want to go and observe what’s happening, and that’s what Soda is enabling right now.

Sid Srivastava

Director of Data Governance, Quality and MLOps

Read the story

Investing in data quality is key for cross-functional teams to make accurate, complete decisions with fewer risks and greater returns, using initiatives such as product thinking, data governance, and self-service platforms.

Mario Konschake

Director of Product-Data Platform

Read the story

Soda has integrated seamlessly into our technology stack and given us the confidence to find, analyze, implement, and resolve data issues through a simple self-serve capability.

Sutaraj Dutta

Data Engineering Manager

Read the story

Our goal was to deliver high-quality datasets in near real-time, ensuring dashboards reflect live data as it flows in. But beyond solving technical challenges, we wanted to spark a cultural shift - empowering the entire organization to make decisions grounded in accurate, timely data.

Gu Xie

Head of Data Engineering

Read the story

4.4 of 5

Your data has problems.
Now they fix themselves.

Automated data quality, remediation, and management.

One platform, agents that do the work, you approve.

Book a demo

Trusted by

Terms & Conditions

Case studies

Trusted by the world’s leading enterprises

Real stories from companies using Soda to keep their data reliable, accurate, and ready for action.

Sid Srivastava

Director of Data Governance, Quality and MLOps

Read the story

Mario Konschake

Director of Product-Data Platform

Read the story

Soda has integrated seamlessly into our technology stack and given us the confidence to find, analyze, implement, and resolve data issues through a simple self-serve capability.

Sutaraj Dutta

Data Engineering Manager

Read the story

Gu Xie

Head of Data Engineering

Read the story

4.4 of 5

Your data has problems.
Now they fix themselves.

Automated data quality, remediation, and management.

One platform, agents that do the work, you approve.

Book a demo

Trusted by

Terms & Conditions

Case studies

Trusted by the world’s leading enterprises

Real stories from companies using Soda to keep their data reliable, accurate, and ready for action.

Sid Srivastava

Director of Data Governance, Quality and MLOps

Read the story

Mario Konschake

Director of Product-Data Platform

Read the story

Soda has integrated seamlessly into our technology stack and given us the confidence to find, analyze, implement, and resolve data issues through a simple self-serve capability.

Sutaraj Dutta

Data Engineering Manager

Read the story

Gu Xie

Head of Data Engineering

Read the story

4.4 of 5

Your data has problems.
Now they fix themselves.

Automated data quality, remediation, and management.

One platform, agents that do the work, you approve.

Book a demo

Trusted by

Data Quality Is a Team Sport: How 2K Went From 0 to 2,000 DQ Checks

Data Quality Is a Team Sport: How 2K Went From 0 to 2,000 DQ Checks

Data Quality Is a Team Sport: How 2K Went From 0 to 2,000 DQ Checks

Who is responsible for data quality?

How do you scale data quality across many teams without a central bottleneck?

What is a data quality SLA, and is 95% realistic?

How do data contracts prevent bad data instead of just detecting it?

Trusted by the world’s leading enterprises

Your data has problems.Now they fix themselves.

Trusted by the world’s leading enterprises

Your data has problems.Now they fix themselves.

Trusted by the world’s leading enterprises

Your data has problems.Now they fix themselves.

Your data has problems.
Now they fix themselves.

Your data has problems.
Now they fix themselves.

Your data has problems.
Now they fix themselves.