Data management tools solved automated detection. They never solved automated remediation. And detection without remediation is just debt on a longer timeline.
We've helped teams detect data issues automatically at scale for years. It's high time we helped fix them, too.
Today, we're introducing Soda Cleanse: an agentic data cleansing capability that extends Soda from detection into automated remediation.
Specialized AI agents analyze the failures Soda finds, generate targeted fix proposals, and route them to data stewards for approval. Nothing changes in your data without a human sign-off.
The steward governs. The agent does the janitorial work.
Soda Cleanse is an add-on to Soda Cloud and is available today in Private Preview.
How Soda Cleanse Works
Soda Cleanse is contract-driven, agent-specialized, and human-approved. Those three properties distinguish it from generic AI data cleaning approaches, and from the ad-hoc scripts most teams fall back on today.
The Inbox is Soda Cleanse's center of gravity. It's where data stewards and record owners triage issues, make decisions, and move on. It's not a dashboard and not a chatbot. It's a workflow application with an audit trail.

The Issues Inbox. Each row shows the record, affected column, suggested fix, and assigned steward. Expanding a row reveals the AI Reasoning panel (a plain-English explanation with a confidence score) alongside the Row Context fields the agent used to determine the fix.
Cleanse is built on top of Soda Cloud and the Diagnostics Warehouse (DWH). Every failed row from every failing check in a Soda deployment pools in the DWH automatically, with a consistent schema and full scan history. Cleanse plugs into that pool directly. No custom ingestion pipeline, no per-source wiring.
Three steps run in a loop:
1. Ingest
Every failed row from every failing check lands in the Inbox automatically, with its check context, scan history, and prior decisions attached. One row, one Issue — no matter how many checks fired against it.
2. Propose
The data contract declares how each failure should be fixed, matching the tool to the problem: a safe default, a lookup, an AI-assisted suggestion, or a human call. AI is the last resort, not the first.
3. Apply
Approved fixes reach the source system through an audited writer, or stay in a staging table if your team isn't ready to grant write access yet. Nothing changes in your data without a steward signing off, and every decision lands in the audit trail.
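To make the loop concrete, here is a minimal Python sketch of Ingest, Propose, and Apply. It is not Soda's API: the contract layout, the `remediation` slot, `Issue`, `ingest`, `propose`, `apply_fixes`, and the optional `ai_suggest` callback are all hypothetical names, and a plain list stands in for the staging table.

```python
from dataclasses import dataclass, field

# Hypothetical contract: each check may carry an optional "remediation" slot.
CONTRACT = {
    "country_valid": {"column": "country",
                      "remediation": {"strategy": "lookup",
                                      "table": {"USA": "United States", "U.S.A.": "United States"}}},
    "status_not_null": {"column": "status",
                        "remediation": {"strategy": "default", "value": "active"}},
    "phone_format": {"column": "phone"},  # remediation slot not filled in yet
}

@dataclass
class Issue:
    record_id: str
    failures: list = field(default_factory=list)   # (check_name, current_value)
    proposals: list = field(default_factory=list)  # (column, strategy, proposed_value)
    status: str = "open"

def ingest(failed_rows):
    """Step 1: group failed rows so each record yields exactly one Issue."""
    issues = {}
    for record_id, check_name, value in failed_rows:
        issues.setdefault(record_id, Issue(record_id)).failures.append((check_name, value))
    return list(issues.values())

def propose(issue, ai_suggest=None):
    """Step 2: use the strategy the contract declares; otherwise route to a human."""
    for check_name, value in issue.failures:
        check = CONTRACT[check_name]
        slot = check.get("remediation") or {}
        kind = slot.get("strategy")
        if kind == "default":
            fix = ("default", slot["value"])
        elif kind == "lookup" and value in slot["table"]:
            fix = ("lookup", slot["table"][value])
        elif kind == "ai" and ai_suggest is not None:
            fix = ("ai", ai_suggest(check, value))  # only when the contract asks for it
        else:
            fix = ("human", None)  # no usable strategy: a steward decides
        issue.proposals.append((check["column"], *fix))
    issue.status = "proposed"

def apply_fixes(issue, approved, staging):
    """Step 3: write approved, concrete fixes to staging; human-routed ones are skipped."""
    if not approved:
        issue.status = "rejected"
        return
    for column, strategy, value in issue.proposals:
        if strategy != "human":
            staging.append({"record_id": issue.record_id, "column": column, "value": value})
    issue.status = "applied"
```

Keeping the strategy in the contract rather than in the agent is what lets an unfilled slot fall back safely to a human call.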

Soda Cleanse closes the loop on data quality. Contracts define correct. Agents fix what isn't. Stewards govern the outcome.
💡 For a deeper look at the evolution from manual scripts to agentic cleansing, read our Guide to Modern Data Cleansing.
What Contract-Driven Cleansing Gets You
Cleanse's remediation strategies live inside the same data contract your team already maintains. Add a new check and you get its remediation slot for free: fill it in when you're ready, and Cleanse picks it up automatically.
Four outcomes fall out of that:
One artifact, one workflow. Detection and remediation live inside the same data contract, so they can't drift apart. Every fix traces back to the rule that validates the data.
A safe on-ramp for risk-averse teams. No team needs to grant production write access on day one to get value out of Cleanse: approved fixes can stay in a staging table, and the write-access conversation can wait until you're ready for it.
Interpretable-first golden record selection. Advanced, interpretable heuristics pick the most likely correct record, and LLMs step in only to resolve ambiguous cases. Every merge decision is traceable back to the signals that drove it — defensible in an audit, without the black-box tradeoff.
Governed today, autonomous tomorrow. Every decision — proposal, approval, write, rejection — lands in an immutable audit log. When an auditor asks "who changed this record and why," the answer is already there. And the architecture that supervises today's workflow is the same one that scales toward fuller autonomy on your terms.
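The post doesn't say how the audit log's immutability is enforced. One common technique is a hash chain, where each entry stores the hash of the entry before it, so any later edit to history fails verification. The sketch below assumes that approach; `AuditLog`, `record`, `verify`, and `history` are hypothetical names.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def _digest(body):
    # Canonical JSON (sorted keys) so the same entry always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

class AuditLog:
    """Append-only log in which each entry is chained to the previous one by hash."""

    def __init__(self):
        self._entries = []

    def record(self, actor, action, record_id, reason):
        prev = self._entries[-1]["hash"] if self._entries else GENESIS
        body = {"actor": actor, "action": action, "record_id": record_id,
                "reason": reason, "at": datetime.now(timezone.utc).isoformat(),
                "prev": prev}
        self._entries.append({**body, "hash": _digest(body)})

    def verify(self):
        """Recompute every hash; returns False if any entry was altered or reordered."""
        prev = GENESIS
        for entry in self._entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev"] != prev or _digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True

    def history(self, record_id):
        """Answer 'who changed this record and why'."""
        return [(e["actor"], e["action"], e["reason"])
                for e in self._entries if e["record_id"] == record_id]
```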
Soda Cloud finds it. Cleanse fixes it. One contract, one workflow, one audit trail.
Soda Cleanse requires Soda 4.0 and the Diagnostics Warehouse (Enterprise plan).
If you're not yet on Soda 4.0, talk to our team to understand how to get started.
What Soda Cleanse Can Fix
Soda Cleanse ships with specialized agents for four failure types. Each is built for the reasoning that type requires.
Entity normalization
Variant names for the same entity ("USA", "U.S.A.", "United States", "United States of America") break joins and inflate counts. The normalization agent derives the canonical form from surrounding data and contract context, then proposes it for approval.
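A deterministic approximation of that step, for illustration: cluster variants by a normalized "fingerprint" key (case-folded, keeping only ASCII letters and digits), then take the canonical form from the contract's valid values, from an alias map, or from the most frequent variant in the data. Fingerprint clustering is a swapped-in technique, not the agent's documented method, and `propose_canonical`, `valid_values`, and `aliases` are hypothetical. The output is a mapping for a steward to approve, not a write.

```python
import re
from collections import Counter, defaultdict

def fingerprint(value):
    """Case-fold and drop everything but ASCII letters and digits: 'U.S.A.' -> 'usa'."""
    return re.sub(r"[^0-9a-z]", "", value.casefold())

def propose_canonical(values, valid_values, aliases=None):
    """Return {variant: canonical} for every variant that differs from its canonical form."""
    aliases = aliases or {}
    valid_by_key = {fingerprint(v): v for v in valid_values}
    clusters = defaultdict(list)
    for v in values:
        clusters[fingerprint(v)].append(v)
    proposals = {}
    for key, variants in clusters.items():
        if key in valid_by_key:            # contract lists an allowed spelling
            canonical = valid_by_key[key]
        elif key in aliases:               # contract maps this variant family
            canonical = aliases[key]
        else:                              # fall back to the data's most common form
            canonical = Counter(variants).most_common(1)[0][0]
        for v in dict.fromkeys(variants):  # de-duplicate while keeping order
            if v != canonical:
                proposals[v] = canonical
    return proposals
```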
Imputation
Missing values slip through schemas because NULL is often technically valid. The imputation agent reasons from the contract's definition of the field and the surrounding data to propose a value that fits, which the steward can accept, edit, or reject.
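The agent's reasoning isn't published. As a simple stand-in, this sketch fills a missing value with the most common value among rows that share a related field, keeps only values the contract allows, and reports the share of agreeing rows as a rough confidence. `propose_imputation` and its parameters are hypothetical; rows with no evidence are left for a steward.

```python
from collections import Counter

def propose_imputation(rows, target, context, allowed=None):
    """For each row missing `target`, propose the most common `target` value among rows
    with the same `context` value. Returns a list of (row_index, value, confidence)."""
    by_context = {}
    for r in rows:
        if r.get(target) is not None:
            by_context.setdefault(r.get(context), Counter())[r[target]] += 1
    proposals = []
    for i, r in enumerate(rows):
        if r.get(target) is not None:
            continue
        counts = by_context.get(r.get(context))
        if not counts:
            continue  # no evidence in the surrounding data: leave for a steward
        value, n = counts.most_common(1)[0]
        if allowed is not None and value not in allowed:
            continue  # never propose a value the contract would reject
        proposals.append((i, value, round(n / sum(counts.values()), 2)))
    return proposals
```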
Deduplication
Duplicates are resolved with advanced, interpretable heuristics that identify the most likely correct record, with LLMs stepping in only to resolve ambiguous cases. Merge candidates surface in the Inbox with full evidence of why each record was picked.
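The specific heuristics aren't published. The sketch below assumes three interpretable signals (field completeness, recency within the group, and a per-source trust weight), averages them, and flags the group as ambiguous when the top two scores fall within a margin; that flag is the point where an LLM or a steward would take over. `SOURCE_TRUST`, `pick_golden_record`, and `margin` are hypothetical.

```python
# Hypothetical per-source trust weights; a real deployment would configure these.
SOURCE_TRUST = {"crm": 1.0, "billing": 0.8, "web_form": 0.4}

def pick_golden_record(group, fields, margin=0.1):
    """Score each duplicate on interpretable signals; return the winner with its evidence."""
    times = [r["updated_at"] for r in group]
    oldest, newest = min(times), max(times)
    scored = []
    for r in group:
        signals = {
            "completeness": sum(r.get(f) is not None for f in fields) / len(fields),
            "recency": 1.0 if newest == oldest else (r["updated_at"] - oldest) / (newest - oldest),
            "trust": SOURCE_TRUST.get(r["source"], 0.0),
        }
        scored.append((sum(signals.values()) / len(signals), r["id"], signals))
    scored.sort(key=lambda s: s[0], reverse=True)  # stable: ties keep input order
    best = scored[0]
    ambiguous = len(scored) > 1 and best[0] - scored[1][0] < margin
    return {"winner": best[1], "score": round(best[0], 3),
            "signals": best[2], "ambiguous": ambiguous}
```

Because every score decomposes into named signals, the evidence shown next to a merge candidate is simply the `signals` dictionary.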
Reconciliation
When the same entity carries different values across sources, or drifts from a trusted reference dataset, the reconciliation agent identifies the mismatch and proposes a correction consistent with the contract's source of truth.
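A minimal sketch of the reference-dataset case, assuming the contract names one trusted reference keyed by entity ID: compare each source's value against it (numbers within a relative tolerance, everything else by equality) and propose the reference value wherever they disagree. `reconcile` and `values_match` are hypothetical names, and cross-source conflicts with no reference value are not handled here.

```python
import math

def values_match(a, b, rel_tol=1e-6):
    """Numbers match within a relative tolerance; all other values must be equal."""
    numeric = (int, float)
    if isinstance(a, numeric) and isinstance(b, numeric):
        return math.isclose(a, b, rel_tol=rel_tol)
    return a == b

def reconcile(sources, reference, rel_tol=1e-6):
    """sources: {source_name: {entity_id: value}}; reference: {entity_id: trusted_value}.
    Returns one proposal per value that disagrees with the reference."""
    proposals = []
    for source, records in sources.items():
        for entity_id, value in records.items():
            if entity_id not in reference:
                continue  # nothing trusted to reconcile against
            truth = reference[entity_id]
            if not values_match(value, truth, rel_tol):
                proposals.append({"source": source, "entity_id": entity_id,
                                  "current": value, "proposed": truth})
    return proposals
```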
How To Get Started
Soda Cleanse is in Private Preview. Access is limited intentionally: we want to work closely with early teams to make sure the agents produce proposals worth approving, and that the steward workflow fits how governance actually runs.
Already on Soda 4.0 with data contracts in place?
If the Diagnostics Warehouse is already running, no new infrastructure is required.
Soda Cleanse runs in Kubernetes alongside your existing Soda runner.
Teams in early access have reached their first agent proposal in days.
Not yet on Soda 4.0?
Install Soda Core 4.0 and explore the documentation.
If you are on v3.0, reach out to your customer engineer to plan how to migrate to v4.0.
Get one data contract in place for one dataset.
Pilot Cleanse on that dataset, then expand from there.
Either way, start narrow: pick the failure type causing the most manual cleanup work on your team, pick one dataset, and let the agent run. Expansion follows naturally from there.
What's Next
Observability and Contracts were the first half: knowing when something is wrong.
Cleanse is the second: resolving it on the same platform, under the same contract, with no exports, tickets, or handoffs in between.
Stewards stop fixing records and start governing the process — approving what the agents propose, rejecting what doesn't hold up, and letting the audit trail do the rest.
Private Preview is the beginning. We're using early access to make sure agents produce proposals that are accurate enough to approve quickly, and to understand how different governance setups affect the steward review workflow.
As agents accumulate approval history specific to your organization's data standards, the review queue gets shorter. The goal isn't just automated remediation. It's remediation that gets smarter over time, without requiring your team to maintain it.
An MCP endpoint is on the roadmap. It will let external agents participate in the remediation workflow directly: driving triage, proposing fixes, and eventually closing the loop without a human for the fix types that have earned it.
If you're already working with Soda and have thoughts on which failure types matter most for your pipelines, join the conversation in Soda Community Slack. That feedback shapes what we build next.