Implementing Data Contracts in Finance: 5 Ready-to-Use Templates

Feb 27, 2026

Fabiana Ferraz

Technical Writer at Soda

Financial data ecosystems are heavily regulated, audited, and legally accountable. Accurate, complete, and timely data drives every critical process, from financial reporting and loan approvals to fraud detection, credit risk management, and regulatory submissions.

Yet most data teams still operate in reactive mode, debugging reconciliation mismatches, explaining why exposure figures don't tie to accounting, and scrambling before supervisory deadlines.

With thousands of transactions per second flowing across core banking, trading platforms, payment processors, and fund administration systems, traditional manual checks can't keep up anymore. Financial institutions need automated, enforceable control points in their data pipelines.

Data contracts provide exactly that — an executable agreement that validates data as it flows through your pipeline. If data meets defined quality standards, it proceeds downstream. If it doesn’t, the pipeline can block, quarantine, alert, or mark the dataset as invalid — depending on your enforcement strategy (i.e., how you integrate contracts into your pipelines).

In this guide, we'll show how to implement executable data contracts using five practical templates for financial data:

  1. Transaction ledger

  2. Account balance

  3. Portfolio holdings

  4. Fraud detection model outputs

  5. BCBS 239 risk exposure

Each template includes real-world failure scenarios and executable Soda contract examples that you can use right away by changing the thresholds and value sets to fit your environment.

What Are Data Contracts in Finance?

Data contracts are machine-readable agreements that define how data should be structured, validated, and governed as it moves between producers and consumers. As data flows through your pipelines, the contract automatically determines whether the data is fit to use according to agreed specifications.

In finance, this means creating explicit agreements between systems that generate data — like core banking platforms, trading systems, payment processors, and ML scoring engines — and the teams that rely on them for regulatory reporting, risk analytics, reconciliation, and fraud detection.

Instead of detecting problems during month-end close or regulatory submission, teams enforce expectations directly inside data pipelines. Together, domain teams and engineers set expectations about schema, freshness, quality rules, and more, and the contract makes those expectations testable.
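As a sketch, a minimal contract pairing dataset-level and dataset column expectations could look like the following (column names and thresholds are placeholders; the templates below show complete, production-oriented versions):

```yaml
checks:
  - schema:                        # fail on structural drift
  - freshness:
      column: created_at           # placeholder timestamp column
      threshold:
        unit: hour
        must_be_less_than_or_equal: 24

columns:
  - name: record_id                # placeholder identifier column
    data_type: varchar
    checks:
      - missing:                   # no NULLs allowed
      - duplicate:                 # enforce uniqueness
```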

Not all data contract implementations are equal, though. Some approaches focus on documentation and metadata standards only. Others — like Soda collaborative contracts — enforce validation automatically during ingestion, transformation, or CI/CD workflows.

This means you can add automated checks for accuracy, completeness, and timeliness; prevent poor-quality data from reaching downstream systems; and set up proactive alerts to address issues before they escalate.

When a contract fails, it can:

  • Block pipeline execution (CI/CD gate)

  • Prevent downstream table updates

  • Trigger alerting workflows

  • Mark datasets as invalid in observability tools

In short, for finance, where data errors can trigger regulatory penalties, misstated risk positions, or failed client obligations, enforceable contracts transform reactive firefighting into predictable operations.

If you would like to learn more about the foundational structure of a data contract, check our guide: The Definitive Guide to Data Contracts

Why Financial Organizations Need Data Contracts

Financial institutions face a unique combination of constraints:

  • Regulatory pressure (BCBS 239, IFRS, SOX, internal audit)

  • Strict reconciliation requirements

  • High financial and reputational risk

  • Complex multi-system architectures

  • Increasing reliance on machine learning outputs

So we'd say there are four compelling reasons why data contracts are not just useful but essential for financial institutions:

First, they prevent silent data failures.

Data contracts act as checkpoints, stopping data if it violates agreed-upon schemas, types, or constraints.

For example, if a developer renames exposure_amount to exposure_value without coordinating with downstream teams, regulatory reporting logic may silently drop that column of risk data.

Data contracts catch these breaking changes before they reach production, eliminating the kind of emergency that leads to supervisory scrutiny.

Second, they define clear ownership boundaries.

Every contract must have an assigned owner. So, when data quality issues arise, contracts point exactly to who fixes what.

Large financial institutions increasingly adopt data mesh or domain-oriented architectures. Often, front office, middle office, back office, risk, compliance, and data engineering all touch the same data at different stages.

Data contracts define the interoperability standards between domains, so that decentralization doesn’t become fragmentation.

Third, they enable safe change management.

Financial data management requires constant evolution to integrate new regulatory requirements, updated risk models, expanded product lines and evolving payment methods.

Data contracts enforce versioning, backward compatibility expectations, and approval workflows, ensuring changes are intentional and coordinated rather than accidental and chaotic.

Also, because contracts are version-controlled and reviewable, they create an auditable history of when validation rules changed and who approved them.

Fourth, they support compliance and governance requirements.

With regulations like BCBS 239, SOX, MiFID II, and PCI DSS, financial institutions must demonstrate that data is accurate, complete, timely, and reconcilable across systems.

Data contracts operationalize key governance requirements — such as completeness, timeliness, and reconciliation tolerances — directly inside data pipelines, turning compliance from a periodic audit into continuous, automated monitoring.

🟢 The advantage of data contracts for both producers and consumers is clear: less downtime, fewer surprises, and less time spent fixing other people's changes.

By establishing automated validation at every checkpoint, contracts transform data quality challenges into managed, predictable processes. And by catching issues at the data level, you avoid expensive regulatory penalties, restatements, and erosion of stakeholder trust.

Now, let's see how these contracts are structured and how you can apply them to your financial data.

Data Contract Templates for the Financial Sector

We've launched a data contract template gallery with ready-to-use contracts across industries and use cases. Watch the short video below where Santiago shows how to access the gallery and how to read Soda data contracts.

Let's explore the structure of data contracts in practice by going over our 5 ready-to-use finance templates: Transaction Ledger, Account Balances, Portfolio Holdings, Fraud Detection ML Scores, and BCBS 239 Regulatory Exposures.

Template #1: Transaction Ledger Data Contract

The transaction ledger is the operational backbone of any financial institution. Every debit, credit, transfer, fee, and reversal flows through it. It feeds balance calculations, regulatory reporting, reconciliation processes, and customer-facing statements. When ledger data is corrupted, the entire financial operation loses its source of truth.

Consider this failure scenario:

A payment processing integration starts submitting transactions with a new status value ("ON_HOLD") that your downstream systems don't recognize. Your balance calculation engine silently skips these records, and account balances start drifting from reality. By the time treasury discovers the discrepancy during end-of-day reconciliation, thousands of transactions are in limbo and client balances are misstated.

The transaction ledger data contract template prevents this scenario by enforcing schema stability, financial consistency rules, and controlled value sets before data reaches critical systems.

Dataset-level protection:

The schema check with strict settings (allow_extra_columns: false) catches any structural changes. The freshness check enforces a 24-hour SLA. Cross-field integrity checks prevent the most expensive ledger errors: future-dated transactions and sign/type mismatches.

variables:
  FRESHNESS_HOURS:
    default: 24

checks:
  - schema:
      allow_extra_columns: false
      allow_other_column_order: false
  - row_count:
      threshold:
        must_be_greater_than: 0
  - freshness:
      column: transaction_timestamp
      threshold:
        unit: hour
        must_be_less_than_or_equal: ${var.FRESHNESS_HOURS}
  - failed_rows:
      name: "Transaction timestamp must not be in the future"
      qualifier: ts_not_future
      expression: transaction_timestamp > ${soda.NOW}
  - failed_rows:
      name: "Amount sign must match transaction_type (DEBIT negative, CREDIT positive)"
      qualifier: amount_sign_by_type
      expression: >
        (transaction_type = 'DEBIT' AND amount >= 0)
        OR (transaction_type = 'CREDIT' AND amount <= 0)

The amount-sign-by-type check enforces a signed-amount convention where debits carry negative values and credits carry positive values. This is one common ledger design — others store amounts as absolute values and encode direction separately via the transaction_type column. Adjust this check to match your ledger's sign convention. Regardless of which convention you use, the key principle is the same: the relationship between amount and direction must be consistent and testable.
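For example, if your ledger stores absolute amounts and encodes direction only in transaction_type, the equivalent guard would flag negative amounts instead (a sketch; adapt the check name and convention to your schema):

```yaml
  - failed_rows:
      name: "Amounts are stored as absolute values; direction lives in transaction_type"
      qualifier: amount_absolute_value
      expression: amount < 0
```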

Column-level validation:

Required fields ensure every transaction has a unique UUID, valid account and customer references, and a timestamp. The duplicate check on transaction_id prevents double-posting. Currency codes are validated against ISO-4217 format. Transaction types and statuses are restricted to controlled value sets, preventing integration failures with downstream systems.

columns:
  - name: transaction_id
    data_type: varchar
    checks:
      - missing:
      - duplicate:
      - invalid:
          name: "transaction_id must be a UUID"
          valid_format:
            name: UUID
            regex: "^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
  - name: account_id
    data_type: varchar
    checks:
      - missing:
      - invalid:
          name: "account_id must be non-empty and sane length"
          valid_min_length: 1
          valid_max_length: 64
  - name: customer_id
    data_type: varchar
    checks:
      - missing:
      - invalid:
          name: "customer_id must be non-empty and sane length"
          valid_min_length: 1
          valid_max_length: 64
  - name: transaction_timestamp
    data_type: timestamp
    checks:
      - missing:
  - name: amount
    data_type: decimal
    checks:
      - missing:
      - invalid:
          name: "Amount must not be zero"
          invalid_values: [0]
          threshold:
            metric: percent
            must_be_less_than: 0.1
  - name: currency
    data_type: varchar
    checks:
      - missing:
      - invalid:
          name: "Currency must be ISO-4217-like (3 uppercase letters)"
          valid_format:
            name: ISO-4217 code
            regex: "^[A-Z]{3}$"
  - name: transaction_type
    data_type: varchar
    checks:
      - missing:
      - invalid:
          name: "Allowed transaction types"
          valid_values:
            - DEBIT
            - CREDIT
            - TRANSFER
            - FEE
            - REVERSAL
            - ADJUSTMENT
  - name: status
    data_type: varchar
    checks:
      - missing:
      - invalid:
          name: "Allowed statuses"
          valid_values:
            - PENDING
            - POSTED
            - REVERSED
            - FAILED
            - CANCELLED
  - name: reference_id
    data_type: varchar
    checks:
      - invalid:
          name: "reference_id length guardrail"
          valid_max_length: 128

Note the use of a threshold on the zero-amount check — allowing up to 0.1% zero-value transactions to accommodate legitimate edge cases (like fee waivers) while still catching systematic issues.

🟢 This contract protects the integrity of your entire financial ledger: preventing double-posted transactions, enforcing accounting sign conventions, maintaining ISO currency standards, and ensuring every record is traceable to an account, customer, and timestamp.

Template #2: Account Balances Data Contract

Account balance data is the daily snapshot that every downstream process depends on. Treasury uses it for liquidity management, finance uses it for reporting, risk uses it for exposure calculations, and customer-facing systems use it for statements and real-time balance inquiries. When balance data is wrong, trust erodes across the entire institution.

Consider this second failure scenario:

An overnight batch job fails silently, and balance records for a subset of accounts arrive with yesterday's closing balance copied into today's opening balance and closing balance — effectively freezing those accounts in time. Your treasury team makes liquidity decisions based on stale positions, while customer statements show incorrect balances. The error isn't discovered until a client disputes their statement two days later.

The account balances data contract prevents this scenario by enforcing temporal consistency, cross-day continuity, and financial integrity before data reaches critical systems.

Dataset-level protection:

The schema check prevents unexpected structural changes. The freshness check ensures balance data is never older than 24 hours. Cross-field integrity checks enforce the most critical balance rule: today's opening balance must equal yesterday's closing balance — the fundamental continuity equation of account management.

variables:
  FRESHNESS_HOURS:
    default: 24
  RECONCILIATION_TOLERANCE:
    default: 0.01

checks:
  - schema:
      allow_extra_columns: false
      allow_other_column_order: false
  - row_count:
      threshold:
        must_be_greater_than: 0
  - freshness:
      column: balance_date
      threshold:
        unit: hour
        must_be_less_than_or_equal: ${var.FRESHNESS_HOURS}
  - failed_rows:
      name: "balance_date must not be in the future"
      qualifier: balance_date_not_future
      expression: balance_date > CURRENT_DATE
  - failed_rows:
      name: "closing_balance must not be negative for standard accounts"
      qualifier: closing_non_negative
      expression: closing_balance < 0
  - failed_rows:
      name: "Duplicate balance per account per date"
      qualifier: dup_account_date
      query: |
        SELECT account_id, balance_date
        FROM datasource.db.schema.account_balances
        GROUP BY account_id, balance_date
        HAVING COUNT(*) > 1
      threshold:
        must_be: 0
  - failed_rows:
      name: "Opening balance must equal previous day closing balance"
      qualifier: opening_eq_prev_closing
      query: |
        SELECT curr.account_id, curr.balance_date
        FROM datasource.db.schema.account_balances curr
        JOIN datasource.db.schema.account_balances prev
          ON curr.account_id = prev.account_id
          AND curr.balance_date = DATEADD('day', 1, prev.balance_date)
        WHERE ABS(curr.opening_balance - prev.closing_balance) > ${var.RECONCILIATION_TOLERANCE}
      threshold:
        must_be: 0

The opening-equals-previous-closing check is the centerpiece of this contract. It uses a self-join to compare each day's opening balance against the prior day's closing balance, with a configurable tolerance variable (RECONCILIATION_TOLERANCE) to handle floating-point precision. This is the kind of check that, when it fails, immediately signals a data pipeline issue or a missed transaction posting.
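The same continuity rule can also be exercised upstream, for example in a producer-side unit test. Here is a minimal Python sketch of the logic (hypothetical in-memory data, not the Soda API):

```python
from datetime import date, timedelta

# Hypothetical snapshots: (account_id, balance_date) -> (opening, closing)
balances = {
    ("ACC-1", date(2026, 2, 25)): (100.00, 150.00),
    ("ACC-1", date(2026, 2, 26)): (150.00, 140.00),  # continuous with prior day
    ("ACC-2", date(2026, 2, 25)): (500.00, 480.00),
    ("ACC-2", date(2026, 2, 26)): (470.00, 470.00),  # breaks continuity by 10.00
}

def continuity_violations(balances, tolerance=0.01):
    """Return (account_id, balance_date) pairs whose opening balance
    deviates from the previous day's closing balance beyond tolerance."""
    violations = []
    for (account_id, day), (opening, _closing) in balances.items():
        prev = balances.get((account_id, day - timedelta(days=1)))
        if prev is not None and abs(opening - prev[1]) > tolerance:
            violations.append((account_id, day))
    return violations

print(continuity_violations(balances))  # [('ACC-2', datetime.date(2026, 2, 26))]
```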

Note the use of variables for both freshness and reconciliation tolerance. This allows teams to easily tune thresholds per environment without editing the core contract logic.

Column-level validation:

Required fields ensure every balance record has an account identifier, date, and both opening and closing values. Currency codes are validated against ISO-4217 format, preventing mismatched currencies from corrupting multi-currency reporting.

columns:
  - name: account_id
    data_type: string
    checks:
      - missing:
          name: No missing values
      - invalid:
          name: "account_id length guardrail"
          valid_min_length: 1
          valid_max_length: 64
  - name: balance_date
    data_type: date
    checks:
      - missing:
          name: No missing values
  - name: opening_balance
    data_type: decimal
    checks:
      - missing:
          name: No missing values
  - name: closing_balance
    data_type: decimal
    checks:
      - missing:
          name: No missing values
  - name: currency
    data_type: string
    checks:
      - missing:
          name: No missing values
      - invalid:
          name: "Currency must be ISO-4217 (3 uppercase letters)"
          valid_format:
            name: ISO-4217 code
            regex: "^[A-Z]{3}$"

🟢 This contract ensures balance continuity across days, prevents duplicate snapshots, enforces currency standardization, and catches stale or future-dated records. Business-wise, it protects treasury decisions, client statements, and regulatory balance reporting from operating on inconsistent data.

Template #3: Portfolio Holdings Data Contract

Portfolio holdings data is the foundation of asset management operations. It feeds net asset value (NAV) calculations, client reporting, risk attribution, performance measurement, and regulatory filings. When holdings data is incorrect, the consequences ripple across investment decisions, client trust, and compliance obligations.

Consider another failure scenario:

A fund administrator's data feed introduces duplicate records for the same asset within a portfolio — one from the custodian and one from the order management system. Your NAV calculation double-counts the position, inflating the fund's reported value. Investors make subscription and redemption decisions based on an incorrect NAV, and by the time the error surfaces during the monthly audit, you're facing a NAV restatement and potential investor claims.

The portfolio holdings data contract prevents the scenario above by enforcing position uniqueness, logical consistency, and controlled asset classifications before data reaches NAV engines, risk systems, and client reports.

Dataset-level protection:

The schema check prevents unexpected structural changes. The freshness check enforces a 24-hour SLA. Cross-field integrity checks enforce critical holdings logic: no negative quantities or market values, no duplicate positions per portfolio/asset/date, and a consistency rule ensuring that zero-quantity positions carry zero market value.

variables:
  FRESHNESS_HOURS:
    default: 24

checks:
  - schema:
      allow_extra_columns: false
      allow_other_column_order: false
  - row_count:
      threshold:
        must_be_greater_than: 0
  - freshness:
      column: as_of_date
      threshold:
        unit: hour
        must_be_less_than_or_equal: ${var.FRESHNESS_HOURS}
  - failed_rows:
      name: "as_of_date must not be in the future"
      qualifier: as_of_date_not_future
      expression: as_of_date > CURRENT_DATE
  - failed_rows:
      name: "market_value must be non-negative"
      qualifier: market_value_non_negative
      expression: market_value < 0
  - failed_rows:
      name: "quantity must be non-negative"
      qualifier: quantity_non_negative
      expression: quantity < 0
  - failed_rows:
      name: "No duplicate holdings per portfolio, asset, and date"
      qualifier: dup_holding
      query: |
        SELECT portfolio_id, asset_id, as_of_date
        FROM datasource.db.schema.portfolio_holdings
        GROUP BY portfolio_id, asset_id, as_of_date
        HAVING COUNT(*) > 1
      threshold:
        must_be: 0
  - failed_rows:
      name: "Holdings with zero quantity must have zero market value"
      qualifier: zero_qty_zero_mv
      expression: quantity = 0 AND market_value != 0

The zero-quantity-zero-market-value check catches phantom holdings — positions that appear to have been closed (quantity = 0) but still carry a market value. These ghost records are a common source of NAV distortion and can persist for months if not actively validated.

Column-level validation:

Required fields ensure every holding can be traced to a specific portfolio, asset, and date. The asset_type field is restricted to a controlled set of classifications covering equities, fixed income, derivatives, FX, commodities, cash, funds, and alternatives. The quantity field enforces a non-negative minimum, preventing impossible position states.

columns:
  - name: portfolio_id
    data_type: string
    checks:
      - missing:
          name: No missing values
      - invalid:
          name: "portfolio_id length guardrail"
          valid_min_length: 1
          valid_max_length: 64
  - name: asset_id
    data_type: string
    checks:
      - missing:
          name: No missing values
      - invalid:
          name: "asset_id length guardrail"
          valid_min_length: 1
          valid_max_length: 64
  - name: asset_type
    data_type: string
    checks:
      - missing:
          name: No missing values
      - invalid:
          name: "Allowed asset types"
          valid_values:
            - EQUITY
            - FIXED_INCOME
            - DERIVATIVE
            - FX
            - COMMODITY
            - CASH
            - FUND
            - ALTERNATIVE
  - name: quantity
    data_type: decimal
    checks:
      - missing:
          name: No missing values
      - invalid:
          name: "Quantity must be zero or positive"
          valid_min: 0
  - name: market_value
    data_type: decimal
    checks:
      - missing:
          name: No missing values
  - name: as_of_date
    data_type: date
    checks:
      - missing:
          name: No missing values

🟢 This contract prevents duplicate positions that inflate NAV calculations, enforces asset classification consistency across reporting systems, catches phantom holdings, and ensures position data is fresh and logically valid. Business-wise, it protects investment decisions, client reporting, and regulatory filings.

Template #4: Fraud Detection ML Scores Data Contract

Machine learning models are increasingly central to financial operations. Today, they power fraud detection, credit scoring, AML screening, and transaction monitoring. And when ML scoring data is corrupted, the consequences range from blocked legitimate transactions to undetected fraud.

Consider this fourth failure scenario:

A model retraining pipeline deploys a new version with a normalization bug that outputs fraud scores above 1.0 for certain transaction patterns. Your real-time fraud engine, expecting scores between 0 and 1, treats these as maximum-confidence fraud signals and blocks thousands of legitimate transactions over a weekend. By Monday morning, customer complaints are flooding in and, because model_version wasn't tracked consistently, the engineering team can't quickly identify which model version caused the problem.

The fraud detection ML scores data contract prevents this scenario by enforcing score ranges, model traceability, and scoring freshness before predictions reach fraud-blocking systems.

Dataset-level protection:

The schema check prevents unexpected structural changes. The freshness check enforces a 1-hour SLA (much tighter than other financial datasets because fraud detection operates in near real-time). Cross-field integrity checks enforce the most critical ML data rules: scores must stay within the 0–1 range, and prediction labels must reasonably align with the underlying scores.

variables:
  FRESHNESS_HOURS:
    default: 1

checks:
  - schema:
      allow_extra_columns: false
      allow_other_column_order: false
  - row_count:
      threshold:
        must_be_greater_than: 0
  - freshness:
      column: scored_at
      threshold:
        unit: hour
        must_be_less_than_or_equal: ${var.FRESHNESS_HOURS}
  - failed_rows:
      name: "scored_at must not be in the future"
      qualifier: scored_at_not_future
      expression: scored_at > ${soda.NOW}
  - failed_rows:
      name: "fraud_score must be between 0 and 1"
      qualifier: score_range
      expression: fraud_score < 0.0 OR fraud_score > 1.0
  - failed_rows:
      name: "No duplicate scores per transaction per model"
      qualifier: dup_txn_model
      query: |
        SELECT transaction_id, model_id
        FROM datasource.db.schema.fraud_scores
        GROUP BY transaction_id, model_id
        HAVING COUNT(*) > 1
      threshold:
        must_be: 0
  - failed_rows:
      name: "prediction_label must align with fraud_score threshold"
      qualifier: label_score_alignment
      expression: >
        (prediction_label = 'FRAUD' AND fraud_score < 0.5)
        OR (prediction_label = 'LEGIT' AND fraud_score > 0.9)

The label-score alignment check is intentionally designed to be soft rather than strict. It flags clear contradictions — a FRAUD label on a low-confidence score, or a LEGIT label on a very high score — while allowing the SUSPICIOUS middle ground where model thresholds may differ between deployments. Adjust the thresholds to match your own model's decision boundaries.

The duplicate scores per transaction per model check prevents double-scoring, which can happen when retry logic in streaming pipelines reprocesses the same transaction.
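The same deduplication rule can be applied defensively in the scoring pipeline before records are published. A minimal sketch (hypothetical records, not the Soda API) that keeps only the latest score per (transaction_id, model_id):

```python
def dedupe_scores(records):
    """Keep only the latest score per (transaction_id, model_id),
    since retries in streaming pipelines can emit the same key twice."""
    latest = {}
    for rec in records:
        key = (rec["transaction_id"], rec["model_id"])
        if key not in latest or rec["scored_at"] > latest[key]["scored_at"]:
            latest[key] = rec
    return list(latest.values())

# Hypothetical duplicated records from a retried micro-batch
records = [
    {"transaction_id": "t1", "model_id": "m1", "scored_at": 100, "fraud_score": 0.20},
    {"transaction_id": "t1", "model_id": "m1", "scored_at": 105, "fraud_score": 0.21},  # retry
    {"transaction_id": "t2", "model_id": "m1", "scored_at": 101, "fraud_score": 0.95},
]

deduped = dedupe_scores(records)
print(len(deduped))  # 2
```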

Column-level validation:

Required fields ensure every score is traceable to a specific transaction, model, and version. The transaction_id UUID format check maintains consistency with the transaction ledger. The model_version semantic versioning format enables lineage tracking across model deployments. The feature_hash SHA-256 validation ensures the feature vector was computed and logged — critical for model auditability and reproducibility.

columns:
  - name: transaction_id
    data_type: string
    checks:
      - missing:
          name: No missing values
      - invalid:
          name: "transaction_id must be a UUID"
          valid_format:
            name: UUID
            regex: "^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
  - name: model_id
    data_type: string
    checks:
      - missing:
          name: No missing values
      - invalid:
          name: "model_id length guardrail"
          valid_min_length: 1
          valid_max_length: 64
  - name: model_version
    data_type: string
    checks:
      - missing:
          name: No missing values
      - invalid:
          name: "Semantic version format (e.g. 1.2.3)"
          valid_format:
            name: semver
            regex: "^\\d+\\.\\d+\\.\\d+$"
  - name: fraud_score
    data_type: decimal
    checks:
      - missing:
          name: No missing values
      - invalid:
          name: "Score must be between 0 and 1"
          valid_min: 0.0
          valid_max: 1.0
  - name: prediction_label
    data_type: string
    checks:
      - missing:
          name: No missing values
      - invalid:
          name: "Allowed prediction labels"
          valid_values:
            - LEGIT
            - SUSPICIOUS
            - FRAUD
  - name: scored_at
    data_type: timestamp
    checks:
      - missing:
          name: No missing values
  - name: feature_hash
    data_type: string
    checks:
      - missing:
          name: No missing values
      - invalid:
          name: "feature_hash must be a valid SHA-256 hex string"
          valid_format:
            name: SHA-256 hash
            regex: "^[a-fA-F0-9]{64}$"

🟢 This contract prevents broken model outputs from blocking legitimate transactions, enforces model traceability for audit and rollback, catches double-scored transactions, and ensures scoring freshness meets real-time fraud detection SLAs. This helps bridge the gap between data engineering and data science governance.

Template #5: BCBS 239 Regulatory Exposures Data Contract

Regulatory exposure data is the foundation of risk reporting. It feeds capital adequacy calculations, stress testing scenarios, and supervisory submissions. A single inconsistency between your risk and accounting data can trigger regulatory scrutiny, or worse, a capital surcharge.

BCBS 239, for instance, makes data quality a supervisory requirement: banks must demonstrate that risk data is accurate, complete, timely, and reconcilable across systems.

Consider this last failure scenario:

A data migration renames counterparty_id in the exposures table, but the counterparty master still uses the original column name. Your join silently returns zero matches, and your BCBS 239 submission reports zero counterparty exposures for an entire business line. The supervisor flags the gap during their next review, triggering a formal remediation program.

The BCBS 239 data contract template prevents this scenario by enforcing referential integrity, format validation, and cross-system reconciliation before data reaches regulatory reporting systems.

Dataset-level protection:

The schema check catches structural changes immediately. The completeness check uses a SQL query to identify any counterparties present in the master but missing from exposures — a direct operationalization of BCBS 239 Principle 4 (Completeness).

checks:
  - schema: {}
  - failed_rows:
      name: counterparties_missing_in_exposures
      query: |
        SELECT m.counterparty_id
        FROM datasource.db.schema.counterparty_master m
        LEFT JOIN datasource.db.schema.exposures e
          ON e.counterparty_id = m.counterparty_id
        WHERE e.counterparty_id IS NULL
      threshold:
        must_be: 0
      attributes:
        bcbs239:
          - P4
        description: "Completeness: any counterparties missing from exposures?"

Column-level validation:

The LEI code format check enforces the standard 20-character alphanumeric pattern required for counterparty identification. The referential integrity check verifies that every counterparty_id in exposures exists in the official counterparty master, preventing broken joins that silently corrupt risk aggregation.

columns:
  - name: lei_code
    data_type: string
    checks:
      - invalid:
          name: lei_code_format
          valid_format:
            name: LEI must be 20 alphanumeric
            regex: '^[A-Z0-9]{20}$'
          attributes:
            bcbs239:
              - P3
            description: LEI format is 20 uppercase alphanumerics
  - name: counterparty_id
    checks:
      - invalid:
          name: counterparty_id_in_master
          valid_reference_data:
            dataset: datasource/db/schema/counterparty_master
            column: counterparty_id
          attributes:
            bcbs239:
              - P3
            description: "Integrity: all counterparties in exposures exist in the master"

Cross-system reconciliation:

This is where BCBS 239 contracts become especially powerful. The reconciliation block compares total exposure amounts against accounting balances, flagging discrepancies that exceed the defined tolerance. This directly operationalizes Principles 3 (Accuracy) and 7 (Reporting Accuracy).

reconciliation:
  source:
    dataset: datasource/db/schema/accounting_balances
  checks:
    - metric_diff:
        name: total_exposures_vs_total_balance
        source_expression: SUM(balance_amount)
        target_expression: SUM(exposure_amount)
        threshold:
          must_be_less_than: 1000
        attributes:
          bcbs239:
            - P3
            - P7
          description: "Reconciliation vs accounting: Overall tolerance across the books"

Note the use of attributes to tag each check with the specific BCBS 239 principles it addresses. This creates an audit trail that maps directly to supervisory expectations.
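Because these tags are machine-readable, the principle-to-check mapping can be assembled programmatically. The following Python sketch is purely illustrative: it assumes the check metadata has already been parsed out of the contract YAML into plain dicts, and it groups check names by the BCBS 239 principles they evidence.

```python
from collections import defaultdict

# Hypothetical check metadata, as it might look after parsing the
# contract YAML above (names and tags taken from this template)
checks = [
    {"name": "lei_code_format", "bcbs239": ["P3"]},
    {"name": "counterparty_id_in_master", "bcbs239": ["P3"]},
    {"name": "total_exposures_vs_total_balance", "bcbs239": ["P3", "P7"]},
]

def coverage_by_principle(checks):
    """Map each BCBS 239 principle to the checks that evidence it."""
    coverage = defaultdict(list)
    for check in checks:
        for principle in check.get("bcbs239", []):
            coverage[principle].append(check["name"])
    return dict(coverage)

matrix = coverage_by_principle(checks)
print(matrix["P3"])  # all checks evidencing Principle 3 (Accuracy)
```

A matrix like this can be exported as audit evidence showing which automated controls cover each principle.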

🟢 This contract ensures exposure reporting is complete, accurate, and reconciled with accounting data — the three pillars that regulators scrutinize most during BCBS 239 assessments.

Implementation with Soda

Having the right data contract templates is only the beginning. The real value comes from embedding these contracts into your data pipelines so they actively prevent issues rather than documenting expectations that nobody enforces.

The finance templates we've covered provide starting points, but you'll need to adjust thresholds, add institution-specific validations, and align status values and classifications with your actual systems.

From Template to Production

Step 1: Treat Contracts as Code

→ Download templates from Soda's template gallery and customize them for your specific business rules.

→ Store in version control using Git. Your contracts should live in a repository where changes are tracked, reviewed, and deployed through the same rigorous process as your production code. In financial institutions, this creates an audit trail showing exactly when quality rules changed and why (evidence that regulators expect to see).

→ Review through pull requests before any contract changes reach production. When a team wants to relax a validation threshold or add a new transaction type, that change goes through code review, where stakeholders can assess the business and regulatory impact. This prevents silent degradation of data quality standards over time.

Step 2: Verify Locally

Before deploying contracts to production, validate them against your actual data:

# pip install soda-{data source} for other data sources
pip install soda-snowflake

# verify the contract locally against a data source
soda contract verify -c contract.yml -ds

→ Test against real data sources in development or staging environments. Don't just validate the YAML syntax — run actual queries against representative data to see which checks pass and which fail.

→ Iterate on thresholds and rules based on what you learn. If your balance freshness check fails because weekend batch jobs genuinely arrive after 24 hours, adjust the threshold to 48 hours rather than creating false positives that train teams to ignore alerts.

Step 3: Embed in Your Pipeline

Data contracts only protect you if they're enforced at the right checkpoints. Strategic placement determines which issues you catch early versus which slip through to cause downstream damage.

Where to enforce contracts in finance:

  • After ingestion from core banking systems — Validate transaction and balance data immediately after it arrives, catching issues while they're still close to the source

  • Before loading into the data warehouse — Create a quality gate that prevents corrupt data from entering your analytical environment

  • At API boundaries for payment and trading systems — Validate transaction data at the integration layer before it reaches internal ledgers

  • In ETL/ELT processes — Integrate with tools like dbt to validate data transformations at each stage

  • After ML model scoring — Validate fraud scores and predictions before they reach real-time decisioning engines

Enforcement mechanism: CI/CD pipelines check that schema changes are compatible with contracts before deployment. Data quality tools validate constraints against actual data on schedule or in real-time. When a contract fails, the data stops flowing until someone investigates and resolves the issue.
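As a sketch of that CI/CD gate, the verification command from Step 2 can run as a pipeline step. The workflow below is a hypothetical GitHub Actions job, not official Soda tooling: the job layout is an assumption, while the pip and soda commands are the same ones used in this guide.

```yaml
# Hypothetical CI job: a non-zero exit code from verification fails the
# job, blocking the deployment until the contract passes
jobs:
  verify-contracts:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install soda-snowflake
      - run: soda contract verify -c contract.yml -ds
```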

Step 4: Publish and Monitor

Once contracts are validated and embedded, publish them to Soda Cloud for continuous monitoring:

# publish and schedule the contract with Soda Cloud
soda contract publish -c contract.yml -sc

→ Automated validation on schedule runs your contracts hourly, daily, or in real-time depending on your data freshness requirements. For fraud scoring, this might be every few minutes. For balance snapshots, daily after the batch job completes.

→ Smart alerting to the right owners routes failures to the teams responsible for fixing them. When the account balances contract fails because opening balances don't match the previous day's close, the alert goes to the core banking integration team, not the data platform team. Clear ownership prevents alerts from being ignored.

→ Data observability dashboard provides visibility into contract health across all datasets. Teams can see which contracts are passing, which are failing, and trending metrics on data quality over time.


AI-Powered Contract Generation

Don't want to write YAML from scratch? Soda's AI-powered contract autopilot can generate contracts automatically by analyzing your data structure and patterns.

One-click contract creation examines your tables and suggests appropriate checks based on data types, cardinality, and distribution patterns.

Start Small, Scale Systematically

Don't try to apply contracts to all your datasets at once. Start with the ones that have the highest business impact for your use case. Get those contracts working smoothly, build confidence with stakeholders, then expand to the rest of your data estate.

This progressive approach lets you refine your process (verification workflows, alerting configurations, ownership assignments) on high-value datasets before scaling across your entire data ecosystem.

Ownership and Accountability

Each contract needs a clearly named owner — a person or role responsible for responding to breaches and approving changes. Without clear ownership, failed contracts generate alerts that no one acts on, and quality standards quietly erode.

For finance contracts, ownership typically aligns with business domains:

  • BCBS 239 Exposures contract → Risk & Regulatory Reporting team

  • Transaction Ledger contract → Finance Operations / Treasury

  • Account Balances contract → Core Banking / Back Office Operations

  • Portfolio Holdings contract → Investment / Asset Management team

  • Fraud Scores contract → Data Science / Fraud Analytics team

This domain-aligned ownership ensures the people who best understand the business rules and regulatory requirements are responsible for maintaining contract quality over time.

Evolving Contracts Over Time

Data contracts aren't static documents — they evolve as regulations change, new products launch, and risk models are updated.

As requirements change, producers propose contract updates, discuss them with consumers, and roll out new versions in a controlled way. This feedback loop prevents the chaos of uncoordinated changes while allowing necessary evolution.

When a new regulation introduces additional reporting fields, the contract update flows through the same PR review process as any code change — with full visibility into who approved what and when.

Watch the Soda collaboration workflow demonstration below: business users define quality expectations in the UI while data engineers validate and publish contracts through code and the CLI, bridging the gap between business knowledge and technical implementation.

Conclusion: Beyond Validation to True Ownership

The templates in this guide represent best practices for financial data quality today. Use them to build stability and prevent expensive data failures. But as you implement them, remember that the ultimate goal is empowering business domain experts to own their data quality.

The people closest to regulatory decisions understand what data quality actually means in context. They know when a €1,000 reconciliation tolerance is appropriate versus when it needs to be €0.01. They understand why a 1-hour freshness SLA matters for fraud scoring but 24 hours is fine for balance snapshots. And they can assess the tradeoff between strict validation and operational flexibility.

The real gain with data contracts isn't just validation — it's ownership and collaboration.

If we assume data engineers build data products and the business simply consumes them, we've created distance between data and decisions. The contracts we've discussed today bridge that gap through collaboration: risk and compliance stakeholders define quality expectations, engineers implement validation logic, and both parties review changes together.

Looking forward, the goal isn't stricter interfaces that require engineering intervention for every change. It's enabling those closest to regulatory and business decisions to shape, evolve, and validate data products themselves, with automation doing the heavy lifting.

Ready to implement data contracts in your financial data pipelines?

Download the templates: Access all finance data contract templates and customize them for your institution. We recommend starting with the transaction ledger and account balances for immediate impact.

See it in action: Book a demo to see how data contracts work in your specific environment with your actual data sources and business requirements.

Explore more use cases: Visit the complete template gallery for data contracts across industries — from retail to healthcare to manufacturing — and discover patterns you can apply to your financial operations.


What Are Data Contracts in Finance?

Data contracts are machine-readable agreements that define how data should be structured, validated, and governed as it moves between producers and consumers. As data flows through your pipelines, the contract automatically determines whether the data is fit to use according to agreed specifications.

In finance, this means creating explicit agreements between systems that generate data — like core banking platforms, trading systems, payment processors, and ML scoring engines — and the teams that rely on them for regulatory reporting, risk analytics, reconciliation, and fraud detection.

Instead of detecting problems during month-end close or regulatory submission, teams enforce expectations directly inside data pipelines. Together, domain teams and engineers set expectations about schema, freshness, quality rules, and more, and the contract makes those expectations testable.

Not all data contract implementations are equal, though. Some approaches focus on documentation and metadata standards only. Others — like Soda collaborative contracts — enforce validation automatically during ingestion, transformation, or CI/CD workflows.

This means you can add automated checks for accuracy, completeness, and timeliness; prevent poor-quality data from reaching downstream systems; and set up proactive alerts to address issues before they escalate.
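To make that concrete, here is a minimal, illustrative contract skeleton. The dataset path and column name are placeholders; the check syntax follows the finance templates later in this guide.

```yaml
# Illustrative skeleton: dataset path and column names are placeholders
dataset: datasource/db/schema/transactions

checks:
  - schema:                  # fail on unexpected structural changes
      allow_extra_columns: false
  - freshness:               # data must be recent enough to trust
      column: transaction_timestamp
      threshold:
        unit: hour
        must_be_less_than_or_equal: 24

columns:
  - name: transaction_id
    data_type: varchar
    checks:
      - missing:             # every record needs an identifier
      - duplicate:           # and it must be unique
```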

When a contract fails, it can:

  • Block pipeline execution (CI/CD gate)

  • Prevent downstream table updates

  • Trigger alerting workflows

  • Mark datasets as invalid in observability tools

In short, for finance, where data errors can trigger regulatory penalties, misstated risk positions, or failed client obligations, enforceable contracts transform reactive firefighting into predictable operations.

If you would like to learn more about the foundational structure of a data contract, check out our guide: The Definitive Guide to Data Contracts

Why Financial Organizations Need Data Contracts

Financial institutions face a unique combination of constraints:

  • Regulatory pressure (BCBS 239, IFRS, SOX, internal audit)

  • Strict reconciliation requirements

  • High financial and reputational risk

  • Complex multi-system architectures

  • Increasing reliance on machine learning outputs

So we'd say there are four compelling reasons why data contracts are not just useful but essential for financial institutions:

First, they prevent silent data failures.

Data contracts act as checkpoints, stopping data if it violates agreed-upon schemas, types, or constraints.

For example, if a developer renames exposure_amount to exposure_value without coordinating with downstream teams, regulatory reporting logic may silently drop an entire column of risk data.

Data contracts catch these breaking changes before they reach production, eliminating the kind of emergency that leads to supervisory scrutiny.

Second, they define clear ownership boundaries.

Every contract must have an assigned owner. So, when data quality issues arise, contracts point exactly to who fixes what.

Large financial institutions increasingly adopt data mesh or domain-oriented architectures. Often, front office, middle office, back office, risk, compliance, and data engineering all touch the same data at different stages.

Data contracts define the interoperability standards between domains, so that decentralization doesn’t become fragmentation.

Third, they enable safe change management.

Financial data management requires constant evolution to integrate new regulatory requirements, updated risk models, expanded product lines and evolving payment methods.

Data contracts enforce versioning, backward compatibility expectations, and approval workflows, ensuring changes are intentional and coordinated rather than accidental and chaotic.

Also, because contracts are version-controlled and reviewable, they create an auditable history of when validation rules changed and who approved them.

Fourth, they support compliance and governance requirements.

With regulations like BCBS 239, SOX, MiFID II, and PCI DSS, financial institutions must demonstrate that data is accurate, complete, timely, and reconcilable across systems.

Data contracts operationalize key governance requirements — such as completeness, timeliness, and reconciliation tolerances — directly inside data pipelines, turning compliance from a periodic audit into continuous, automated monitoring.

🟢 The advantage of data contracts for both producers and consumers is clear: less downtime, fewer surprises, and less time spent fixing other people's changes.

By establishing automated validation at every checkpoint, contracts transform data quality challenges into managed, predictable processes. And by catching issues at the data level, you avoid expensive regulatory penalties, restatements, and erosion of stakeholder trust.

Now, let's see how these contracts are structured and how you can apply them to your financial data.

Data Contract Templates for the Financial Sector

We've launched a data contract template gallery with ready-to-use contracts across industries and use cases. Watch the short video below where Santiago shows how to access the gallery and how to read Soda data contracts.

Let's explore the structure of data contracts in practice by going over our 5 ready-to-use finance templates: BCBS 239 Regulatory Exposures, Transaction Ledger, Account Balances, Portfolio Holdings, and Fraud Detection ML Scores.

Template #1: Transaction Ledger Data Contract

The transaction ledger is the operational backbone of any financial institution. Every debit, credit, transfer, fee, and reversal flows through it. It feeds balance calculations, regulatory reporting, reconciliation processes, and customer-facing statements. When ledger data is corrupted, the entire financial operation loses its source of truth.

Consider this failure scenario:

A payment processing integration starts submitting transactions with a new status value ("ON_HOLD") that your downstream systems don't recognize. Your balance calculation engine silently skips these records, and account balances start drifting from reality. By the time treasury discovers the discrepancy during end-of-day reconciliation, thousands of transactions are in limbo and client balances are misstated.

The transaction ledger data contract template prevents this scenario by enforcing schema stability, financial consistency rules, and controlled value sets before data reaches critical systems.

Dataset-level protection:

The schema check with strict settings (allow_extra_columns: false) catches any structural changes. The freshness check enforces a 24-hour SLA. Cross-field integrity checks prevent the most expensive ledger errors: future-dated transactions and sign/type mismatches.

variables:
  FRESHNESS_HOURS:
    default: 24

checks:
  - schema:
      allow_extra_columns: false
      allow_other_column_order: false
  - row_count:
      threshold:
        must_be_greater_than: 0
  - freshness:
      column: transaction_timestamp
      threshold:
        unit: hour
        must_be_less_than_or_equal: ${var.FRESHNESS_HOURS}
  - failed_rows:
      name: "Transaction timestamp must not be in the future"
      qualifier: ts_not_future
      expression: transaction_timestamp > ${soda.NOW}
  - failed_rows:
      name: "Amount sign must match transaction_type (DEBIT negative, CREDIT positive)"
      qualifier: amount_sign_by_type
      expression: >
        (transaction_type = 'DEBIT' AND amount >= 0)
        OR (transaction_type = 'CREDIT' AND amount <= 0)

The amount-sign-by-type check enforces a signed-amount convention where debits carry negative values and credits carry positive values. This is one common ledger design — others store amounts as absolute values and encode direction separately via the transaction_type column. Adjust this check to match your ledger's sign convention. Regardless of which convention you use, the key principle is the same: the relationship between amount and direction must be consistent and testable.
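For the absolute-amount convention, a sketch of the equivalent guard (an assumption for illustration, not part of the template) would flag any non-positive amount, since direction lives entirely in transaction_type:

```yaml
# Variant guard for ledgers that store absolute amounts: every amount
# must be strictly positive because direction is encoded in transaction_type
- failed_rows:
    name: "Amounts must be positive when direction is encoded in transaction_type"
    qualifier: amount_positive_absolute_convention
    expression: amount <= 0
```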

Column-level validation:

Required fields ensure every transaction has a unique UUID, valid account and customer references, and a timestamp. The duplicate check on transaction_id prevents double-posting. Currency codes are validated against ISO-4217 format. Transaction types and statuses are restricted to controlled value sets, preventing integration failures with downstream systems.

columns:
  - name: transaction_id
    data_type: varchar
    checks:
      - missing:
      - duplicate:
      - invalid:
          name: "transaction_id must be a UUID"
          valid_format:
            name: UUID
            regex: "^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
  - name: account_id
    data_type: varchar
    checks:
      - missing:
      - invalid:
          name: "account_id must be non-empty and sane length"
          valid_min_length: 1
          valid_max_length: 64
  - name: customer_id
    data_type: varchar
    checks:
      - missing:
      - invalid:
          name: "customer_id must be non-empty and sane length"
          valid_min_length: 1
          valid_max_length: 64
  - name: transaction_timestamp
    data_type: timestamp
    checks:
      - missing:
  - name: amount
    data_type: decimal
    checks:
      - missing:
      - invalid:
          name: "Amount must not be zero"
          invalid_values: [0]
          threshold:
            metric: percent
            must_be_less_than: 0.1
  - name: currency
    data_type: varchar
    checks:
      - missing:
      - invalid:
          name: "Currency must be ISO-4217-like (3 uppercase letters)"
          valid_format:
            name: ISO-4217 code
            regex: "^[A-Z]{3}$"
  - name: transaction_type
    data_type: varchar
    checks:
      - missing:
      - invalid:
          name: "Allowed transaction types"
          valid_values:
            - DEBIT
            - CREDIT
            - TRANSFER
            - FEE
            - REVERSAL
            - ADJUSTMENT
  - name: status
    data_type: varchar
    checks:
      - missing:
      - invalid:
          name: "Allowed statuses"
          valid_values:
            - PENDING
            - POSTED
            - REVERSED
            - FAILED
            - CANCELLED
  - name: reference_id
    data_type: varchar
    checks:
      - invalid:
          name: "reference_id length guardrail"
          valid_max_length: 128

Note the use of a threshold on the zero-amount check — allowing up to 0.1% zero-value transactions to accommodate legitimate edge cases (like fee waivers) while still catching systematic issues.

🟢 This contract protects the integrity of your entire financial ledger: preventing double-posted transactions, enforcing accounting sign conventions, maintaining ISO currency standards, and ensuring every record is traceable to an account, customer, and timestamp.

Template #2: Account Balances Data Contract

Account balance data is the daily snapshot that every downstream process depends on. Treasury uses it for liquidity management, finance uses it for reporting, risk uses it for exposure calculations, and customer-facing systems use it for statements and real-time balance inquiries. When balance data is wrong, trust erodes across the entire institution.

Consider this second failure scenario:

An overnight batch job fails silently, and balance records for a subset of accounts arrive with yesterday's closing balance copied into today's opening balance and closing balance — effectively freezing those accounts in time. Your treasury team makes liquidity decisions based on stale positions, while customer statements show incorrect balances. The error isn't discovered until a client disputes their statement two days later.

The account balances data contract prevents this scenario by enforcing temporal consistency, cross-day continuity, and financial integrity before data reaches critical systems.

Dataset-level protection:

The schema check prevents unexpected structural changes. The freshness check ensures balance data is never older than 24 hours. Cross-field integrity checks enforce the most critical balance rule: today's opening balance must equal yesterday's closing balance — the fundamental continuity equation of account management.

variables:
  FRESHNESS_HOURS:
    default: 24
  RECONCILIATION_TOLERANCE:
    default: 0.01

checks:
  - schema:
      allow_extra_columns: false
      allow_other_column_order: false
  - row_count:
      threshold:
        must_be_greater_than: 0
  - freshness:
      column: balance_date
      threshold:
        unit: hour
        must_be_less_than_or_equal: ${var.FRESHNESS_HOURS}
  - failed_rows:
      name: "balance_date must not be in the future"
      qualifier: balance_date_not_future
      expression: balance_date > CURRENT_DATE
  - failed_rows:
      name: "closing_balance must not be negative for standard accounts"
      qualifier: closing_non_negative
      expression: closing_balance < 0
  - failed_rows:
      name: "Duplicate balance per account per date"
      qualifier: dup_account_date
      query: |
        SELECT account_id, balance_date
        FROM datasource.db.schema.account_balances
        GROUP BY account_id, balance_date
        HAVING COUNT(*) > 1
      threshold:
        must_be: 0
  - failed_rows:
      name: "Opening balance must equal previous day closing balance"
      qualifier: opening_eq_prev_closing
      query: |
        SELECT curr.account_id, curr.balance_date
        FROM datasource.db.schema.account_balances curr
        JOIN datasource.db.schema.account_balances prev
          ON curr.account_id = prev.account_id
          AND curr.balance_date = DATEADD('day', 1, prev.balance_date)
        WHERE ABS(curr.opening_balance - prev.closing_balance) > ${var.RECONCILIATION_TOLERANCE}
      threshold:
        must_be: 0

The opening-equals-previous-closing check is the centerpiece of this contract. It uses a self-join to compare each day's opening balance against the prior day's closing balance, with a configurable tolerance variable (RECONCILIATION_TOLERANCE) to handle floating-point precision. This is the kind of check that, when it fails, immediately signals a data pipeline issue or a missed transaction posting.

Note the use of variables for both freshness and reconciliation tolerance. This allows teams to easily tune thresholds per environment without editing the core contract logic.
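To see the rule the self-join enforces, here is an illustrative Python sketch with made-up snapshot rows; the tolerance mirrors the contract's RECONCILIATION_TOLERANCE default.

```python
from datetime import date, timedelta

TOLERANCE = 0.01  # mirrors RECONCILIATION_TOLERANCE in the contract

# Sample snapshots: (account_id, balance_date, opening, closing)
snapshots = [
    ("ACC-1", date(2026, 2, 1), 100.00, 150.00),
    ("ACC-1", date(2026, 2, 2), 150.00, 120.00),   # continuous with day 1
    ("ACC-1", date(2026, 2, 3), 999.00, 1010.00),  # break: opening != prior close
]

def continuity_breaks(snapshots, tolerance=TOLERANCE):
    """Return (account_id, balance_date) pairs where the opening balance
    drifts from the previous day's closing balance beyond tolerance."""
    by_key = {(acc, d): (op, cl) for acc, d, op, cl in snapshots}
    breaks = []
    for acc, d, opening, _ in snapshots:
        prev = by_key.get((acc, d - timedelta(days=1)))
        if prev is not None and abs(opening - prev[1]) > tolerance:
            breaks.append((acc, d))
    return breaks

print(continuity_breaks(snapshots))  # → [('ACC-1', datetime.date(2026, 2, 3))]
```

When the function returns a non-empty list, the affected account-days point directly at a missed posting or a failed batch load.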

Column-level validation:

Required fields ensure every balance record has an account identifier, date, and both opening and closing values. Currency codes are validated against ISO-4217 format, preventing mismatched currencies from corrupting multi-currency reporting.

columns:
  - name: account_id
    data_type: string
    checks:
      - missing:
          name: No missing values
      - invalid:
          name: "account_id length guardrail"
          valid_min_length: 1
          valid_max_length: 64
  - name: balance_date
    data_type: date
    checks:
      - missing:
          name: No missing values
  - name: opening_balance
    data_type: decimal
    checks:
      - missing:
          name: No missing values
  - name: closing_balance
    data_type: decimal
    checks:
      - missing:
          name: No missing values
  - name: currency
    data_type: string
    checks:
      - missing:
          name: No missing values
      - invalid:
          name: "Currency must be ISO-4217 (3 uppercase letters)"
          valid_format:
            name: ISO-4217 code
            regex: "^[A-Z]{3}$"

🟢 This contract ensures balance continuity across days, prevents duplicate snapshots, enforces currency standardization, and catches stale or future-dated records. Business-wise, it protects treasury decisions, client statements, and regulatory balance reporting from operating on inconsistent data.

Template #3: Portfolio Holdings Data Contract

Portfolio holdings data is the foundation of asset management operations. It feeds net asset value (NAV) calculations, client reporting, risk attribution, performance measurement, and regulatory filings. When holdings data is incorrect, the consequences ripple across investment decisions, client trust, and compliance obligations.

Consider another failure scenario:

A fund administrator's data feed introduces duplicate records for the same asset within a portfolio — one from the custodian and one from the order management system. Your NAV calculation double-counts the position, inflating the fund's reported value. Investors make subscription and redemption decisions based on an incorrect NAV, and by the time the error surfaces during the monthly audit, you're facing a NAV restatement and potential investor claims.

The portfolio holdings data contract prevents the scenario above by enforcing position uniqueness, logical consistency, and controlled asset classifications before data reaches NAV engines, risk systems, and client reports.

Dataset-level protection:

The schema check prevents unexpected structural changes. The freshness check enforces a 24-hour SLA. Cross-field integrity checks enforce critical holdings logic: no negative quantities or market values, no duplicate positions per portfolio/asset/date, and a consistency rule ensuring that zero-quantity positions carry zero market value.

variables:
  FRESHNESS_HOURS:
    default: 24

checks:
  - schema:
      allow_extra_columns: false
      allow_other_column_order: false
  - row_count:
      threshold:
        must_be_greater_than: 0
  - freshness:
      column: as_of_date
      threshold:
        unit: hour
        must_be_less_than_or_equal: ${var.FRESHNESS_HOURS}
  - failed_rows:
      name: "as_of_date must not be in the future"
      qualifier: as_of_date_not_future
      expression: as_of_date > CURRENT_DATE
  - failed_rows:
      name: "market_value must be non-negative"
      qualifier: market_value_non_negative
      expression: market_value < 0
  - failed_rows:
      name: "quantity must be non-negative"
      qualifier: quantity_non_negative
      expression: quantity < 0
  - failed_rows:
      name: "No duplicate holdings per portfolio, asset, and date"
      qualifier: dup_holding
      query: |
        SELECT portfolio_id, asset_id, as_of_date
        FROM datasource.db.schema.portfolio_holdings
        GROUP BY portfolio_id, asset_id, as_of_date
        HAVING COUNT(*) > 1
      threshold:
        must_be: 0
  - failed_rows:
      name: "Holdings with zero quantity must have zero market value"
      qualifier: zero_qty_zero_mv
      expression: quantity = 0 AND market_value <> 0

The zero-quantity-zero-market-value check catches phantom holdings — positions that appear to have been closed (quantity = 0) but still carry a market value. These ghost records are a common source of NAV distortion and can persist for months if not actively validated.
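The logic of the check is simple enough to state in a few lines. This illustrative Python sketch (the holdings rows are made up) applies the same phantom-holding rule to in-memory records:

```python
# Illustrative sketch of the rule the check enforces, on made-up rows
holdings = [
    {"portfolio_id": "P1", "asset_id": "A1", "quantity": 100.0, "market_value": 2500.0},
    {"portfolio_id": "P1", "asset_id": "A2", "quantity": 0.0,   "market_value": 310.0},  # phantom
    {"portfolio_id": "P1", "asset_id": "A3", "quantity": 0.0,   "market_value": 0.0},    # cleanly closed
]

def phantom_holdings(rows):
    """Closed positions (quantity == 0) that still carry market value."""
    return [r for r in rows if r["quantity"] == 0 and r["market_value"] != 0]

print([r["asset_id"] for r in phantom_holdings(holdings)])  # → ['A2']
```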

Column-level validation:

Required fields ensure every holding can be traced to a specific portfolio, asset, and date. The asset_type field is restricted to a controlled set of classifications covering equities, fixed income, derivatives, FX, commodities, cash, funds, and alternatives. The quantity field enforces a non-negative minimum, preventing impossible position states.

columns:
  - name: portfolio_id
    data_type: string
    checks:
      - missing:
          name: No missing values
      - invalid:
          name: "portfolio_id length guardrail"
          valid_min_length: 1
          valid_max_length: 64
  - name: asset_id
    data_type: string
    checks:
      - missing:
          name: No missing values
      - invalid:
          name: "asset_id length guardrail"
          valid_min_length: 1
          valid_max_length: 64
  - name: asset_type
    data_type: string
    checks:
      - missing:
          name: No missing values
      - invalid:
          name: "Allowed asset types"
          valid_values:
            - EQUITY
            - FIXED_INCOME
            - DERIVATIVE
            - FX
            - COMMODITY
            - CASH
            - FUND
            - ALTERNATIVE
  - name: quantity
    data_type: decimal
    checks:
      - missing:
          name: No missing values
      - invalid:
          name: "Quantity must be zero or positive"
          valid_min: 0
  - name: market_value
    data_type: decimal
    checks:
      - missing:
          name: No missing values
  - name: as_of_date
    data_type: date
    checks:
      - missing:
          name: No missing values

🟢 This contract prevents duplicate positions that inflate NAV calculations, enforces asset classification consistency across reporting systems, catches phantom holdings, and ensures position data is fresh and logically valid. Business-wise, it protects investment decisions, client reporting, and regulatory filings.

Template #4: Fraud Detection ML Scores Data Contract

Machine learning models are increasingly central to financial operations. Today, they power fraud detection, credit scoring, AML screening, and transaction monitoring. And when ML scoring data is corrupted, the consequences range from blocked legitimate transactions to undetected fraud.

Consider this fourth failure scenario:

A model retraining pipeline deploys a new version with a normalization bug that outputs fraud scores above 1.0 for certain transaction patterns. Your real-time fraud engine, expecting scores between 0 and 1, treats these as maximum-confidence fraud signals and blocks thousands of legitimate transactions over a weekend. By Monday morning, customer complaints are flooding in and, because model_version wasn't tracked consistently, the engineering team can't quickly identify which model version caused the problem.

The fraud detection ML scores data contract prevents this scenario by enforcing score ranges, model traceability, and scoring freshness before predictions reach fraud-blocking systems.

Dataset-level protection:

The schema check prevents unexpected structural changes. The freshness check enforces a 1-hour SLA (much tighter than other financial datasets because fraud detection operates in near real-time). Cross-field integrity checks enforce the most critical ML data rules: scores must stay within the 0–1 range, and prediction labels must reasonably align with the underlying scores.

variables:
  FRESHNESS_HOURS:
    default: 1

checks:
  - schema:
      allow_extra_columns: false
      allow_other_column_order: false
  - row_count:
      threshold:
        must_be_greater_than: 0
  - freshness:
      column: scored_at
      threshold:
        unit: hour
        must_be_less_than_or_equal: ${var.FRESHNESS_HOURS}
  - failed_rows:
      name: "scored_at must not be in the future"
      qualifier: scored_at_not_future
      expression: scored_at > ${soda.NOW}
  - failed_rows:
      name: "fraud_score must be between 0 and 1"
      qualifier: score_range
      expression: fraud_score < 0.0 OR fraud_score > 1.0
  - failed_rows:
      name: "No duplicate scores per transaction per model"
      qualifier: dup_txn_model
      query: |
        SELECT transaction_id, model_id
        FROM datasource.db.schema.fraud_scores
        GROUP BY transaction_id, model_id
        HAVING COUNT(*) > 1
      threshold:
        must_be: 0
  - failed_rows:
      name: "prediction_label must align with fraud_score threshold"
      qualifier: label_score_alignment
      expression: >
        (prediction_label = 'FRAUD' AND fraud_score < 0.5)
        OR (prediction_label = 'LEGIT' AND fraud_score > 0.9)

The label-score alignment check is intentionally designed to be soft rather than strict. It flags clear contradictions — a FRAUD label on a low-confidence score, or a LEGIT label on a very high score — while allowing the SUSPICIOUS middle ground where model thresholds may differ between deployments. Adjust the thresholds to match your own model's decision boundaries.
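If your decision boundaries differ between deployments, the cutoffs can be lifted into variables, mirroring the FRESHNESS_HOURS pattern above. Whether variables can be interpolated inside a failed_rows expression depends on your Soda setup, so treat this as a sketch (the variable names are illustrative):

```yaml
variables:
  FRAUD_MIN_SCORE:
    default: 0.5
  LEGIT_MAX_SCORE:
    default: 0.9

checks:
  - failed_rows:
      name: "prediction_label must align with fraud_score threshold"
      qualifier: label_score_alignment
      expression: >
        (prediction_label = 'FRAUD' AND fraud_score < ${var.FRAUD_MIN_SCORE})
        OR (prediction_label = 'LEGIT' AND fraud_score > ${var.LEGIT_MAX_SCORE})
```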

The duplicate scores per transaction per model check prevents double-scoring, which can happen when retry logic in streaming pipelines reprocesses the same transaction.

Column-level validation:

Required fields ensure every score is traceable to a specific transaction, model, and version. The transaction_id UUID format check maintains consistency with the transaction ledger. The model_version semantic versioning format enables lineage tracking across model deployments. The feature_hash SHA-256 validation ensures the feature vector was computed and logged — critical for model auditability and reproducibility.

columns:
  - name: transaction_id
    data_type: string
    checks:
      - missing:
          name: No missing values
      - invalid:
          name: "transaction_id must be a UUID"
          valid_format:
            name: UUID
            regex: "^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
  - name: model_id
    data_type: string
    checks:
      - missing:
          name: No missing values
      - invalid:
          name: "model_id length guardrail"
          valid_min_length: 1
          valid_max_length: 64
  - name: model_version
    data_type: string
    checks:
      - missing:
          name: No missing values
      - invalid:
          name: "Semantic version format (e.g. 1.2.3)"
          valid_format:
            name: semver
            regex: "^\\d+\\.\\d+\\.\\d+$"
  - name: fraud_score
    data_type: decimal
    checks:
      - missing:
          name: No missing values
      - invalid:
          name: "Score must be between 0 and 1"
          valid_min: 0.0
          valid_max: 1.0
  - name: prediction_label
    data_type: string
    checks:
      - missing:
          name: No missing values
      - invalid:
          name: "Allowed prediction labels"
          valid_values:
            - LEGIT
            - SUSPICIOUS
            - FRAUD
  - name: scored_at
    data_type: timestamp
    checks:
      - missing:
          name: No missing values
  - name: feature_hash
    data_type: string
    checks:
      - missing:
          name: No missing values
      - invalid:
          name: "feature_hash must be a valid SHA-256 hex string"
          valid_format:
            name: SHA-256 hash
            regex: "^[a-fA-F0-9]{64}$"

🟢 This contract prevents broken model outputs from blocking legitimate transactions, enforces model traceability for audit and rollback, catches double-scored transactions, and ensures scoring freshness meets real-time fraud detection SLAs. This helps bridge the gap between data engineering and data science governance.

Template #5: BCBS 239 Regulatory Exposures Data Contract

Regulatory exposure data is the foundation of risk reporting. It feeds capital adequacy calculations, stress testing scenarios, and supervisory submissions. A single inconsistency between your risk and accounting data can trigger regulatory scrutiny, or worse, a capital surcharge.

BCBS 239, for instance, turns data quality into a supervisory requirement: banks must demonstrate that risk data is accurate, complete, timely, and reconcilable across systems.

Consider this last failure scenario:

A data migration renames counterparty_id in the exposures table, but the counterparty master still uses the original column name. Your join silently returns zero matches, and your BCBS 239 submission reports zero counterparty exposures for an entire business line. The supervisor flags the gap during their next review, triggering a formal remediation program.

The BCBS 239 data contract template prevents this scenario by enforcing referential integrity, format validation, and cross-system reconciliation before data reaches regulatory reporting systems.

Dataset-level protection:

The schema check catches structural changes immediately. The completeness check uses a SQL query to identify any counterparties present in the master but missing from exposures — a direct operationalization of BCBS 239 Principle 4 (Completeness).

checks:
  - schema: {}
  - failed_rows:
      name: counterparties_missing_in_exposures
      query: |
        SELECT m.counterparty_id
        FROM datasource.db.schema.counterparty_master m
        LEFT JOIN datasource.db.schema.exposures e
          ON e.counterparty_id = m.counterparty_id
        WHERE e.counterparty_id IS NULL
      threshold:
        must_be: 0
      attributes:
        bcbs239:
          - P4
        description: "Completeness: any counterparties missing from exposures?"

Column-level validation:

The LEI code format check enforces the standard 20-character alphanumeric pattern required for counterparty identification. The referential integrity check verifies that every counterparty_id in exposures exists in the official counterparty master, preventing broken joins that silently corrupt risk aggregation.

columns:
  - name: lei_code
    data_type: string
    checks:
      - invalid:
          name: lei_code_format
          valid_format:
            name: LEI must be 20 alphanumeric
            regex: '^[A-Z0-9]{20}$'
          attributes:
            bcbs239:
              - P3
            description: LEI format is 20 uppercase alphanumerics
  - name: counterparty_id
    checks:
      - invalid:
          name: counterparty_id_in_master
          valid_reference_data:
            dataset: datasource/db/schema/counterparty_master
            column: counterparty_id
          attributes:
            bcbs239:
              - P3
            description: "Integrity: all counterparties in exposures exist in the master"

Cross-system reconciliation:

This is where BCBS 239 contracts become especially powerful. The reconciliation block compares total exposure amounts against accounting balances, flagging discrepancies that exceed the defined tolerance. This directly operationalizes Principles 3 (Accuracy) and 7 (Reporting Accuracy).

reconciliation:
  source:
    dataset: datasource/db/schema/accounting_balances
  checks:
    - metric_diff:
        name: total_exposures_vs_total_balance
        source_expression: SUM(balance_amount)
        target_expression: SUM(exposure_amount)
        threshold:
          must_be_less_than: 1000
        attributes:
          bcbs239:
            - P3
            - P7
          description: "Reconciliation vs accounting: Overall tolerance across the books"

Note the use of attributes to tag each check with the specific BCBS 239 principles it addresses. This creates an audit trail that maps directly to supervisory expectations.

🟢 This contract ensures exposure reporting is complete, accurate, and reconciled with accounting data — the three pillars that regulators scrutinize most during BCBS 239 assessments.

Implementation with Soda

Having the right data contract templates is only the beginning. The real value comes from embedding these contracts into your data pipelines so they actively prevent issues rather than documenting expectations that nobody enforces.

The finance templates we've covered provide starting points, but you'll need to adjust thresholds, add institution-specific validations, and align status values and classifications with your actual systems.

From Template to Production

Step 1: Treat Contracts as Code

→ Download templates from Soda's template gallery and customize them for your specific business rules.

→ Store in version control using Git. Your contracts should live in a repository where changes are tracked, reviewed, and deployed through the same rigorous process as your production code. In financial institutions, this creates an audit trail showing exactly when quality rules changed and why (evidence that regulators expect to see).

→ Review through pull requests before any contract changes reach production. When a team wants to relax a validation threshold or add a new transaction type, that change goes through code review, where stakeholders can assess the business and regulatory impact. This prevents silent degradation of data quality standards over time.

Step 2: Verify Locally

Before deploying contracts to production, validate them against your actual data:

# pip install soda-{data source} for other data sources
pip install soda-snowflake

# verify the contract locally against a data source
soda contract verify -c contract.yml -ds

→ Test against real data sources in development or staging environments. Don't just validate the YAML syntax — run actual queries against representative data to see which checks pass and which fail.

→ Iterate on thresholds and rules based on what you learn. If your balance freshness check fails because weekend batch jobs genuinely arrive after 24 hours, adjust the threshold to 48 hours rather than creating false positives that train teams to ignore alerts.
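Using the templates' own variables pattern, that adjustment is a one-line change rather than a rewrite of the check. A sketch, assuming the account balances contract shown earlier:

```yaml
variables:
  FRESHNESS_HOURS:
    # Weekend batch jobs legitimately land after 24h; widen to 48h
    default: 48

checks:
  - freshness:
      column: balance_date
      threshold:
        unit: hour
        must_be_less_than_or_equal: ${var.FRESHNESS_HOURS}
```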

Step 3: Embed in Your Pipeline

Data contracts only protect you if they're enforced at the right checkpoints. Strategic placement determines which issues you catch early versus which slip through to cause downstream damage.

Where to enforce contracts in finance:

  • After ingestion from core banking systems — Validate transaction and balance data immediately after it arrives, catching issues while they're still close to the source

  • Before loading into the data warehouse — Create a quality gate that prevents corrupt data from entering your analytical environment

  • At API boundaries for payment and trading systems — Validate transaction data at the integration layer before it reaches internal ledgers

  • In ETL/ELT processes — Integrate with tools like dbt to validate data transformations at each stage

  • After ML model scoring — Validate fraud scores and predictions before they reach real-time decisioning engines

Enforcement mechanism: CI/CD pipelines check that schema changes are compatible with contracts before deployment. Data quality tools validate constraints against actual data on a schedule or in real time. When a contract fails, the data stops flowing until someone investigates and resolves the issue.
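As a sketch of such a CI/CD gate, assuming a GitHub Actions setup (the workflow name, paths, Python version, and secret name are illustrative; the verify command mirrors the one from Step 2, and data source configuration is environment-specific):

```yaml
name: verify-data-contracts
on:
  pull_request:
    paths:
      - "contracts/**"

jobs:
  verify:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install Soda
        run: pip install soda-snowflake
      - name: Verify contracts against staging
        env:
          SNOWFLAKE_PASSWORD: ${{ secrets.SNOWFLAKE_PASSWORD }}
        # A failing verification fails the PR check, blocking the merge
        run: soda contract verify -c contracts/contract.yml -ds
```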

Step 4: Publish and Monitor

Once contracts are validated and embedded, publish them to Soda Cloud for continuous monitoring:

# publish and schedule the contract with Soda Cloud
soda contract publish -c contract.yml -sc

→ Automated validation on schedule runs your contracts hourly, daily, or in near real time, depending on your data freshness requirements. For fraud scoring, this might be every few minutes. For balance snapshots, daily after the batch job completes.

→ Smart alerting to the right owners routes failures to the teams responsible for fixing them. When the account balances contract fails because opening balances don't match the previous day's close, the alert goes to the core banking integration team, not the data platform team. Clear ownership prevents alerts from being ignored.

→ Data observability dashboard provides visibility into contract health across all datasets. Teams can see which contracts are passing, which are failing, and trending metrics on data quality over time.

Book a demo to see how data contracts work in your specific environment with your actual data sources and business requirements.

AI-Powered Contract Generation

Don't want to write YAML from scratch? Soda's AI-powered contract autopilot can generate contracts automatically by analyzing your data structure and patterns.

One-click contract creation examines your tables and suggests appropriate checks based on data types, cardinality, and distribution patterns.

Start Small, Scale Systematically

Don't try to apply contracts to all your datasets at once. Start with the ones that have the highest business impact for your use case. Get those contracts working smoothly, build confidence with stakeholders, then expand to the rest of your data estate.

This progressive approach lets you refine your process (verification workflows, alerting configurations, ownership assignments) on high-value datasets before scaling across your entire data ecosystem.

Ownership and Accountability

Each contract needs a clearly named owner — a person or role responsible for responding to breaches and approving changes. Without clear ownership, failed contracts generate alerts that no one acts on, and quality standards quietly erode.

For finance contracts, ownership typically aligns with business domains:

  • BCBS 239 Exposures contract → Risk & Regulatory Reporting team

  • Transaction Ledger contract → Finance Operations / Treasury

  • Account Balances contract → Core Banking / Back Office Operations

  • Portfolio Holdings contract → Investment / Asset Management team

  • Fraud Scores contract → Data Science / Fraud Analytics team

This domain-aligned ownership ensures the people who best understand the business rules and regulatory requirements are responsible for maintaining contract quality over time.

Evolving Contracts Over Time

Data contracts aren't static documents — they evolve as regulations change, new products launch, and risk models are updated.

As requirements change, producers propose contract updates, discuss them with consumers, and roll out new versions in a controlled way. This feedback loop prevents the chaos of uncoordinated changes while allowing necessary evolution.

When a new regulation introduces additional reporting fields, the contract update flows through the same PR review process as any code change — with full visibility into who approved what and when.
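In practice, such a change is often just a new entry in the contract's columns list — small enough to review meaningfully in a pull request. A sketch, following the column patterns used in these templates (the field name and allowed values below are hypothetical):

```yaml
columns:
  # ...existing columns unchanged...
  - name: esg_risk_flag        # hypothetical new regulatory field
    data_type: string
    checks:
      - missing:
          name: No missing values
      - invalid:
          name: "Allowed ESG risk flags"
          valid_values:
            - LOW
            - MEDIUM
            - HIGH
```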

Watch the Soda collaboration workflow demonstration below, which shows how business users define quality expectations in the UI while data engineers validate and publish contracts using code and the CLI, bridging the gap between business knowledge and technical implementation.

Conclusion: Beyond Validation to True Ownership

The templates in this guide represent best practices for financial data quality today. Use them to build stability and prevent expensive data failures. But as you implement them, remember that the ultimate goal is empowering business domain experts to own their data quality.

The people closest to regulatory decisions understand what data quality actually means in context. They know when a €1,000 reconciliation tolerance is appropriate versus when it needs to be €0.01. They understand why a 1-hour freshness SLA matters for fraud scoring but 24 hours is fine for balance snapshots. And they can assess the tradeoff between strict validation and operational flexibility.

The real gain with data contracts isn't just validation — it's ownership and collaboration.

If we assume data engineers build data products and the business simply consumes them, we've created distance between data and decisions. The contracts we've discussed today bridge that gap through collaboration: risk and compliance stakeholders define quality expectations, engineers implement validation logic, and both parties review changes together.

Looking forward, the goal isn't stricter interfaces that require engineering intervention for every change. It's enabling those closest to regulatory and business decisions to shape, evolve, and validate data products themselves, with automation doing the heavy lifting.

Ready to implement data contracts in your financial data pipelines?

Download the templates: Access all finance data contract templates and customize them for your institution. We recommend starting with the transaction ledger and account balances for immediate impact.

See it in action: Book a demo to see how data contracts work in your specific environment with your actual data sources and business requirements.

Explore more use cases: Visit the complete template gallery for data contracts across industries — from retail to healthcare to manufacturing — and discover patterns you can apply to your financial operations.

      threshold:
        must_be: 0

The opening-equals-previous-closing check is the centerpiece of this contract. It uses a self-join to compare each day's opening balance against the prior day's closing balance, with a configurable tolerance variable (RECONCILIATION_TOLERANCE) to handle floating-point precision. This is the kind of check that, when it fails, immediately signals a data pipeline issue or a missed transaction posting.

Note the use of variables for both freshness and reconciliation tolerance. This allows teams to easily tune thresholds per environment without editing the core contract logic.
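
Outside the contract, the continuity rule reduces to a simple comparison per account per day. The sketch below is plain Python with illustrative field names, mirroring the SQL self-join (it is not Soda's implementation, just the same logic):

```python
from datetime import date, timedelta
from decimal import Decimal

TOLERANCE = Decimal("0.01")  # mirrors RECONCILIATION_TOLERANCE in the contract

def continuity_breaks(snapshots):
    """Find (account_id, balance_date) pairs where the opening balance does
    not match the previous day's closing balance within tolerance.
    Each snapshot is a dict with account_id, balance_date (datetime.date),
    opening_balance, and closing_balance (Decimal amounts)."""
    by_key = {(s["account_id"], s["balance_date"]): s for s in snapshots}
    breaks = []
    for (account, day), snap in by_key.items():
        prev = by_key.get((account, day - timedelta(days=1)))
        if prev is None:
            continue  # first observed day for this account; nothing to reconcile
        if abs(snap["opening_balance"] - prev["closing_balance"]) > TOLERANCE:
            breaks.append((account, day))
    return breaks
```

Running this over the failure scenario above (frozen balances copied forward) would surface every affected account the moment the batch lands, rather than two days later.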

Column-level validation:

Required fields ensure every balance record has an account identifier, date, and both opening and closing values. Currency codes are validated against ISO-4217 format, preventing mismatched currencies from corrupting multi-currency reporting.

columns:
  - name: account_id
    data_type: string
    checks:
      - missing:
          name: No missing values
      - invalid:
          name: "account_id length guardrail"
          valid_min_length: 1
          valid_max_length: 64
  - name: balance_date
    data_type: date
    checks:
      - missing:
          name: No missing values
  - name: opening_balance
    data_type: decimal
    checks:
      - missing:
          name: No missing values
  - name: closing_balance
    data_type: decimal
    checks:
      - missing:
          name: No missing values
  - name: currency
    data_type: string
    checks:
      - missing:
          name: No missing values
      - invalid:
          name: "Currency must be ISO-4217 (3 uppercase letters)"
          valid_format:
            name: ISO-4217 code
            regex: "^[A-Z]{3}$"

🟢 This contract ensures balance continuity across days, prevents duplicate snapshots, enforces currency standardization, and catches stale or future-dated records. Business-wise, it protects treasury decisions, client statements, and regulatory balance reporting from operating on inconsistent data.

Template #3: Portfolio Holdings Data Contract

Portfolio holdings data is the foundation of asset management operations. It feeds net asset value (NAV) calculations, client reporting, risk attribution, performance measurement, and regulatory filings. When holdings data is incorrect, the consequences ripple across investment decisions, client trust, and compliance obligations.

Consider another failure scenario:

A fund administrator's data feed introduces duplicate records for the same asset within a portfolio — one from the custodian and one from the order management system. Your NAV calculation double-counts the position, inflating the fund's reported value. Investors make subscription and redemption decisions based on an incorrect NAV, and by the time the error surfaces during the monthly audit, you're facing a NAV restatement and potential investor claims.

The portfolio holdings data contract prevents the scenario above by enforcing position uniqueness, logical consistency, and controlled asset classifications before data reaches NAV engines, risk systems, and client reports.

Dataset-level protection:

The schema check prevents unexpected structural changes. The freshness check enforces a 24-hour SLA. Cross-field integrity checks enforce critical holdings logic: no negative quantities or market values, no duplicate positions per portfolio/asset/date, and a consistency rule ensuring that zero-quantity positions carry zero market value.

variables:
  FRESHNESS_HOURS:
    default: 24

checks:
  - schema:
      allow_extra_columns: false
      allow_other_column_order: false
  - row_count:
      threshold:
        must_be_greater_than: 0
  - freshness:
      column: as_of_date
      threshold:
        unit: hour
        must_be_less_than_or_equal: ${var.FRESHNESS_HOURS}
  - failed_rows:
      name: "as_of_date must not be in the future"
      qualifier: as_of_date_not_future
      expression: as_of_date > CURRENT_DATE
  - failed_rows:
      name: "market_value must be non-negative"
      qualifier: market_value_non_negative
      expression: market_value < 0
  - failed_rows:
      name: "quantity must be non-negative"
      qualifier: quantity_non_negative
      expression: quantity < 0
  - failed_rows:
      name: "No duplicate holdings per portfolio, asset, and date"
      qualifier: dup_holding
      query: |
        SELECT portfolio_id, asset_id, as_of_date
        FROM datasource.db.schema.portfolio_holdings
        GROUP BY portfolio_id, asset_id, as_of_date
        HAVING COUNT(*) > 1
      threshold:
        must_be: 0
  - failed_rows:
      name: "Holdings with zero quantity must have zero market value"
      qualifier: zero_qty_zero_mv
      expression: quantity = 0 AND market_value <> 0

The zero-quantity-zero-market-value check catches phantom holdings — positions that appear to have been closed (quantity = 0) but still carry a market value. These ghost records are a common source of NAV distortion and can persist for months if not actively validated.
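
The rule itself is a one-line filter. Here is an illustrative Python sketch with assumed field names, equivalent to the failed-rows expression:

```python
def phantom_holdings(holdings):
    """Return holdings that have zero quantity but a non-zero market value,
    i.e. closed positions that still distort NAV. Field names are assumed
    to match the contract columns (quantity, market_value)."""
    return [
        h for h in holdings
        if h["quantity"] == 0 and h["market_value"] != 0
    ]
```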

Column-level validation:

Required fields ensure every holding can be traced to a specific portfolio, asset, and date. The asset_type field is restricted to a controlled set of classifications covering equities, fixed income, derivatives, FX, commodities, cash, funds, and alternatives. The quantity field enforces a non-negative minimum, preventing impossible position states.

columns:
  - name: portfolio_id
    data_type: string
    checks:
      - missing:
          name: No missing values
      - invalid:
          name: "portfolio_id length guardrail"
          valid_min_length: 1
          valid_max_length: 64
  - name: asset_id
    data_type: string
    checks:
      - missing:
          name: No missing values
      - invalid:
          name: "asset_id length guardrail"
          valid_min_length: 1
          valid_max_length: 64
  - name: asset_type
    data_type: string
    checks:
      - missing:
          name: No missing values
      - invalid:
          name: "Allowed asset types"
          valid_values:
            - EQUITY
            - FIXED_INCOME
            - DERIVATIVE
            - FX
            - COMMODITY
            - CASH
            - FUND
            - ALTERNATIVE
  - name: quantity
    data_type: decimal
    checks:
      - missing:
          name: No missing values
      - invalid:
          name: "Quantity must be zero or positive"
          valid_min: 0
  - name: market_value
    data_type: decimal
    checks:
      - missing:
          name: No missing values
  - name: as_of_date
    data_type: date
    checks:
      - missing:
          name: No missing values

🟢 This contract prevents duplicate positions that inflate NAV calculations, enforces asset classification consistency across reporting systems, catches phantom holdings, and ensures position data is fresh and logically valid. Business-wise, it protects investment decisions, client reporting, and regulatory filings.

Template #4: Fraud Detection ML Scores Data Contract

Machine learning models are increasingly central to financial operations. Today, they power fraud detection, credit scoring, AML screening, and transaction monitoring. And when ML scoring data is corrupted, the consequences range from blocked legitimate transactions to undetected fraud.

Consider this fourth failure scenario:

A model retraining pipeline deploys a new version with a normalization bug that outputs fraud scores above 1.0 for certain transaction patterns. Your real-time fraud engine, expecting scores between 0 and 1, treats these as maximum-confidence fraud signals and blocks thousands of legitimate transactions over a weekend. By Monday morning, customer complaints are flooding in and, because model_version wasn't tracked consistently, the engineering team can't quickly identify which model version caused the problem.

The fraud detection ML scores data contract prevents this scenario by enforcing score ranges, model traceability, and scoring freshness before predictions reach fraud-blocking systems.

Dataset-level protection:

The schema check prevents unexpected structural changes. The freshness check enforces a 1-hour SLA (much tighter than other financial datasets because fraud detection operates in near real-time). Cross-field integrity checks enforce the most critical ML data rules: scores must stay within the 0–1 range, and prediction labels must reasonably align with the underlying scores.

variables:
  FRESHNESS_HOURS:
    default: 1

checks:
  - schema:
      allow_extra_columns: false
      allow_other_column_order: false
  - row_count:
      threshold:
        must_be_greater_than: 0
  - freshness:
      column: scored_at
      threshold:
        unit: hour
        must_be_less_than_or_equal: ${var.FRESHNESS_HOURS}
  - failed_rows:
      name: "scored_at must not be in the future"
      qualifier: scored_at_not_future
      expression: scored_at > ${soda.NOW}
  - failed_rows:
      name: "fraud_score must be between 0 and 1"
      qualifier: score_range
      expression: fraud_score < 0.0 OR fraud_score > 1.0
  - failed_rows:
      name: "No duplicate scores per transaction per model"
      qualifier: dup_txn_model
      query: |
        SELECT transaction_id, model_id
        FROM datasource.db.schema.fraud_scores
        GROUP BY transaction_id, model_id
        HAVING COUNT(*) > 1
      threshold:
        must_be: 0
  - failed_rows:
      name: "prediction_label must align with fraud_score threshold"
      qualifier: label_score_alignment
      expression: >
        (prediction_label = 'FRAUD' AND fraud_score < 0.5)
        OR (prediction_label = 'LEGIT' AND fraud_score > 0.9)

The label-score alignment check is intentionally designed to be soft rather than strict. It flags clear contradictions — a FRAUD label on a low-confidence score, or a LEGIT label on a very high score — while allowing the SUSPICIOUS middle ground where model thresholds may differ between deployments. Adjust the thresholds to match your own model's decision boundaries.
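
The alignment logic can be sketched in Python to make the soft boundaries explicit. The thresholds below are illustrative, not prescriptive; tune them to your model's deployment boundaries:

```python
# Illustrative decision boundaries; adjust to your model's thresholds.
FRAUD_MIN_SCORE = 0.5  # a FRAUD label below this score is contradictory
LEGIT_MAX_SCORE = 0.9  # a LEGIT label above this score is contradictory

def contradictory(label: str, score: float) -> bool:
    """Flag only clear label/score contradictions.
    SUSPICIOUS is never flagged, leaving room for threshold drift
    between model deployments."""
    if label == "FRAUD" and score < FRAUD_MIN_SCORE:
        return True
    if label == "LEGIT" and score > LEGIT_MAX_SCORE:
        return True
    return False
```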

The duplicate scores per transaction per model check prevents double-scoring, which can happen when retry logic in streaming pipelines reprocesses the same transaction.

Column-level validation:

Required fields ensure every score is traceable to a specific transaction, model, and version. The transaction_id UUID format check maintains consistency with the transaction ledger. The model_version semantic versioning format enables lineage tracking across model deployments. The feature_hash SHA-256 validation ensures the feature vector was computed and logged — critical for model auditability and reproducibility.

columns:
  - name: transaction_id
    data_type: string
    checks:
      - missing:
          name: No missing values
      - invalid:
          name: "transaction_id must be a UUID"
          valid_format:
            name: UUID
            regex: "^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
  - name: model_id
    data_type: string
    checks:
      - missing:
          name: No missing values
      - invalid:
          name: "model_id length guardrail"
          valid_min_length: 1
          valid_max_length: 64
  - name: model_version
    data_type: string
    checks:
      - missing:
          name: No missing values
      - invalid:
          name: "Semantic version format (e.g. 1.2.3)"
          valid_format:
            name: semver
            regex: "^\\d+\\.\\d+\\.\\d+$"
  - name: fraud_score
    data_type: decimal
    checks:
      - missing:
          name: No missing values
      - invalid:
          name: "Score must be between 0 and 1"
          valid_min: 0.0
          valid_max: 1.0
  - name: prediction_label
    data_type: string
    checks:
      - missing:
          name: No missing values
      - invalid:
          name: "Allowed prediction labels"
          valid_values:
            - LEGIT
            - SUSPICIOUS
            - FRAUD
  - name: scored_at
    data_type: timestamp
    checks:
      - missing:
          name: No missing values
  - name: feature_hash
    data_type: string
    checks:
      - missing:
          name: No missing values
      - invalid:
          name: "feature_hash must be a valid SHA-256 hex string"
          valid_format:
            name: SHA-256 hash
            regex: "^[a-fA-F0-9]{64}$"
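
For quick ad-hoc testing of producer output, the same format guards translate directly into Python regexes. This sketch assumes row dictionaries keyed by the column names above; the contract remains the source of truth:

```python
import re

# The same format patterns as the column checks, as Python regexes.
UUID_RE = re.compile(
    r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
    r"[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
)
SEMVER_RE = re.compile(r"^\d+\.\d+\.\d+$")
SHA256_RE = re.compile(r"^[a-fA-F0-9]{64}$")

def is_traceable(row) -> bool:
    """True when a score row carries well-formed identifiers for audit:
    a UUID transaction_id, a semver model_version, and a SHA-256 feature_hash."""
    return bool(
        UUID_RE.match(row["transaction_id"])
        and SEMVER_RE.match(row["model_version"])
        and SHA256_RE.match(row["feature_hash"])
    )
```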

🟢 This contract prevents broken model outputs from blocking legitimate transactions, enforces model traceability for audit and rollback, catches double-scored transactions, and ensures scoring freshness meets real-time fraud detection SLAs. This helps bridge the gap between data engineering and data science governance.

Template #5: BCBS 239 Regulatory Exposures Data Contract

Regulatory exposure data is the foundation of risk reporting. It feeds capital adequacy calculations, stress testing scenarios, and supervisory submissions. A single inconsistency between your risk and accounting data can trigger regulatory scrutiny, or worse, a capital surcharge.

BCBS 239, for instance, makes data quality a supervisory requirement: banks must demonstrate that risk data is accurate, complete, timely, and reconcilable across systems.

Consider this last failure scenario:

A data migration renames counterparty_id in the exposures table, but the counterparty master still uses the original column name. Your join silently returns zero matches, and your BCBS 239 submission reports zero counterparty exposures for an entire business line. The supervisor flags the gap during their next review, triggering a formal remediation program.

The BCBS 239 data contract template prevents this scenario by enforcing referential integrity, format validation, and cross-system reconciliation before data reaches regulatory reporting systems.

Dataset-level protection:

The schema check catches structural changes immediately. The completeness check uses a SQL query to identify any counterparties present in the master but missing from exposures — a direct operationalization of BCBS 239 Principle 4 (Completeness).

checks:
  - schema: {}
  - failed_rows:
      name: counterparties_missing_in_exposures
      query: |
        SELECT m.counterparty_id
        FROM datasource.db.schema.counterparty_master m
        LEFT JOIN datasource.db.schema.exposures e
          ON e.counterparty_id = m.counterparty_id
        WHERE e.counterparty_id IS NULL
      threshold:
        must_be: 0
      attributes:
        bcbs239:
          - P4
        description: "Completeness: any counterparties missing from exposures?"

Column-level validation:

The LEI code format check enforces the standard 20-character alphanumeric pattern required for counterparty identification. The referential integrity check verifies that every counterparty_id in exposures exists in the official counterparty master, preventing broken joins that silently corrupt risk aggregation.

columns:
  - name: lei_code
    data_type: string
    checks:
      - invalid:
          name: lei_code_format
          valid_format:
            name: LEI must be 20 alphanumeric
            regex: '^[A-Z0-9]{20}$'
          attributes:
            bcbs239:
              - P3
            description: LEI format is 20 uppercase alphanumerics
  - name: counterparty_id
    checks:
      - invalid:
          name: counterparty_id_in_master
          valid_reference_data:
            dataset: datasource/db/schema/counterparty_master
            column: counterparty_id
          attributes:
            bcbs239:
              - P3
            description: "Integrity: all counterparties in exposures exist in the master"
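
At its core, the column-level referential rule is a set difference. The illustrative Python sketch below (field names assumed) returns exactly the orphan IDs that the valid_reference_data check would flag:

```python
def orphan_counterparties(exposures, master_ids):
    """IDs present in exposures but absent from the counterparty master.
    These are the broken-join rows that silently corrupt risk aggregation."""
    return sorted({e["counterparty_id"] for e in exposures} - set(master_ids))
```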

Cross-system reconciliation:

This is where BCBS 239 contracts become especially powerful. The reconciliation block compares total exposure amounts against accounting balances, flagging discrepancies that exceed the defined tolerance. This directly operationalizes Principles 3 (Accuracy) and 7 (Reporting Accuracy).

reconciliation:
  source:
    dataset: datasource/db/schema/accounting_balances
  checks:
    - metric_diff:
        name: total_exposures_vs_total_balance
        source_expression: SUM(balance_amount)
        target_expression: SUM(exposure_amount)
        threshold:
          must_be_less_than: 1000
        attributes:
          bcbs239:
            - P3
            - P7
          description: "Reconciliation vs accounting: Overall tolerance across the books"

Note the use of attributes to tag each check with the specific BCBS 239 principles it addresses. This creates an audit trail that maps directly to supervisory expectations.
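
Outside the pipeline, the metric_diff check reduces to a tolerance comparison on two sums. A sketch with illustrative amounts and tolerance:

```python
from decimal import Decimal

TOLERANCE = Decimal("1000")  # overall tolerance across the books

def reconciles(exposure_amounts, balance_amounts) -> bool:
    """True when total exposures and total accounting balances agree
    within the configured tolerance (mirrors the metric_diff check)."""
    diff = abs(sum(exposure_amounts, Decimal(0)) - sum(balance_amounts, Decimal(0)))
    return diff < TOLERANCE
```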

🟢 This contract ensures exposure reporting is complete, accurate, and reconciled with accounting data — the three pillars that regulators scrutinize most during BCBS 239 assessments.

Implementation with Soda

Having the right data contract templates is only the beginning. The real value comes from embedding these contracts into your data pipelines so they actively prevent issues rather than documenting expectations that nobody enforces.

The finance templates we've covered provide starting points, but you'll need to adjust thresholds, add institution-specific validations, and align status values and classifications with your actual systems.

From Template to Production

Step 1: Treat Contracts as Code

→ Download templates from Soda's template gallery and customize them for your specific business rules.

→ Store in version control using Git. Your contracts should live in a repository where changes are tracked, reviewed, and deployed through the same rigorous process as your production code. In financial institutions, this creates an audit trail showing exactly when quality rules changed and why (evidence that regulators expect to see).

→ Review through pull requests before any contract changes reach production. When a team wants to relax a validation threshold or add a new transaction type, that change goes through code review, where stakeholders can assess the business and regulatory impact. This prevents silent degradation of data quality standards over time.

Step 2: Verify Locally

Before deploying contracts to production, validate them against your actual data:

# pip install soda-{data source} for other data sources
pip install soda-snowflake

# verify the contract locally against a data source
soda contract verify -c contract.yml -ds

→ Test against real data sources in development or staging environments. Don't just validate the YAML syntax — run actual queries against representative data to see which checks pass and which fail.

→ Iterate on thresholds and rules based on what you learn. If your balance freshness check fails because weekend batch jobs genuinely arrive after 24 hours, adjust the threshold to 48 hours rather than creating false positives that train teams to ignore alerts.

Step 3: Embed in Your Pipeline

Data contracts only protect you if they're enforced at the right checkpoints. Strategic placement determines which issues you catch early versus which slip through to cause downstream damage.

Where to enforce contracts in finance:

  • After ingestion from core banking systems — Validate transaction and balance data immediately after it arrives, catching issues while they're still close to the source

  • Before loading into the data warehouse — Create a quality gate that prevents corrupt data from entering your analytical environment

  • At API boundaries for payment and trading systems — Validate transaction data at the integration layer before it reaches internal ledgers

  • In ETL/ELT processes — Integrate with tools like dbt to validate data transformations at each stage

  • After ML model scoring — Validate fraud scores and predictions before they reach real-time decisioning engines

Enforcement mechanism: CI/CD pipelines check that schema changes are compatible with contracts before deployment. Data quality tools validate constraints against actual data on schedule or in real-time. When a contract fails, the data stops flowing until someone investigates and resolves the issue.
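
The enforcement strategies mentioned earlier (block, quarantine, alert) can be sketched as a small dispatcher. The names and behavior below are illustrative, not Soda's API; how you wire them in depends on your orchestrator:

```python
class ContractViolation(Exception):
    """Raised when a failing contract should halt the pipeline."""

def enforce(passed: bool, strategy: str = "block") -> str:
    """Apply a hypothetical enforcement strategy to a verification result."""
    if passed:
        return "proceed"        # data flows downstream unchanged
    if strategy == "block":
        raise ContractViolation("contract failed; halting pipeline")
    if strategy == "quarantine":
        return "quarantine"     # route failing records to a holding table
    return "alert"              # let data flow, but notify the dataset owner
```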

Step 4: Publish and Monitor

Once contracts are validated and embedded, publish them to Soda Cloud for continuous monitoring:

# publish and schedule the contract with Soda Cloud
soda contract publish -c contract.yml -sc

→ Automated validation on schedule runs your contracts hourly, daily, or in real-time depending on your data freshness requirements. For fraud scoring, this might be every few minutes. For balance snapshots, daily after the batch job completes.

→ Smart alerting to the right owners routes failures to the teams responsible for fixing them. When the account balances contract fails because opening balances don't match the previous day's close, the alert goes to the core banking integration team, not the data platform team. Clear ownership prevents alerts from being ignored.

→ Data observability dashboard provides visibility into contract health across all datasets. Teams can see which contracts are passing, which are failing, and trending metrics on data quality over time.

Book a demo to see how data contracts work in your specific environment with your actual data sources and business requirements.

AI-Powered Contract Generation

Don't want to write YAML from scratch? Soda's AI-powered contract autopilot can generate contracts automatically by analyzing your data structure and patterns.

One-click contract creation examines your tables and suggests appropriate checks based on data types, cardinality, and distribution patterns.

Start Small, Scale Systematically

Don't try to apply contracts to all your datasets at once. Start with the ones that have the highest business impact for your use case. Get those contracts working smoothly, build confidence with stakeholders, then expand to your remaining datasets.

This progressive approach lets you refine your process (verification workflows, alerting configurations, ownership assignments) on high-value datasets before scaling across your entire data ecosystem.

Ownership and Accountability

Each contract needs a clearly named owner — a person or role responsible for responding to breaches and approving changes. Without clear ownership, failed contracts generate alerts that no one acts on, and quality standards decline through neglect.

For finance contracts, ownership typically aligns with business domains:

  • BCBS 239 Exposures contract → Risk & Regulatory Reporting team

  • Transaction Ledger contract → Finance Operations / Treasury

  • Account Balances contract → Core Banking / Back Office Operations

  • Portfolio Holdings contract → Investment / Asset Management team

  • Fraud Scores contract → Data Science / Fraud Analytics team

This domain-aligned ownership ensures the people who best understand the business rules and regulatory requirements are responsible for maintaining contract quality over time.

Evolving Contracts Over Time

Data contracts aren't static documents — they evolve as regulations change, new products launch, and risk models are updated.

As requirements change, producers propose contract updates, discuss them with consumers, and roll out new versions in a controlled way. This feedback loop prevents the chaos of uncoordinated changes while allowing necessary evolution.

When a new regulation introduces additional reporting fields, the contract update flows through the same PR review process as any code change — with full visibility into who approved what and when.

Watch the Soda collaboration workflow demonstration below: business users define quality expectations in the UI while data engineers validate and publish contracts using code and the CLI, bridging the gap between business knowledge and technical implementation.

Conclusion: Beyond Validation to True Ownership

The templates in this guide represent best practices for financial data quality today. Use them to build stability and prevent expensive data failures. But as you implement them, remember that the ultimate goal is empowering business domain experts to own their data quality.

The people closest to regulatory decisions understand what data quality actually means in context. They know when a €1,000 reconciliation tolerance is appropriate versus when it needs to be €0.01. They understand why a 1-hour freshness SLA matters for fraud scoring but 24 hours is fine for balance snapshots. And they can assess the tradeoff between strict validation and operational flexibility.

The real gain with data contracts isn't just validation — it's ownership and collaboration.

If we assume data engineers build data products and the business simply consumes them, we've created distance between data and decisions. The contracts we've discussed today bridge that gap through collaboration: risk and compliance stakeholders define quality expectations, engineers implement validation logic, and both parties review changes together.

Looking forward, the goal isn't stricter interfaces that require engineering intervention for every change. It's enabling those closest to regulatory and business decisions to shape, evolve, and validate data products themselves, with automation doing the heavy lifting.

Ready to implement data contracts in your financial data pipelines?

Download the templates: Access all finance data contract templates and customize them for your institution. We recommend starting with the transaction ledger and account balances for immediate impact.

See it in action: Book a demo to see how data contracts work in your specific environment with your actual data sources and business requirements.

Explore more use cases: Visit the complete template gallery for data contracts across industries — from retail to healthcare to manufacturing — and discover patterns you can apply to your financial operations.

Trusted by the world’s leading enterprises

Real stories from companies using Soda to keep their data reliable, accurate, and ready for action.

At the end of the day, we don’t want to be in there managing the checks, updating the checks, adding the checks. We just want to go and observe what’s happening, and that’s what Soda is enabling right now.

Sid Srivastava

Director of Data Governance, Quality and MLOps

Investing in data quality is key for cross-functional teams to make accurate, complete decisions with fewer risks and greater returns, using initiatives such as product thinking, data governance, and self-service platforms.

Mario Konschake

Director of Product-Data Platform

Soda has integrated seamlessly into our technology stack and given us the confidence to find, analyze, implement, and resolve data issues through a simple self-serve capability.

Sutaraj Dutta

Data Engineering Manager

Our goal was to deliver high-quality datasets in near real-time, ensuring dashboards reflect live data as it flows in. But beyond solving technical challenges, we wanted to spark a cultural shift - empowering the entire organization to make decisions grounded in accurate, timely data.

Gu Xie

Head of Data Engineering

4.4 of 5

Start trusting your data. Today.

Find, understand, and fix any data quality issue in seconds.
From table to record-level.

Trusted by
