Customer Master

Data Contract Template

Ensure customer data is complete, consistent, and reliable before it is used to join orders and sales for analytics, segmentation, and lifecycle reporting.

Data contract description

This data contract enforces schema stability and core master-data integrity rules for the customers dataset. It ensures the dataset is not empty, prevents duplicate or missing customer identifiers, validates email format quality within an acceptable tolerance, and enforces uniqueness of email addresses when present. It also restricts country codes to a valid two-letter ISO format and limits customer status to approved lifecycle values. Together, these checks protect identity consistency, prevent duplicate customer profiles, and ensure reliable joins across orders and sales for analytics, segmentation, and reporting.
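The "acceptable tolerance" for email format can be illustrated with a small Python sketch. It is not how Soda evaluates the check internally; it only mirrors the idea. The pattern is copied from the contract's email check, while the function name `invalid_email_percent`, the sample values, and the choice to divide by non-missing values only are assumptions made for illustration (Soda may compute the percentage against a different row count).

```python
import re

# Pattern copied from the contract's "email format (basic)" check.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def invalid_email_percent(emails):
    """Return the percentage of present emails that fail the pattern.

    Assumption: missing values (None or empty string) are left out of
    both the numerator and the denominator.
    """
    present = [e for e in emails if e is not None and e != ""]
    if not present:
        return 0.0
    invalid = sum(1 for e in present if EMAIL_PATTERN.match(e) is None)
    return 100.0 * invalid / len(present)


# Hypothetical sample values, not taken from any real dataset.
sample = ["ana@example.com", "bob@example.org", None, "not-an-email", "carl@host"]

percent = invalid_email_percent(sample)
# Assumed reading of `metric: percent` with `must_be_less_than: 0.5`:
# fewer than 0.5 percent of values may be invalid.
passes = percent < 0.5
print(f"{percent:.1f}% invalid, check passes: {passes}")
```

Two of the four present values fail the pattern here, so this sample would be far outside the tolerance. On a real table the same threshold allows roughly one malformed address per 200 rows.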

customer_master_data_contract.yaml

dataset
checks:
  - schema:
      allow_extra_columns: false
      allow_other_column_order: false
  - row_count:
      threshold:
        must_be_greater_than: 0
  - failed_rows:
      name: "Email should be unique (if present)"
      qualifier: email_unique
      query: |
        SELECT email
        FROM customers
        WHERE email IS NOT NULL AND email <> ''
        GROUP BY email
        HAVING COUNT(*) > 1
      threshold:
        must_be: 0
columns:
  - name: customer_id
    data_type: string
    checks:
      - missing:
      - duplicate:
      - invalid:
          name: "customer_id length guardrail"
          valid_min_length: 1
          valid_max_length: 64
  - name: email
    data_type: string
    checks:
      - invalid:
          name: "email format (basic)"
          valid_format:
            name: Email pattern
            regex: "^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$"
          threshold:
            metric: percent
            must_be_less_than: 0.5
  - name: country_code
    data_type: string
    checks:
      - missing:
      - invalid:
          name: "Two-letter country code"
          valid_format:
            name: ISO-3166 alpha-2
            regex: "^[A-Z]{2}$"
  - name: status
    data_type: string
    checks:
      - missing:
      - invalid:
          name: "Allowed customer statuses"
          valid_values
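The `failed_rows` query for email uniqueness can be tried outside Soda before it goes into a contract. The sketch below runs the same SQL against an in-memory SQLite table using Python's standard `sqlite3` module. The table layout, the helper name `find_duplicate_emails`, and the sample rows are hypothetical, and SQLite stands in for whatever data source the contract actually targets, so dialect differences may apply.

```python
import sqlite3

# Query text copied from the contract's failed_rows check.
DUPLICATE_EMAIL_QUERY = """
SELECT email
FROM customers
WHERE email IS NOT NULL AND email <> ''
GROUP BY email
HAVING COUNT(*) > 1
"""


def find_duplicate_emails(rows):
    """Load (customer_id, email) rows into SQLite and return duplicated emails."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE customers (customer_id TEXT, email TEXT)")
        conn.executemany("INSERT INTO customers VALUES (?, ?)", rows)
        return sorted(row[0] for row in conn.execute(DUPLICATE_EMAIL_QUERY))
    finally:
        conn.close()


# Hypothetical sample rows: one repeated address, plus repeated NULL and
# empty-string emails that the WHERE clause is meant to ignore.
sample_rows = [
    ("c1", "ana@example.com"),
    ("c2", "ana@example.com"),
    ("c3", None),
    ("c4", None),
    ("c5", ""),
    ("c6", ""),
    ("c7", "bob@example.org"),
]

duplicates = find_duplicate_emails(sample_rows)
print(duplicates)
```

Only the repeated address is returned; the repeated NULL and empty-string emails are skipped, which matches the "if present" wording of the check. The query compares emails exactly as stored, so in this SQLite run addresses that differ only in letter case would count as distinct.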

How to Enforce Data Contracts with Soda

Embed data quality through data contracts at any point in your pipeline.

# pip install soda-{data source} for other data sources
pip install soda-postgres

# verify the contract locally against a data source
soda contract verify -c contract.yml -ds ds_config.yml

# publish and schedule the contract with Soda Cloud
soda contract publish -c contract.yml -sc sc_config.yml

Check out the CLI documentation to learn more.

How to Automatically Create Data Contracts in One Click

Automatically write and publish data contracts using Soda's AI-powered data contract copilot.

Make data contracts work in production

Business knows what good data looks like. Engineering knows how to deliver it at scale. Soda unites both, turning governance expectations into executable contracts.

4.4 of 5

Start trusting your data. Today.

Find, understand, and fix any data quality issue in seconds.
From table to record-level.
