Customer Master

Ensure flawless consistency across all your data products

Ensure customer data is complete, consistent, and reliable before it's used to join orders and sales for analytics, segmentation, and lifecycle reporting.

Data contract description

This data contract enforces schema stability and core master-data integrity rules for the customers dataset. It ensures the dataset is not empty, prevents duplicate or missing customer identifiers, validates email format quality within an acceptable tolerance, and enforces uniqueness of email addresses when present. It also restricts country codes to a valid two-letter ISO format and limits customer status to approved lifecycle values. Together, these checks protect identity consistency, prevent duplicate customer profiles, and ensure reliable joins across orders and sales for analytics, segmentation, and reporting.

customer_master_data_contract.yaml

dataset
checks:
  - schema:
      allow_extra_columns: false
      allow_other_column_order: false
  - row_count:
      threshold:
        must_be_greater_than: 0
  - failed_rows:
      name: "Email should be unique (if present)"
      qualifier: email_unique
      query: |
        SELECT email
        FROM customers
        WHERE email IS NOT NULL AND email <> ''
        GROUP BY email
        HAVING COUNT(*) > 1
      threshold:
        must_be: 0
columns:
  - name: customer_id
    data_type: string
    checks:
      - missing:
      - duplicate:
      - invalid:
          name: "customer_id length guardrail"
          valid_min_length: 1
          valid_max_length: 64
  - name: email
    data_type: string
    checks:
      - invalid:
          name: "email format (basic)"
          valid_format:
            name: Email pattern
            regex: "^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$"
          threshold:
            metric: percent
            must_be_less_than: 0.5
  - name: country_code
    data_type: string
    checks:
      - missing:
      - invalid:
          name: "Two-letter country code"
          valid_format:
            name: ISO-3166 alpha-2
            regex: "^[A-Z]{2}$"
  - name: status
    data_type: string
    checks:
      - missing:
      - invalid:
          name: "Allowed customer statuses"
          valid_values
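To make the column and dataset checks above more concrete, here is a minimal Python sketch that mimics three of them on a tiny in-memory sample: the email format check with a percentage threshold, the two-letter country code check, and the duplicate-email query. It is not how Soda evaluates a contract. The helper names (`invalid_percent`, `duplicate_emails`), the sample rows, and the choice to divide by the total row count and to skip missing values when counting invalid ones are all assumptions made for illustration.

```python
import re
import sqlite3

# Same patterns as in the contract; fullmatch is used so a trailing newline cannot slip through.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
COUNTRY_RE = re.compile(r"^[A-Z]{2}$")

# Hypothetical sample rows: (customer_id, email, country_code).
SAMPLE_ROWS = [
    ("c1", "ana@example.com", "US"),
    ("c2", "ana@example.com", "FR"),  # duplicate email
    ("c3", None, "DE"),               # missing email
    ("c4", "", "de"),                 # empty email, lowercase country code
    ("c5", "not-an-email", "GB"),     # malformed email
]


def invalid_percent(values, pattern):
    """Percentage of rows whose value is present but does not match the pattern.

    Assumption: missing values (None or '') are not counted as invalid,
    and the denominator is the total number of rows.
    """
    if not values:
        return 0.0
    invalid = sum(
        1 for v in values
        if v is not None and v != "" and not pattern.fullmatch(v)
    )
    return 100.0 * invalid / len(values)


def duplicate_emails(rows):
    """Run the contract's duplicate-email query against an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (customer_id TEXT, email TEXT, country_code TEXT)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    query = """
        SELECT email
        FROM customers
        WHERE email IS NOT NULL AND email <> ''
        GROUP BY email
        HAVING COUNT(*) > 1
    """
    result = [row[0] for row in conn.execute(query)]
    conn.close()
    return result


email_invalid_pct = invalid_percent([r[1] for r in SAMPLE_ROWS], EMAIL_RE)
country_invalid_pct = invalid_percent([r[2] for r in SAMPLE_ROWS], COUNTRY_RE)
duplicates = duplicate_emails(SAMPLE_ROWS)

# The email check passes only when the invalid percentage stays below 0.5.
email_check_passes = email_invalid_pct < 0.5
# The uniqueness check passes only when the query returns no rows.
unique_check_passes = len(duplicates) == 0
```

On this sample both checks would fail: one malformed email out of five rows is far above the 0.5 percent tolerance, and the query returns a duplicated address.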

How to Enforce Data Contracts with Soda

Embed data quality through data contracts at any point in your pipeline.

# pip install soda-{data source} for other data sources

pip install soda-postgres

# verify the contract locally against a data source

soda contract verify -c contract.yml -ds ds_config.yml

# publish and schedule the contract with Soda Cloud

soda contract publish -c contract.yml -sc sc_config.yml

Check out the CLI documentation to learn more.
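As one way to embed the verify command in a pipeline step, the sketch below wraps it in a small Python gate. The function names (`build_verify_command`, `contract_passes`) are hypothetical, and the sketch assumes the CLI exits with a non-zero status when verification fails, which should be confirmed against the CLI documentation before relying on it.

```python
import subprocess


def build_verify_command(contract_path, data_source_path):
    """Build the argument list for the verify command shown above."""
    return [
        "soda", "contract", "verify",
        "-c", contract_path,
        "-ds", data_source_path,
    ]


def contract_passes(contract_path, data_source_path, run=subprocess.run):
    """Return True when the verify command exits with status 0.

    Assumption: a failed verification produces a non-zero exit status.
    The `run` parameter exists only so the call can be swapped out in tests.
    """
    completed = run(
        build_verify_command(contract_path, data_source_path),
        check=False,
    )
    return completed.returncode == 0
```

A pipeline task could call `contract_passes("contract.yml", "ds_config.yml")` before loading downstream tables and stop the run when it returns False.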

How to Automatically Create Data Contracts. In One Click.

Automatically write and publish data contracts using Soda's AI-powered data contract copilot.

Research-Driven AI Data Quality

Our research has been published in leading journals and conferences such as NeurIPS, JAIR, and ACML: the same venues that advanced the foundations of GPT and modern AI.

4.4 out of 5

Start trusting your data. Today.

Find, understand, and fix any data quality issue in seconds.
From the table level down to the record level.

Trusted by
