Product Catalog

Ensure flawless consistency across all your data products

Ensure Product Catalog data is complete, consistent, and reliable before it's used for pricing, merchandising, inventory management, and sales reporting.

Data contract description

This data contract enforces schema stability and required product identifiers and attributes to maintain a reliable master catalog. It prevents missing or duplicate product IDs and SKUs, validates SKU formatting, blocks negative list prices, and restricts product status to approved lifecycle values. Together, these checks prevent broken joins with sales and inventory data, reduce pricing inconsistencies, and ensure merchandising, reporting, and downstream operational processes rely on clean, consistent product master data.

product_catalog_data_contract.yaml

dataset
checks:
  - schema:
      allow_extra_columns: false
      allow_other_column_order: false
  - row_count:
      threshold:
        must_be_greater_than: 0

  # Dataset integrity rules
  - failed_rows:
      name: "List price must not be negative"
      qualifier: list_price_non_negative
      expression: list_price < 0
  - failed_rows:
      name: "SKU should be unique"
      qualifier: sku_unique
      query: |
        SELECT sku
        FROM datasource.database.schema.products
        WHERE sku IS NOT NULL
        GROUP BY sku
        HAVING COUNT(*) > 1
      threshold:
        must_be: 0
      description: "Prevents duplicate SKUs that break catalog joins and reporting."
columns:
  - name: product_id
    data_type: string
    checks:
      - missing:
      - duplicate:
      - invalid:
          name: "product_id length guardrail"
          valid_min_length: 1
          valid_max_length: 64
  - name: sku
    data_type: string
    checks:
      - missing:
      - invalid:
          name: "sku must be uppercase letters/numbers with separators"
          valid_format:
            name: SKU format
            regex: "^[A-Z0-9][A-Z0-9_-]{0,63}$"
  - name: product_name
    data_type: string
    checks:
      - missing:
      - invalid:
          name: "product_name length guardrail"
          valid_min_length: 2
          valid_max_length: 255
  - name: category
    data_type: string
    checks:
      - missing:
      - invalid:
          name: "category length guardrail"
          valid_min_length: 2
          valid_max_length: 100
  - name: list_price
    data_type: float
    checks:
      - missing:
      - invalid:
          name: "list_price must be zero or positive"
          valid_min: 0
  - name: status
    data_type: string
    checks:
      - missing:
      - invalid:
          name: "Allowed product statuses"
          valid_values
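
To make the SKU rules in the contract above concrete, here is a minimal Python sketch that applies the same regular expression and the same "group by SKU, keep groups larger than one" logic to an in-memory list. It is only an illustration, not how Soda evaluates the contract: the function names and the sample data are hypothetical, and Python's regex engine may differ in small ways from your data source's.

```python
import re
from collections import Counter

# Same pattern as the contract's valid_format regex for the sku column:
# one leading uppercase letter or digit, then up to 63 more characters
# drawn from uppercase letters, digits, underscore, or hyphen.
SKU_PATTERN = re.compile(r"^[A-Z0-9][A-Z0-9_-]{0,63}$")


def invalid_skus(skus):
    """Return the non-null SKUs that do not match the contract pattern."""
    return [s for s in skus if s is not None and SKU_PATTERN.fullmatch(s) is None]


def duplicate_skus(skus):
    """Mirror the failed_rows query: non-null SKUs that occur more than once."""
    counts = Counter(s for s in skus if s is not None)
    return sorted(s for s, n in counts.items() if n > 1)


# Hypothetical sample rows, for illustration only.
sample = ["AB-100", "ab-100", "-LEAD", "AB-100", "X_9", None]

print(invalid_skus(sample))    # lowercase and leading-separator values fail
print(duplicate_skus(sample))  # values that appear more than once
```

Null SKUs are skipped in both helpers because the contract reports them separately through the `missing` check.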

How to Enforce Data Contracts with Soda

Embed data quality through data contracts at any point in your pipeline.

# pip install soda-{data source} for other data sources

pip install soda-postgres

# verify the contract locally against a data source

soda contract verify -c contract.yml -ds ds_config.yml

# publish and schedule the contract with Soda Cloud

soda contract publish -c contract.yml -sc sc_config.yml
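
If you want to call the verify step from a pipeline task rather than a shell, one possible approach is a thin Python wrapper around the command shown above. This is a sketch under assumptions: the helper names are hypothetical, the `-c` and `-ds` flags are taken from the command above, and treating a non-zero exit code as a failed verification is an assumption that this page does not confirm.

```python
import shutil
import subprocess


def verify_command(contract_path, data_source_path):
    """Build the argv for the `soda contract verify` command shown above."""
    return ["soda", "contract", "verify", "-c", contract_path, "-ds", data_source_path]


def run_verify(contract_path, data_source_path):
    """Run the verification and return the CLI's exit code.

    Assumption: a non-zero exit code means the verification did not pass.
    """
    if shutil.which("soda") is None:
        raise RuntimeError("soda CLI not found on PATH; install it first")
    completed = subprocess.run(
        verify_command(contract_path, data_source_path), check=False
    )
    return completed.returncode


# Show the command that would be executed, without running it.
print(" ".join(verify_command("contract.yml", "ds_config.yml")))
```

Passing the arguments as a list avoids shell quoting issues when file paths contain spaces.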

Check out the CLI documentation to learn more.

How to Automatically Create Data Contracts in One Click

Automatically write and publish data contracts using Soda's AI-powered data contract copilot.

Research-Based AI Data Quality

Our research has been published in leading journals and conferences such as NeurIPS, JAIR, and ACML, the same venues that advanced the foundations of GPT and modern AI.

4.4 out of 5

Start trusting your data. Today.

Find, understand, and fix any data quality issue in seconds.
From table level to record level.

Trusted by