Soda, the provider of data reliability tools and a cloud observability platform, has announced the general availability of Soda Core, an open-source framework that lets Data Engineers embed data reliability checks and quality management into data pipelines. Soda Core is powered by SodaCL (Soda Checks Language), also released today as the first Domain-Specific Language (DSL) for data reliability. It brings as-code practices to data engineering to create broad coverage, eliminate data downtime, and simplify the cumbersome tasks of detecting and resolving issues across the entire data product lifecycle.
Almost every company is building innovative new products using data, which means that data has to be reliable enough to meet a wide range of evolving needs. In most data teams, Data Engineers are responsible for building systems and pipelines to ingest, model, and deliver reliable data products to the business. Once in production, these products need constant attention to address changes to data schemas and structures, broken transformation logic, and concept drift, all of which impact reliability, quality, and trust in the data. The challenge for Data Engineers is that they must fix these data issues manually and at scale, without the tools, processes, and expertise that would enable them to create more reliable, high-quality datasets.
Available to download from today, Soda Core introduces a free, open-source framework that empowers Data Engineers to build and maintain data checks as-code at scale, across every data workload, from ingestion to transformation to consumption. Soda Core offers Data Engineers a library of tools for data reliability, with core components including the use of dataset metadata to understand the shape and health of the data, and built-in metrics and broad check coverage that can be used to validate a huge number of data quality parameters.
With Soda Core, data can be tested and validated against fixed thresholds as well as dynamic ones, such as change-over-time and anomaly detection, as part of a comprehensive end-to-end workflow that helps detect and resolve issues and automatically alerts the right people at the right time. Alerts and notifications can be created using a preferred ticketing or on-call system. By extending Soda Core with a Soda Cloud account, teams can route notifications to the right people and enable less technical users to get involved by adjusting thresholds or adding new checks altogether.
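To make the difference between fixed and dynamic thresholds concrete, here is a minimal Python sketch. It is not Soda Core's implementation or API. The function names, the sample row counts, and the z-score rule used for anomaly detection are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def fixed_threshold_ok(value, minimum):
    """Fixed threshold: pass when the metric is at least a constant minimum."""
    return value >= minimum


def change_over_time_ok(current, previous, max_relative_change):
    """Change-over-time: pass when the metric moved by no more than the
    allowed fraction compared with the previous measurement."""
    if previous == 0:
        return current == 0
    return abs(current - previous) / abs(previous) <= max_relative_change


def is_anomaly(current, history, z_limit=3.0):
    """Simple anomaly rule (an assumption, not Soda's algorithm): flag values
    more than z_limit standard deviations away from the historical mean."""
    if len(history) < 2:
        return False  # not enough history to estimate a spread
    spread = stdev(history)
    if spread == 0:
        return current != history[0]
    return abs(current - mean(history)) / spread > z_limit


# Hypothetical daily row counts for one dataset.
daily_row_counts = [1000, 1020, 990, 1010, 1005]
today = 400

print(fixed_threshold_ok(today, minimum=1))
print(change_over_time_ok(today, daily_row_counts[-1], max_relative_change=0.2))
print(is_anomaly(today, daily_row_counts))
```

In this example the fixed threshold passes because the table is not empty, while both dynamic rules flag the sudden drop. That is the kind of issue a constant limit alone tends to miss.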
Also released today, SodaCL replaces the time-consuming, resource-intensive need to code in SQL with one language that is writable and readable by almost anyone, meaning that everyone on a data team can define the thresholds of what good data needs to look like. SodaCL provides a language foundation that will evolve over time to address business-specific issues across multiple business domains, including areas such as Asset Management, Supply Chain, and Customer Data. The first iteration of SodaCL delivers test and monitor checks-as-code from ingestion through to transformation, with over 30 built-in metrics and check types available to validate a great number of data quality parameters and generate value immediately.
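The idea of checks that almost anyone can read can be sketched in a few lines of Python. The check format below ("metric operator number") and the metric names are hypothetical and are not SodaCL syntax. The evaluator is only an assumption about how a declarative check might be compared against metrics that have already been computed.

```python
import operator
import re

# Comparison symbols supported by this illustrative check format.
OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
    "!=": operator.ne,
}

# A check looks like "<metric> <operator> <number>", e.g. "row_count > 0".
CHECK_PATTERN = re.compile(r"^\s*(\w+)\s*(>=|<=|!=|>|<|=)\s*(-?\d+(?:\.\d+)?)\s*$")


def evaluate_check(check, metrics):
    """Return True when the named metric satisfies the check."""
    match = CHECK_PATTERN.match(check)
    if match is None:
        raise ValueError(f"Cannot parse check: {check!r}")
    name, symbol, threshold = match.groups()
    if name not in metrics:
        raise KeyError(f"Unknown metric: {name}")
    return OPERATORS[symbol](metrics[name], float(threshold))


def run_checks(checks, metrics):
    """Evaluate every check and map each check string to pass or fail."""
    return {check: evaluate_check(check, metrics) for check in checks}


# Hypothetical metrics computed for one dataset.
metrics = {"row_count": 1200, "missing_email": 3}
checks = ["row_count > 0", "missing_email = 0"]

for check, passed in run_checks(checks, metrics).items():
    print(f"{'PASS' if passed else 'FAIL'}  {check}")
```

Keeping the checks as plain strings, separate from the code that evaluates them, is what lets someone who knows the data but not SQL adjust a threshold.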
“This first public release of Soda Core and SodaCL is one of the most important milestones in our journey so far, giving Data Engineers the framework and language to get started and scale with reliability engineering and data quality management,” explains Tom Baeyens, CTO and Co-Founder, Soda. “We realized early on that when it comes to data quality, the needs of engineers are quite different when compared with the needs of the data team as a whole. A lot of people in a data team know what good data looks like but only a few can code the checks. With our releases today, we are providing the tools to remove the bottlenecks that exist around coding data reliability, enabling Data Engineers to build data quality checks-as-code directly into their pipelines and fundamentally change how teams set up and maintain reliable, high-quality data products.”
Starting with the release of Soda SQL in early 2021, Soda’s open-source library of tools has been built by data engineers and product owners, and embraced by a rapidly growing community which includes Disney, HelloFresh, Servier, and Udemy as major contributors. Soda has been incorporating feedback into an early working version of SodaCL, extensively testing the new DSL as part of a preview program with over 40 data engineers.
Soda is the data reliability and quality platform that creates the observability data teams need to find, analyze, and resolve data issues. Our open-source tools and cloud platform bring everyone closer to the data to confidently make data-informed decisions. Soda is one of the 2021 Gartner® Cool Vendors™ in Data Management, a recognition and validation of our approach to solving the number one data management challenge faced by modern organizations: ensuring high-quality, trusted data is available. For more information, visit soda.io.