Soda, the provider of data reliability tools and a cloud observability platform, today announced a partnership with GoDataDriven, Xebia's data & AI consultancy, to solve critical data reliability and quality issues faced by organizations. The partnership includes the co-development of a new Open Source Software (OSS) Spark library that gives organizations a big-data-friendly way to maintain a high level of trust in data within the Spark ecosystem, as well as an agreement to build optimal data quality workflows for joint customers.
Apache Spark is one of the most popular open source projects on the planet, with more than 1,000 contributors from over 250 organizations and more than 35,000 stars on GitHub. Spark’s popularity is driven by its ability to process large datasets at speed, APIs that enable flexibility and simple migration to distributed frameworks, and versatility in connecting to virtually any data source. But with so many organizations relying on the data that Spark processes to build and maintain data products, data teams have so far lacked the transparency needed to ensure data quality. Working alongside GoDataDriven, whose clients include Ahold Delhaize, Bol.com, Unilever, Mollie, and ING, Soda has released Soda Spark, its latest OSS release, to provide a common solution to a common problem for data engineers.
Soda Spark is part of a growing suite of OSS data reliability tools for engineers working in data-intensive environments where reliable, high-quality data is of paramount importance. Using the Soda Spark library, data engineers can log errors from failed tests using their preferred logging system, or through Soda Cloud, and avoid writing corrupt data to their data lakes. And because Spark DataFrames are data source agnostic, Soda Spark scans can run against a variety of data sources, including Amazon Athena, Amazon Redshift, Google BigQuery, and Snowflake.
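The pattern described above, running tests on a batch, logging any failures, and writing only when every test passes, can be sketched in a few lines of plain Python. This is an illustration under stated assumptions and does not use the Soda Spark API: the function names, the three example tests, and the in-memory list standing in for a data lake are all hypothetical, and a list of dictionaries stands in for a Spark DataFrame.

```python
import logging

# Any logging system could be used here; the standard library logger is an assumption.
logger = logging.getLogger("data_quality")


def run_tests(rows):
    """Return the names of the (hypothetical) tests that fail for this batch."""
    tests = {
        "row_count > 0": lambda rs: len(rs) > 0,
        "no missing id": lambda rs: all(r.get("id") is not None for r in rs),
        "amount >= 0": lambda rs: all(r.get("amount", 0) >= 0 for r in rs),
    }
    return [name for name, check in tests.items() if not check(rows)]


def write_if_clean(rows, sink):
    """Append rows to sink only if every test passes; log each failure otherwise."""
    failed = run_tests(rows)
    for name in failed:
        logger.error("data quality test failed: %s", name)
    if failed:
        return False  # nothing is written, so the sink stays clean
    sink.extend(rows)
    return True


lake = []  # hypothetical stand-in for a data lake table
write_if_clean([{"id": 1, "amount": 5.0}], lake)    # passes and is written
write_if_clean([{"id": None, "amount": -1}], lake)  # fails, is logged, and is skipped
```

In a real Spark job the same gate would sit between the transformation step and the write, with the tests evaluated against the DataFrame rather than a Python list.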
“Transparency creates a trust in data that becomes the catalyst for an organization to be truly data enabled and make confident, data-driven decisions,” explains Niels Zeilemaker, CTO, GoDataDriven. “In Soda, we recognised a shared belief in improving end-to-end data issue workflows so that data teams have the power to prioritize and resolve issues based on a holistic view of what is happening across an entire organization. Soda’s low technical barriers and vibrant developer community provide the ideal platform to meet the needs of our customers, starting with the co-development of a Spark integration which further extends the data quality workflow.”
With so much data produced every day, Spark’s ability to unify disparate data processing capabilities, so that developers can use a single framework for all their processing needs, has made it a critical part of the modern data stack. From the creation of on-demand video streaming tailored precisely to viewers’ preferences, to banks crunching huge volumes of machine learning data to support fraud prevention, Spark is fundamentally changing the business world.
“Achieving reliable, high-quality data across the data product lifecycle requires everyone in a data team - from data and analytics engineers through to analysts - to be aware of any data issues before they have a downstream impact on their data products,” said Maarten Masschelein, CEO & Co-Founder, Soda. “Through our partnership with GoDataDriven, a consultancy that has built a stellar reputation working in lock-step with some of the most high-profile brands in Continental Europe, Soda is helping data teams to improve their operations through workflows and best data engineering practices to enable the trust and confidence in data that organizations need to become truly data enabled.”
Soda Spark is provided as open source software under the Apache License and is available for free on GitHub. It runs either in the cloud or on data engineers’ local systems. For more information and to learn how to get started, please visit the Soda Spark docs or read the blog.
About Soda
Soda is the data reliability company that provides Open Source (OSS) and SaaS tools that enable data teams to discover, prioritize, and resolve data issues. Soda’s mission is to bring everyone closer to the data, resulting in data products and analytics that everyone can trust. The Soda global data community already counts Disney, HelloFresh, and Udemy among the major contributors that have deployed Soda’s data reliability tools. Soda is one of the 2021 Gartner® Cool Vendors™ in Data Management [1], recognition and validation of our approach to solving the number one data management challenge faced by modern organizations: ensuring that high-quality, trusted data is available to enable confident decision making, serve and delight customers, and improve processes. For more information, visit soda.io.
About GoDataDriven
GoDataDriven helps the world's Top 250 companies and category leaders embrace data innovation, adopt the latest data and AI technologies, and establish enterprise-wide data & AI training programs. GoDataDriven is part of Xebia (xebia.com), a global IT pioneer providing high-quality consulting services that cover all aspects of digital transformation, from software development to cloud, data, AI, software consulting, DevOps, and Agile. Its clients include, among others, Disney, Ahold Delhaize, Tesco, Philips, and ING bank. Xebia employs 3,100 people who work from strategically located offices in Europe, APAC, UAE, the UK, and the US, and generated revenue of more than 200 million euros over the last four quarters. Please visit www.godatadriven.com for more information.