{"id":57974,"date":"2024-09-23T09:00:26","date_gmt":"2024-09-23T07:00:26","guid":{"rendered":"https:\/\/www.inovex.de\/?p=57974"},"modified":"2026-02-18T10:25:08","modified_gmt":"2026-02-18T09:25:08","slug":"data-quality-made-easy-with-soda","status":"publish","type":"post","link":"https:\/\/www.inovex.de\/de\/blog\/data-quality-made-easy-with-soda\/","title":{"rendered":"Data Quality Made Easy with Soda"},"content":{"rendered":"<p>The term data quality is generally used to describe the degree to which data corresponds to the real things or facts it represents. As it is often difficult or impossible in practice to assess the quality of data based on this definition, it is usually estimated by evaluating the deviation from predefined assumptions. The assumptions originate from the specific domain and must first be identified and recorded, e.g. \u201cThe measured temperature is always between -10\u00b0C and +50\u00b0C due to the technical limitations of the sensor\u201d. Assumptions can refer to the semantic or syntactic correctness, as well as the timeliness or completeness of a data set. In addition to the specification of assumptions, ensuring data quality also involves continuous updating and regular validation, as well as the implementation of a process for handling anomalies. A detailed introduction to data quality can be found <a href=\"https:\/\/www.inovex.de\/de\/blog\/ensuring-data-quality-a-data-engineers-perspective\/\">here<\/a>. 
<!--more--><\/p>\n<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_83 counter-hierarchy ez-toc-counter ez-toc-custom ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\"><p class=\"ez-toc-title\" style=\"cursor:inherit\"><\/p>\n<\/div><nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/www.inovex.de\/de\/blog\/data-quality-made-easy-with-soda\/#Soda-as-a-Tool-for-Ensuring-Data-Quality\" >Soda as a Tool for Ensuring Data Quality<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/www.inovex.de\/de\/blog\/data-quality-made-easy-with-soda\/#Soda-Core-Soda-Cloud-and-Soda-Library\" >Soda Core, Soda Cloud, and Soda Library<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/www.inovex.de\/de\/blog\/data-quality-made-easy-with-soda\/#Soda-hosted-and-self-hosted-Agents\" >Soda-hosted and self-hosted Agents<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/www.inovex.de\/de\/blog\/data-quality-made-easy-with-soda\/#Demo-Ensuring-Data-Quality-with-Soda\" >Demo: Ensuring Data Quality with Soda<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/www.inovex.de\/de\/blog\/data-quality-made-easy-with-soda\/#Installation-and-Configuration-of-Soda-Core\" >Installation and Configuration of Soda Core<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/www.inovex.de\/de\/blog\/data-quality-made-easy-with-soda\/#Creation-and-Execution-of-Soda-Checks\" >Creation and Execution of Soda Checks<\/a><ul class='ez-toc-list-level-4' ><li class='ez-toc-heading-level-4'><a class=\"ez-toc-link ez-toc-heading-7\" 
href=\"https:\/\/www.inovex.de\/de\/blog\/data-quality-made-easy-with-soda\/#Value-Checks\" >Value Checks<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-4'><a class=\"ez-toc-link ez-toc-heading-8\" href=\"https:\/\/www.inovex.de\/de\/blog\/data-quality-made-easy-with-soda\/#Custom-Checks\" >Custom Checks<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-4'><a class=\"ez-toc-link ez-toc-heading-9\" href=\"https:\/\/www.inovex.de\/de\/blog\/data-quality-made-easy-with-soda\/#Schema-Checks\" >Schema Checks<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-10\" href=\"https:\/\/www.inovex.de\/de\/blog\/data-quality-made-easy-with-soda\/#Programmatic-Scans-with-Python\" >Programmatic Scans with Python<\/a><ul class='ez-toc-list-level-4' ><li class='ez-toc-heading-level-4'><a class=\"ez-toc-link ez-toc-heading-11\" href=\"https:\/\/www.inovex.de\/de\/blog\/data-quality-made-easy-with-soda\/#Configure-and-Execute-Scan-Object\" >Configure and Execute Scan Object<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-4'><a class=\"ez-toc-link ez-toc-heading-12\" href=\"https:\/\/www.inovex.de\/de\/blog\/data-quality-made-easy-with-soda\/#Save-and-Process-Scan-Results\" >Save and Process Scan Results<\/a><\/li><\/ul><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-13\" href=\"https:\/\/www.inovex.de\/de\/blog\/data-quality-made-easy-with-soda\/#Conclusion\" >Conclusion<\/a><\/li><\/ul><\/nav><\/div>\n<h2><span class=\"ez-toc-section\" id=\"Soda-as-a-Tool-for-Ensuring-Data-Quality\"><\/span>Soda as a Tool for Ensuring Data Quality<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>To use Soda to ensure data quality, you can choose between <a href=\"https:\/\/github.com\/sodadata\/soda-core\">Soda Core<\/a> and <a href=\"https:\/\/www.soda.io\/platform\">Soda Cloud<\/a>. 
While Soda Core is free to use, Soda Cloud is a subscription-based SaaS offering that you interact with via the <a href=\"https:\/\/docs.soda.io\/soda-library\/install.html\">Soda Library<\/a>. The range of functions and support for external systems differ significantly between Soda Core and Soda Cloud. With the <a href=\"https:\/\/docs.soda.io\/soda-cl\/metrics-and-checks.html\">Soda Checks Language (SodaCL)<\/a>, Soda offers a domain-specific language based on YAML that can be used to define assumptions about data, among other things. More than 25 metrics* are already provided for this purpose, which can be used to define assumptions about data types, missing values, data set size, and much more. This low-code approach allows Soda to be used without extensive programming knowledge and decouples the assumptions about a specific data set from the generic validation process.<\/p>\n<p>In cases where these metrics are not sufficient, user-defined checks can be implemented based on SQL. Soda Checks are defined using the YAML format and provide the basis for validations (Soda Scans). With alert levels and fail conditions*, SodaCL offers additional options for implementing more complex strategies to handle anomalies. Last but not least, Soda offers a wide range of <a href=\"https:\/\/www.soda.io\/integrations\">integration options*<\/a>, e.g. to common databases and communication systems. In addition to common cloud data warehouses such as Snowflake, BigQuery, and Redshift, relational database systems such as PostgreSQL, MySQL, and Microsoft SQL Server are also supported. 
In addition, Soda offers integration options* for dbt and Spark as well as for Slack, Jira, and GitHub.<\/p>\n<p>* Limited functionality in the open-source Soda Core version compared to Soda Cloud<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Soda-Core-Soda-Cloud-and-Soda-Library\"><\/span>Soda Core, Soda Cloud, and Soda Library<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>While the open-source version of Soda Core is limited to ad-hoc analyses, Soda Cloud and the associated Soda Library allow analysis results to be stored in the cloud and made permanently available for retrieval. To comply with data protection regulations, you can choose between two storage regions, the EU and the USA.<\/p>\n<div>\n<dl id=\"attachment_57985\">\n<dt><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/Soda_1_Dashboard.png\" alt=\"Home page of the Soda Cloud Dashboard\" width=\"1410\" height=\"912\" \/><\/dt>\n<dd>Soda Cloud Dashboard<\/dd>\n<\/dl>\n<\/div>\n<p>With Soda Cloud comes the Soda Cloud Dashboard, which allows for easy tracking of your historical data quality checks. In addition to monitoring in the Soda Cloud Dashboard, Soda Cloud also enables the implementation of more complex data quality checks based on a history of previous runs. Some examples of complex checks that are only available in Soda Cloud include:<\/p>\n<ul>\n<li><a href=\"https:\/\/docs.soda.io\/soda-cl\/recon.html\">Reconciliation checks<\/a>: comparison of multiple data sets for equality or similarity, e.g. 
after a data migration.<\/li>\n<li><a href=\"https:\/\/docs.soda.io\/soda-cl\/distribution.html\">Evolution of data distributions<\/a>: comparison with historical data distributions using hypothesis tests<\/li>\n<li><a href=\"https:\/\/docs.soda.io\/soda-cl\/anomaly-detection.html\">Anomaly detection<\/a>: detection of unusual data using historical metrics<\/li>\n<\/ul>\n<div>\n<dl id=\"attachment_57987\">\n<dt><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/Soda_2_Checks.png\" alt=\"Overview of checks in the Soda Cloud Dashboard\" width=\"1411\" height=\"844\" \/><\/dt>\n<dd>Soda Cloud Checks<\/dd>\n<\/dl>\n<\/div>\n<p>To identify the cause of failed checks, Soda Library can send a sample of the affected data records to Soda Cloud for viewing and analysis. Note, however, that this may transfer potentially sensitive data to Soda Cloud.<\/p>\n<div>\n<dl id=\"attachment_57989\">\n<dt><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/Soda_3_Datasets.png\" alt=\"Overview of datasets in the Soda Cloud Dashboard\" width=\"1413\" height=\"911\" \/><\/dt>\n<dd>Soda Cloud Datasets<\/dd>\n<\/dl>\n<\/div>\n<p>Soda Cloud offers insights into your data quality checks at different levels (e.g. overall data quality state, datasets, and individual checks), which provides a good overview.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Soda-hosted-and-self-hosted-Agents\"><\/span>Soda-hosted and self-hosted Agents<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>With Soda Agents, it is possible to run scheduled and regular data quality checks. A Soda Agent includes a complete installation of the Soda Library, is ready to run, and can be managed in Soda Cloud. The user can choose between two variants:<\/p>\n<p>The Soda-hosted agent, which can be provisioned with just a few clicks, is operated in Soda Cloud and can only be used on publicly accessible databases (e.g. 
MySQL, PostgreSQL, Snowflake, and BigQuery are supported). Credentials for the data sources must be stored in Soda Cloud. Using the Soda-hosted agent variant is very easy, as Soda takes care of the infrastructure and operations, i.e. setup, maintenance, and scaling of the agent. The costs for this variant are calculated by usage (i.e. pay-per-use).<\/p>\n<p>Self-hosted agents, on the other hand, can be operated within your own data infrastructure and support all data sources. In this case, credentials and data sources can also remain in your infrastructure and do not need to be stored in Soda Cloud. The advantage of the self-hosted agent variant is that you have more control over the operations and database credentials and that you can adapt it flexibly to your own needs. The disadvantage is the effort and costs involved in operating it yourself.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Demo-Ensuring-Data-Quality-with-Soda\"><\/span>Demo: Ensuring Data Quality with Soda<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>In the following, the Soda Checks Language (SodaCL) is used to define assumptions in the form of Soda checks, which are then validated with data scans. In preparation for this demo, a local PostgreSQL instance was populated with data from the <a href=\"https:\/\/github.com\/imkumaraju\/dvdrenat-sample-databse\">DVD Rental PostgreSQL database<\/a>.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Installation-and-Configuration-of-Soda-Core\"><\/span>Installation and Configuration of Soda Core<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>First, the necessary Python packages for the respective data source need to be installed. 
To integrate PostgreSQL, you therefore need to install the Python package <a href=\"https:\/\/pypi.org\/project\/soda-core-postgres\/\">soda-core-postgres<\/a> as follows:<\/p>\n<pre>pip install soda-core-postgres<\/pre>\n<p>Furthermore, Soda requires a <a href=\"https:\/\/docs.soda.io\/soda\/connect-postgres.html\">configuration<\/a> to ensure database connectivity, depending on the data source. This is created in YAML format and looks as follows in our case:<\/p>\n<pre title=\"config.yml\">data_source my_local_db:\r\n  type: postgres\r\n  connection:\r\n    host: localhost\r\n    port: '5432'\r\n    username: user\r\n    password: myPassword\r\n  database: dvdrental\r\n  schema: public\r\n<\/pre>\n<p>Optionally, to establish a connection with Soda Cloud, an API key and a secret must be provided in the configuration. Both can be generated in the Soda Cloud account under Your Avatar &gt; Profile &gt; API Keys.<\/p>\n<pre title=\"config.yml\">soda_cloud:\r\n  host: cloud.soda.io\r\n  api_key_id: &lt;your_api_key&gt;\r\n  api_key_secret: &lt;your_api_key_secret&gt;\r\n<\/pre>\n<p>To check the configuration and the connection to the data source, Soda CLI offers the following command:<\/p>\n<pre>soda test-connection -d my_local_db -c config.yml<\/pre>\n<h3><span class=\"ez-toc-section\" id=\"Creation-and-Execution-of-Soda-Checks\"><\/span>Creation and Execution of Soda Checks<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Like the configuration files defined above for the database connection, Soda checks are also defined in YAML files.<\/p>\n<h4><span class=\"ez-toc-section\" id=\"Value-Checks\"><\/span>Value Checks<span class=\"ez-toc-section-end\"><\/span><\/h4>\n<p>In the data model of the <a href=\"https:\/\/www.postgresqltutorial.com\/postgresql-getting-started\/postgresql-sample-database\/\">DVD Rental data set<\/a>, the actor table contains information about movie actors. 
In addition to the ID of each actor, the table also includes the first and last names of the actors, as well as a metadata column last_update with the time of the last update of the corresponding row.<\/p>\n<p>We start by defining checks to verify the presence of the first name and last name in each record:<\/p>\n<pre title=\"checks.yml\">checks for actor:\r\n  - missing_count(first_name) = 0\r\n  - missing_count(last_name) = 0\r\n<\/pre>\n<p>We can soften the requirement that not a single first name may be missing by making use of the <a href=\"https:\/\/docs.soda.io\/soda-cl\/optional-config.html#add-alert-configurations\">alert levels<\/a> warn and fail:<\/p>\n<pre title=\"checks.yml\">checks for actor:\r\n  # check for missing values\r\n  - missing_count(first_name):\r\n      warn: when between 1 and 10\r\n      fail: when &gt; 10\r\n  - missing_count(last_name) = 0\r\n<\/pre>\n<p>If the number of missing first names in the actor table is greater than 0, but less than or equal to 10, this now only appears as a warning in the validation result. The validation itself returns a positive result in such a case. Only if more than 10 first names are missing in the actor table does the validation fail.<\/p>\n<p>In the table customer, we can find the column email, which contains the email addresses of the customers. In the following, we define checks to examine whether a unique email address in a valid format has been provided for each customer:<\/p>\n<pre title=\"checks.yml\">checks for customer:\r\n  - duplicate_count(email) = 0\r\n  - invalid_count(email) = 0:\r\n      valid format: email\r\n<\/pre>\n<p>By default, the check is carried out for all records in the customer table. If only a subset of the data is to be checked, this can be implemented by using a filter configuration either for a single check (in-check filter) or an entire table (dataset filter). In the following, a check is defined analogously to the example above to check the email column. 
However, only active customers are taken into account here, which are identified by the value 1 in the active column:<\/p>\n<pre title=\"checks.yml\">checks for customer:\r\n  - duplicate_count(email) = 0:\r\n      filter: active = 1\r\n  - invalid_count(email) = 0:\r\n      valid format: email\r\n      filter: active = 1<\/pre>\n<p>Here you can see that in-check filters are defined for each check and cannot be reused. Dataset filters, on the other hand, are defined per table and can be reused in several checks. The same checks can be implemented with a dataset filter as follows:<\/p>\n<pre title=\"checks.yml\">filter customer [active_only]:\r\n  where: active = 1\r\n\r\nchecks for customer [active_only]:\r\n  - invalid_count(email) = 0:\r\n      valid format: email\r\n  - duplicate_count(email) = 0<\/pre>\n<p>An advantage of this is that you only have to make changes to the filter condition in one central location and these changes will affect all associated checks.<\/p>\n<h4><span class=\"ez-toc-section\" id=\"Custom-Checks\"><\/span>Custom Checks<span class=\"ez-toc-section-end\"><\/span><\/h4>\n<p>In addition to the built-in Soda checks, you can define your own custom checks. Custom checks use SQL queries to evaluate a user-defined metric. In the following, a custom check is created for the rental table to count the records in which the rental date is after the return date:<\/p>\n<pre title=\"checks.yml\">checks for rental:\r\n  - invalid_dates = 0:\r\n      invalid_dates query: |\r\n        SELECT COUNT(*) AS invalid_dates\r\n        FROM rental\r\n        WHERE rental_date &gt; return_date;\r\n      # Optional: if connection to Soda Cloud defined\r\n      failed rows query: |\r\n        SELECT *\r\n        FROM rental\r\n        WHERE rental_date &gt; return_date;\r\n<\/pre>\n<p>One disadvantage of custom checks is that they do not support dataset filters at the moment. 
In the example above, a failed rows query is defined in addition to the metric query. It sends the invalid records to Soda Cloud, where they can be inspected.<\/p>\n<h4><span class=\"ez-toc-section\" id=\"Schema-Checks\"><\/span>Schema Checks<span class=\"ez-toc-section-end\"><\/span><\/h4>\n<p>In addition to checking table values, table schemas can also be checked in Soda. <a href=\"https:\/\/docs.soda.io\/soda-cl\/schema.html\">Schema checks<\/a> are particularly useful at the beginning of ETL pipelines to check the presence and data types of critical columns. A schema check for the actor table could look like this:<\/p>\n<pre title=\"checks.yml\">checks for actor:\r\n  - schema:\r\n      name: Confirm that required columns are present and with correct type\r\n      fail:\r\n        when required column missing:\r\n          - actor_id\r\n          - first_name\r\n          - last_name\r\n        when wrong column type:\r\n          actor_id: integer\r\n          first_name: character varying\r\n          last_name: character varying\r\n      warn:\r\n        when required column missing:\r\n          - last_update\r\n        when wrong column type:\r\n          last_update: timestamp without time zone\r\n<\/pre>\n<p>As in the earlier missing-value example, the two alert levels fail and warn are used to differentiate between the severity of the different violations: If actor_id, first_name or last_name are missing or the data type of these columns is incorrect, Soda recognizes this as an error and returns a negative result. 
If the last_update column does not exist or its data type is incorrect, Soda only issues a warning.<\/p>\n<p>To execute the previously defined checks against the data source configured above, the following Soda CLI command can be used:<\/p>\n<pre>soda scan -d my_local_db -c config.yml checks.yml<\/pre>\n<p>A successful scan returns the following result:<\/p>\n<pre>Scan summary:\r\n5\/5 checks PASSED: \r\n    actor in my_local_db\r\n      Confirm that required columns are present and with correct type [PASSED]\r\n      missing_count(last_name) = 0 [PASSED]\r\n      missing_count(first_name) = 0 [PASSED]\r\n    customer [active_only] in my_local_db\r\n      invalid_count(email) = 0 [PASSED]\r\n      duplicate_count(email) = 0 [PASSED]\r\nAll is good. No failures. No warnings. No errors.<\/pre>\n<p>If the customer table contained a duplicate email address, the check for duplicates would return a negative result. The scan result in the CLI would be as follows:<\/p>\n<pre>Scan summary:\r\n4\/5 checks PASSED: \r\n    actor in my_local_db\r\n      Confirm that required columns are present and with correct type [PASSED]\r\n      missing_count(last_name) = 0 [PASSED]\r\n      missing_count(first_name) = 0 [PASSED]\r\n    customer [active_only] in my_local_db\r\n      invalid_count(email) = 0 [PASSED]\r\n1\/5 checks FAILED: \r\n    customer [active_only] in my_local_db\r\n      duplicate_count(email) = 0 [FAILED]\r\n        check_value: 1\r\nOops! 1 failures. 0 warnings. 0 errors. 4 pass.\r\n<\/pre>\n<h3><span class=\"ez-toc-section\" id=\"Programmatic-Scans-with-Python\"><\/span>Programmatic Scans with Python<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Data validation can also be done programmatically by running Soda scans from Python. 
The functionality of the Python library is similar to that of the CLI.<\/p>\n<h4><span class=\"ez-toc-section\" id=\"Configure-and-Execute-Scan-Object\"><\/span>Configure and Execute Scan Object<span class=\"ez-toc-section-end\"><\/span><\/h4>\n<p>In the following example, a Soda scan is created and then executed. We configure the scan using existing YAML files in this case. It would also be possible to configure it directly in the code using YAML strings.<\/p>\n<pre>from soda.scan import Scan\r\n\r\nscan = Scan()\r\n\r\n# Set scan definition name, equivalent to CLI -s option\r\nscan.set_scan_definition_name(\"soda_demo\")\r\n\r\n# Add config files\r\nscan.add_configuration_yaml_file(file_path=\".\/config.yml\")\r\nscan.add_sodacl_yaml_file(\".\/checks.yml\")\r\n\r\n# Select data source from config\r\nscan.set_data_source_name(\"my_local_db\")\r\n\r\n# Set logs to verbose mode before running, equivalent to CLI -V option\r\nscan.set_verbose(True)\r\n\r\n# Execute the scan\r\nexit_code = scan.execute()\r\n\r\nprint(\"Exit code:\", exit_code)\r\n\r\n# Print results of scan\r\nprint(scan.get_logs_text())<\/pre>\n<h4><span class=\"ez-toc-section\" id=\"Save-and-Process-Scan-Results\"><\/span>Save and Process Scan Results<span class=\"ez-toc-section-end\"><\/span><\/h4>\n<p>We now use the scan object to analyze the result of the validation. Using the functions of the scan object, we can react to the presence of errors and warnings:<\/p>\n<pre># Inspect the scan results\r\nprint(scan.get_scan_results())\r\n\r\n# React in case of failures\r\nif scan.has_check_fails():\r\n    print(\"some checks failed!\")\r\n\r\n# React in case of failures or warnings\r\nif scan.has_checks_warn_or_fail():\r\n    for check in scan.get_checks_warn_or_fail():\r\n        print(check.get_dict())<\/pre>\n<p>If you access the individual results of the checks via get_checks_warn_or_fail, specific information such as totalRowCount (i.e. 
the number of invalid records) can be viewed for custom checks, even though it does not appear in the log. By default, individual records that have not passed the check can only be viewed in Soda Cloud. To view these records without Soda Cloud, you can implement a <a href=\"https:\/\/docs.soda.io\/soda-v3\/run-a-scan\/failed-row-samples#configure-a-python-custom-sampler\">CustomSampler<\/a> class that adapts the standard sampler.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Conclusion\"><\/span>Conclusion<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Soda offers extensive options for ensuring data quality, both in terms of the range of checks and the supported backends. Particularly worth highlighting is the intuitive usage concept, which stands out positively in comparison to its competitor, <a href=\"https:\/\/greatexpectations.io\/\">Great Expectations<\/a>. The framework already appears to be quite mature, although the open-source version Soda Core is limited to the core functionality, as the name suggests. To implement more complex mechanisms for ensuring data quality without major implementation effort, it is a good idea to use the SaaS version Soda Cloud. This also offers built-in dashboards to monitor the results of validations, including their history.<\/p>\n<p>At the moment, the pricing for Soda Cloud is not publicly available but is subject to individual offers. When using SaaS offerings, special caution is required with regard to data protection, especially when processing personal data. For example, the transfer of personal data to third-party providers must be contractually regulated and requires the consent of the data subjects.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The term data quality is generally used to describe the degree to which data corresponds to the real things or facts it represents. 
As it is often difficult or impossible in practice to assess the quality of data based on this definition, it is usually estimated by evaluating the deviation from predefined assumptions. The assumptions [&hellip;]<\/p>\n","protected":false},"author":354,"featured_media":58250,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"inline_featured_image":false,"ep_exclude_from_search":false,"footnotes":""},"tags":[385,1127,179,206],"service":[411,431],"coauthors":[{"id":354,"display_name":"Haydar Aky\u00fcrek","user_nicename":"hakyuerek"},{"id":100,"display_name":"Simon Bachstein","user_nicename":"sbachstein"},{"id":422,"display_name":"Hiroshi Hamano","user_nicename":"hhamano"},{"id":103,"display_name":"Marcel Spitzer","user_nicename":"mspitzer"}],"class_list":["post-57974","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","tag-data-engineering","tag-data-mesh","tag-data-products","tag-data-science","service-data-engineering","service-data-science"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.6 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Data Quality Made Easy with Soda - inovex GmbH<\/title>\n<meta name=\"description\" content=\"Soda offers extensive options for ensuring data quality, both in terms of the range of checks and the supported backends. 
We talk about the details.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.inovex.de\/de\/blog\/data-quality-made-easy-with-soda\/\" \/>\n<meta property=\"og:locale\" content=\"de_DE\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Data Quality Made Easy with Soda - inovex GmbH\" \/>\n<meta property=\"og:description\" content=\"Soda offers extensive options for ensuring data quality, both in terms of the range of checks and the supported backends. We talk about the details.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.inovex.de\/de\/blog\/data-quality-made-easy-with-soda\/\" \/>\n<meta property=\"og:site_name\" content=\"inovex GmbH\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/inovexde\" \/>\n<meta property=\"article:published_time\" content=\"2024-09-23T07:00:26+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2026-02-18T09:25:08+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.inovex.de\/wp-content\/uploads\/Data-Quality-made-easy-with-Soda.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1500\" \/>\n\t<meta property=\"og:image:height\" content=\"880\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Haydar Aky\u00fcrek, Simon Bachstein, Hiroshi Hamano, Marcel Spitzer\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:image\" content=\"https:\/\/www.inovex.de\/wp-content\/uploads\/Data-Quality-made-easy-with-Soda-1024x601.png\" \/>\n<meta name=\"twitter:creator\" content=\"@inovexgmbh\" \/>\n<meta name=\"twitter:site\" content=\"@inovexgmbh\" \/>\n<meta name=\"twitter:label1\" content=\"Verfasst von\" \/>\n\t<meta name=\"twitter:data1\" content=\"Haydar Aky\u00fcrek\" \/>\n\t<meta name=\"twitter:label2\" 
content=\"Gesch\u00e4tzte Lesezeit\" \/>\n\t<meta name=\"twitter:data2\" content=\"13\u00a0Minuten\" \/>\n\t<meta name=\"twitter:label3\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data3\" content=\"Haydar Aky\u00fcrek, Simon Bachstein, Hiroshi Hamano, Marcel Spitzer\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/data-quality-made-easy-with-soda\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/data-quality-made-easy-with-soda\\\/\"},\"author\":{\"name\":\"Haydar Aky\u00fcrek\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#\\\/schema\\\/person\\\/cbd8ef525753381fe614c04b9fac5071\"},\"headline\":\"Data Quality Made Easy with Soda\",\"datePublished\":\"2024-09-23T07:00:26+00:00\",\"dateModified\":\"2026-02-18T09:25:08+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/data-quality-made-easy-with-soda\\\/\"},\"wordCount\":2059,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/data-quality-made-easy-with-soda\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/Data-Quality-made-easy-with-Soda.png\",\"keywords\":[\"Data Engineering\",\"Data Mesh\",\"Data Products\",\"Data Science\"],\"inLanguage\":\"de\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/data-quality-made-easy-with-soda\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/data-quality-made-easy-with-soda\\\/\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/data-quality-made-easy-with-soda\\\/\",\"name\":\"Data Quality Made Easy with Soda - inovex 
GmbH\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/data-quality-made-easy-with-soda\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/data-quality-made-easy-with-soda\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/Data-Quality-made-easy-with-Soda.png\",\"datePublished\":\"2024-09-23T07:00:26+00:00\",\"dateModified\":\"2026-02-18T09:25:08+00:00\",\"description\":\"Soda offers extensive options for ensuring data quality, both in terms of the range of checks and the supported backends. We talk about the details.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/data-quality-made-easy-with-soda\\\/#breadcrumb\"},\"inLanguage\":\"de\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/data-quality-made-easy-with-soda\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/data-quality-made-easy-with-soda\\\/#primaryimage\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/Data-Quality-made-easy-with-Soda.png\",\"contentUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/Data-Quality-made-easy-with-Soda.png\",\"width\":1500,\"height\":880,\"caption\":\"Grafik: Mann sitzt auf gro\u00dfem Monitor mit Code. 
Ein weiterer Mann mit Lupe scannt den Code.\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/data-quality-made-easy-with-soda\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Data Quality Made Easy with Soda\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#website\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/\",\"name\":\"inovex GmbH\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"de\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#organization\",\"name\":\"inovex GmbH\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/2021\\\/03\\\/inovex-logo-16-9-1.png\",\"contentUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/2021\\\/03\\\/inovex-logo-16-9-1.png\",\"width\":1921,\"height\":1081,\"caption\":\"inovex 
GmbH\"},\"image\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/inovexde\",\"https:\\\/\\\/x.com\\\/inovexgmbh\",\"https:\\\/\\\/www.instagram.com\\\/inovexlife\\\/\",\"https:\\\/\\\/www.linkedin.com\\\/company\\\/inovex\",\"https:\\\/\\\/www.youtube.com\\\/channel\\\/UC7r66GT14hROB_RQsQBAQUQ\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#\\\/schema\\\/person\\\/cbd8ef525753381fe614c04b9fac5071\",\"name\":\"Haydar Aky\u00fcrek\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/986cdab16b190513ff04cbc00254ee52242f52ba52e39f8bfb785ea4302e4715?s=96&d=retro&r=g7bcab65279fbe837c8efb7679b758a08\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/986cdab16b190513ff04cbc00254ee52242f52ba52e39f8bfb785ea4302e4715?s=96&d=retro&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/986cdab16b190513ff04cbc00254ee52242f52ba52e39f8bfb785ea4302e4715?s=96&d=retro&r=g\",\"caption\":\"Haydar Aky\u00fcrek\"},\"url\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/author\\\/hakyuerek\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Data Quality Made Easy with Soda - inovex GmbH","description":"Soda offers extensive options for ensuring data quality, both in terms of the range of checks and the supported backends. We talk about the details.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.inovex.de\/de\/blog\/data-quality-made-easy-with-soda\/","og_locale":"de_DE","og_type":"article","og_title":"Data Quality Made Easy with Soda - inovex GmbH","og_description":"Soda offers extensive options for ensuring data quality, both in terms of the range of checks and the supported backends. 
We talk about the details.","og_url":"https:\/\/www.inovex.de\/de\/blog\/data-quality-made-easy-with-soda\/","og_site_name":"inovex GmbH","article_publisher":"https:\/\/www.facebook.com\/inovexde","article_published_time":"2024-09-23T07:00:26+00:00","article_modified_time":"2026-02-18T09:25:08+00:00","og_image":[{"width":1500,"height":880,"url":"https:\/\/www.inovex.de\/wp-content\/uploads\/Data-Quality-made-easy-with-Soda.png","type":"image\/png"}],"author":"Haydar Aky\u00fcrek, Simon Bachstein, Hiroshi Hamano, Marcel Spitzer","twitter_card":"summary_large_image","twitter_image":"https:\/\/www.inovex.de\/wp-content\/uploads\/Data-Quality-made-easy-with-Soda-1024x601.png","twitter_creator":"@inovexgmbh","twitter_site":"@inovexgmbh","twitter_misc":{"Verfasst von":"Haydar Aky\u00fcrek","Gesch\u00e4tzte Lesezeit":"13\u00a0Minuten","Written by":"Haydar Aky\u00fcrek, Simon Bachstein, Hiroshi Hamano, Marcel Spitzer"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.inovex.de\/de\/blog\/data-quality-made-easy-with-soda\/#article","isPartOf":{"@id":"https:\/\/www.inovex.de\/de\/blog\/data-quality-made-easy-with-soda\/"},"author":{"name":"Haydar Aky\u00fcrek","@id":"https:\/\/www.inovex.de\/de\/#\/schema\/person\/cbd8ef525753381fe614c04b9fac5071"},"headline":"Data Quality Made Easy with Soda","datePublished":"2024-09-23T07:00:26+00:00","dateModified":"2026-02-18T09:25:08+00:00","mainEntityOfPage":{"@id":"https:\/\/www.inovex.de\/de\/blog\/data-quality-made-easy-with-soda\/"},"wordCount":2059,"commentCount":0,"publisher":{"@id":"https:\/\/www.inovex.de\/de\/#organization"},"image":{"@id":"https:\/\/www.inovex.de\/de\/blog\/data-quality-made-easy-with-soda\/#primaryimage"},"thumbnailUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/Data-Quality-made-easy-with-Soda.png","keywords":["Data Engineering","Data Mesh","Data Products","Data 
Science"],"inLanguage":"de","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.inovex.de\/de\/blog\/data-quality-made-easy-with-soda\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.inovex.de\/de\/blog\/data-quality-made-easy-with-soda\/","url":"https:\/\/www.inovex.de\/de\/blog\/data-quality-made-easy-with-soda\/","name":"Data Quality Made Easy with Soda - inovex GmbH","isPartOf":{"@id":"https:\/\/www.inovex.de\/de\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.inovex.de\/de\/blog\/data-quality-made-easy-with-soda\/#primaryimage"},"image":{"@id":"https:\/\/www.inovex.de\/de\/blog\/data-quality-made-easy-with-soda\/#primaryimage"},"thumbnailUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/Data-Quality-made-easy-with-Soda.png","datePublished":"2024-09-23T07:00:26+00:00","dateModified":"2026-02-18T09:25:08+00:00","description":"Soda offers extensive options for ensuring data quality, both in terms of the range of checks and the supported backends. We talk about the details.","breadcrumb":{"@id":"https:\/\/www.inovex.de\/de\/blog\/data-quality-made-easy-with-soda\/#breadcrumb"},"inLanguage":"de","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.inovex.de\/de\/blog\/data-quality-made-easy-with-soda\/"]}]},{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/www.inovex.de\/de\/blog\/data-quality-made-easy-with-soda\/#primaryimage","url":"https:\/\/www.inovex.de\/wp-content\/uploads\/Data-Quality-made-easy-with-Soda.png","contentUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/Data-Quality-made-easy-with-Soda.png","width":1500,"height":880,"caption":"Grafik: Mann sitzt auf gro\u00dfem Monitor mit Code. 
Ein weiterer Mann mit Lupe scannt den Code."},{"@type":"BreadcrumbList","@id":"https:\/\/www.inovex.de\/de\/blog\/data-quality-made-easy-with-soda\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.inovex.de\/de\/"},{"@type":"ListItem","position":2,"name":"Data Quality Made Easy with Soda"}]},{"@type":"WebSite","@id":"https:\/\/www.inovex.de\/de\/#website","url":"https:\/\/www.inovex.de\/de\/","name":"inovex GmbH","description":"","publisher":{"@id":"https:\/\/www.inovex.de\/de\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.inovex.de\/de\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"de"},{"@type":"Organization","@id":"https:\/\/www.inovex.de\/de\/#organization","name":"inovex GmbH","url":"https:\/\/www.inovex.de\/de\/","logo":{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/www.inovex.de\/de\/#\/schema\/logo\/image\/","url":"https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/03\/inovex-logo-16-9-1.png","contentUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/03\/inovex-logo-16-9-1.png","width":1921,"height":1081,"caption":"inovex GmbH"},"image":{"@id":"https:\/\/www.inovex.de\/de\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/inovexde","https:\/\/x.com\/inovexgmbh","https:\/\/www.instagram.com\/inovexlife\/","https:\/\/www.linkedin.com\/company\/inovex","https:\/\/www.youtube.com\/channel\/UC7r66GT14hROB_RQsQBAQUQ"]},{"@type":"Person","@id":"https:\/\/www.inovex.de\/de\/#\/schema\/person\/cbd8ef525753381fe614c04b9fac5071","name":"Haydar 
Aky\u00fcrek","image":{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/secure.gravatar.com\/avatar\/986cdab16b190513ff04cbc00254ee52242f52ba52e39f8bfb785ea4302e4715?s=96&d=retro&r=g7bcab65279fbe837c8efb7679b758a08","url":"https:\/\/secure.gravatar.com\/avatar\/986cdab16b190513ff04cbc00254ee52242f52ba52e39f8bfb785ea4302e4715?s=96&d=retro&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/986cdab16b190513ff04cbc00254ee52242f52ba52e39f8bfb785ea4302e4715?s=96&d=retro&r=g","caption":"Haydar Aky\u00fcrek"},"url":"https:\/\/www.inovex.de\/de\/blog\/author\/hakyuerek\/"}]}},"_links":{"self":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts\/57974","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/users\/354"}],"replies":[{"embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/comments?post=57974"}],"version-history":[{"count":5,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts\/57974\/revisions"}],"predecessor-version":[{"id":66238,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts\/57974\/revisions\/66238"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/media\/58250"}],"wp:attachment":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/media?parent=57974"}],"wp:term":[{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/tags?post=57974"},{"taxonomy":"service","embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/service?post=57974"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/coauthors?post=57974"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}