{"id":64774,"date":"2026-05-22T12:52:11","date_gmt":"2026-05-22T10:52:11","guid":{"rendered":"https:\/\/www.inovex.de\/?p=64774"},"modified":"2026-05-22T18:11:25","modified_gmt":"2026-05-22T16:11:25","slug":"lightweight-data-quality-frameworks-dqx-for-apache-spark","status":"publish","type":"post","link":"https:\/\/www.inovex.de\/de\/blog\/lightweight-data-quality-frameworks-dqx-for-apache-spark\/","title":{"rendered":"Lightweight Data Quality Frameworks: DQX for Apache Spark"},"content":{"rendered":"<p><a href=\"https:\/\/www.inovex.de\/de\/blog\/ensuring-data-quality-a-data-engineers-perspective\/\">Data quality<\/a> (DQ) is a critical concern in today&#8217;s <a href=\"https:\/\/www.inovex.de\/de\/leistungen\/data-engineering\/\">data engineering<\/a>. Poor data quality directly undermines the reliability of models and reports and the overall trust in data products (<a href=\"https:\/\/en.wikipedia.org\/wiki\/Garbage_in,_garbage_out\">garbage in, garbage out<\/a>). Consequently, the conversation shifts from whether DQ is necessary to which frameworks implement it effectively.<\/p>\n<p>Many teams initially adopt established frameworks. While effective, these frameworks can present challenges, including a potentially steep learning curve, considerable configuration overhead, and a business model that pushes you to buy the enterprise product.<\/p>\n<p>This post explores a lightweight alternative: DQX, a framework we used to implement a data quality monitor.<\/p>\n<hr \/>\n<h2>The Established Frameworks: Great Expectations &amp; Soda<\/h2>\n<p>Before exploring lightweight frameworks, let&#8217;s address the elephants in the room: <a href=\"https:\/\/greatexpectations.io\/\">Great Expectations<\/a> (GX) and <a href=\"https:\/\/www.inovex.de\/de\/blog\/data-quality-made-easy-with-soda\/\">Soda<\/a>. 
These are powerful, popular frameworks, but it&#8217;s important to understand their structure and the overhead they can introduce.<\/p>\n<p>A common point of confusion is the difference between their core open-source libraries and their enterprise platforms.<\/p>\n<ul>\n<li>Great Expectations Core vs. Great Expectations Cloud: GX Core is the open-source Python library for defining \u201cexpectations\u201d (data quality checks). Expectations are defined directly in Python code and can be saved to and loaded from JSON. The paid GX Cloud SaaS is a fully hosted, collaborative platform that builds on this open-source core. It adds a UI to manage your expectations and enterprise features for data governance.<\/li>\n<li>Soda Core vs. Soda (Cloud\/Enterprise): This is a similar story. Soda Core is the open-source command-line tool and Python library used to define and run data quality checks. In Soda, checks are defined in <a href=\"https:\/\/docs.soda.io\/sodacl-reference\/metrics-and-checks\">SodaCL<\/a>, a YAML-based, Soda-specific domain language. The paid Soda Cloud SaaS platform is the enterprise-grade product built on top. It provides a UI, dashboards, and advanced alerting, none of which are included in the open-source core.<\/li>\n<\/ul>\n<h4>Core vs. Enterprise<\/h4>\n<p>While the core libraries are free, using them effectively in production often requires significant custom code. This is most apparent with custom checks, which cover cases that the predefined checks of each framework do not. Both Great Expectations (GX) and Soda allow custom SQL, but their open-source versions hit a wall regarding reusability:<\/p>\n<p>In GX Core, developers must write <a href=\"https:\/\/docs.greatexpectations.io\/docs\/core\/customize_expectations\/use_sql_to_define_a_custom_expectation\">SQL queries<\/a> to implement their custom checks. 
These queries cannot be reused with different parameters for other tables and columns, so developers need to come up with their own solution. The paid cloud version offers a few more features, for example a <a href=\"https:\/\/docs.greatexpectations.io\/docs\/cloud\/expectations\/expectations_overview#row-conditions\">UI<\/a> for adding row-filter parameters.<\/p>\n<p>In Soda Core, custom checks are also implemented as <a href=\"https:\/\/docs.soda.io\/soda-cl-overview\/custom-check-examples\">SQL queries<\/a> in SodaCL. Reusing them for other tables or with other parameters requires a feature named <a href=\"https:\/\/docs.soda.io\/sodacl-reference\/check-template\">templates<\/a>, which is only available in the paid Soda version. This is somewhat counterintuitive, as Soda Core offers parameterizable predefined checks out of the box. Consequently, without the paid version, developers are forced to implement their own custom solution.<\/p>\n<p>Limitations like these can lead teams to write a lot of custom code to work around the frameworks&#8217; gaps, sacrificing productivity to maintain this custom-built overhead. The experience can feel like constant, low-level pressure to upgrade to the paid SaaS platform rather than support for using the core library effectively.<\/p>\n<hr \/>\n<h2>\ud83d\ude80 A Lightweight Champion for Databricks: DQX<\/h2>\n<p>For teams working within Databricks, there\u2019s an excellent alternative to the established frameworks: <a href=\"https:\/\/databrickslabs.github.io\/dqx\/\">DQX<\/a> by <a href=\"https:\/\/www.databricks.com\/learn\/labs\">Databricks Labs<\/a>. 
DQX is an open-source framework engineered specifically to handle data quality within the Databricks environment.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-66777\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/data_lifecycle.drawio-2.png\" alt=\"\" width=\"821\" height=\"211\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/data_lifecycle.drawio-2.png 821w, https:\/\/www.inovex.de\/wp-content\/uploads\/data_lifecycle.drawio-2-300x77.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/data_lifecycle.drawio-2-768x197.png 768w, https:\/\/www.inovex.de\/wp-content\/uploads\/data_lifecycle.drawio-2-400x103.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/data_lifecycle.drawio-2-360x93.png 360w\" sizes=\"auto, (max-width: 821px) 100vw, 821px\" \/><\/p>\n<h6>Data Engineering Lifecycle with DQX (adapted from Reis &amp; Housley, 2022)<\/h6>\n<p>Unlike <a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/databricks\/ldp\/expectations\">Databricks Pipeline Expectations<\/a>, which can be a great choice for pre-persistence checks in <a href=\"https:\/\/www.databricks.com\/product\/data-engineering\/spark-declarative-pipelines\">Spark Declarative Pipelines<\/a>, DQX can be used across the entire data engineering lifecycle. It can perform both pre- and post-persistence checks, enabling developers to implement a solution that truly fits their needs.<\/p>\n<p>The core design philosophy of DQX centers on overcoming common challenges associated with implementing data quality checks. The main advantages of DQX are the following:<\/p>\n<ul>\n<li>Easy integration and configuration: DQX is designed for a low-friction setup within Databricks. 
The goal is to get teams from installation to writing their first checks as quickly as possible, without a lengthy configuration cycle.<\/li>\n<li>Extensibility with <a href=\"https:\/\/databrickslabs.github.io\/dqx\/docs\/reference\/quality_checks\/#creating-custom-row-level-checks\">custom checks<\/a>: While pre-built checks are useful, every dataset has its own business logic. DQX allows teams to write their own custom data quality checks using PySpark functions. This is a significant advantage for data engineers and data scientists who are already comfortable in the Spark ecosystem, as it allows them to define complex, domain-specific validation logic without leaving their familiar environment.<\/li>\n<li>Automated rule suggestion: DQX includes a profiler. This feature analyzes a dataset to understand its characteristics (e.g., data types, value distributions, nullability). Based on this profile, it automatically suggests a baseline set of data quality checks, which teams can then accept, reject, or refine. Rule generation can also be assisted by <a href=\"https:\/\/databrickslabs.github.io\/dqx\/docs\/guide\/ai_assisted_quality_checks_generation\/\">AI<\/a>.<\/li>\n<li>Dashboards: DQX comes with a pre-configured <a href=\"https:\/\/databrickslabs.github.io\/dqx\/docs\/guide\/quality_dashboard\/\">Databricks Dashboard<\/a>. This is a great starting point for your data quality monitoring.<\/li>\n<li>Performance: Engineered specifically for Databricks, DQX integrates with PySpark and the platform&#8217;s core features to ensure speed and efficiency.<\/li>\n<\/ul>\n<h2>\ud83d\udcbb Implementation<\/h2>\n<p><a href=\"https:\/\/databrickslabs.github.io\/dqx\/docs\/reference\/quality_checks\/#row-level-checks-reference\">Data quality checks<\/a> in DQX can be defined in various ways: as DQX classes in Python code, as a Delta table, or in a separate JSON or YAML file using a DQX-specific domain language. 
This example uses the domain-specific language expressed in YAML syntax. The implemented check (<em>is_not_null_and_not_empty<\/em>) is a <a href=\"https:\/\/databrickslabs.github.io\/dqx\/docs\/guide\/quality_dashboard\/\">predefined check<\/a> that flags null or empty string values in the column <em>city<\/em>.<\/p>\n<pre class=\"lang:yaml decode:true\" title=\"checks.yml\">- criticality: error\r\n  check:\r\n    function: is_not_null_and_not_empty\r\n    arguments:\r\n      column: city<\/pre>\n<p>For a quick start, DQX can be installed in a notebook.<\/p>\n<pre class=\"lang:python decode:true\" title=\"execute_checks.ipynb\">%pip install databricks-labs-dqx\r\n# Restart Python so the newly installed package can be imported\r\ndbutils.library.restartPython()\r\n<\/pre>\n<p>We demonstrate the check by loading the <em>samples.bakehouse.sales_customers<\/em> dataset and setting <em>city<\/em> to <em>None<\/em> for the row with <em>customerID=2000259<\/em>. After preparing the data, we apply the check defined in the <em>checks.yml<\/em> file to the DataFrame to identify the intentional error.<\/p>\n<pre class=\"lang:python decode:true\" title=\"execute_checks.ipynb\">from databricks.labs.dqx.engine import DQEngine\r\nfrom databricks.sdk import WorkspaceClient\r\nfrom pyspark.sql.functions import col, when\r\n\r\n# Load the sample table and inject a null city for one customer\r\ninput_df = spark.read.format(\"delta\").table(\"samples.bakehouse.sales_customers\")\r\ninput_df = input_df.withColumn(\"city\", when(col(\"customerID\") == 2000259, None).otherwise(col(\"city\")))\r\n\r\n# Load the checks from checks.yml and apply them to the DataFrame\r\ndq_engine = DQEngine(WorkspaceClient())\r\nchecks = dq_engine.load_checks_from_workspace_file(workspace_path=\".\/checks.yml\")\r\nresult_df = dq_engine.apply_checks_by_metadata(input_df, checks)\r\n# Persist the annotated result; \"unity_catalog_path\" is a placeholder for your table name\r\nresult_df.write.format(\"delta\").mode(\"overwrite\").saveAsTable(\"unity_catalog_path\")\r\n<\/pre>\n<p>The resulting DataFrame <em>result_df<\/em> has two new columns: <em>_errors<\/em> and <em>_warnings<\/em>. 
Because the criticality of the check is <em>error<\/em>, the <em>_errors<\/em> column in <em>result_df<\/em> for the row with customerID <em>2000259<\/em> contains a JSON entry with the error information.<\/p>\n<pre class=\"lang:default decode:true\" title=\"error.json\">[\r\n    {\r\n        \"name\": \"city_is_null_or_empty\",\r\n        \"message\": \"Column 'city' value is null or empty\",\r\n        \"columns\": [\r\n            \"city\"\r\n        ],\r\n        \"filter\": null,\r\n        \"function\": \"is_not_null_and_not_empty\",\r\n        \"run_time\": \"2025-07-09T08:56:11.080262Z\",\r\n        \"user_metadata\": {}\r\n    }\r\n]\r\n<\/pre>\n<p>This information can be used to filter your dataset and to build data quality monitoring based on passed and failed checks. One low-effort option is to visualize the information in the prebuilt Databricks Dashboard that DQX comes with. It offers an immediate overview of data quality by charting failure percentages alongside a comprehensive table of failing checks.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-64869\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/dqx_dashboard_v2.png\" alt=\"\" width=\"1569\" height=\"453\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/dqx_dashboard_v2.png 1569w, https:\/\/www.inovex.de\/wp-content\/uploads\/dqx_dashboard_v2-300x87.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/dqx_dashboard_v2-1024x296.png 1024w, https:\/\/www.inovex.de\/wp-content\/uploads\/dqx_dashboard_v2-768x222.png 768w, https:\/\/www.inovex.de\/wp-content\/uploads\/dqx_dashboard_v2-1536x443.png 1536w, https:\/\/www.inovex.de\/wp-content\/uploads\/dqx_dashboard_v2-400x115.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/dqx_dashboard_v2-360x104.png 360w\" sizes=\"auto, (max-width: 1569px) 100vw, 1569px\" \/><\/p>\n<p>DQX prioritizes performance by focusing primarily on row-level checks. 
This means some capabilities found in established frameworks, like full-table <a href=\"https:\/\/docs.soda.io\/sodacl-reference\/distribution\">distribution checks<\/a>, are not available.<\/p>\n<p>The <a href=\"https:\/\/databrickslabs.github.io\/dqx\/docs\/guide\/\">DQX documentation<\/a> has examples of more complex <a href=\"https:\/\/databrickslabs.github.io\/dqx\/docs\/reference\/quality_checks\/#using-python-function\">custom checks<\/a>, applying the same check to <a href=\"https:\/\/databrickslabs.github.io\/dqx\/docs\/guide\/quality_checks_apply\/#applying-checks-defined-using-metadata\">multiple columns<\/a>, adding more <a href=\"https:\/\/databrickslabs.github.io\/dqx\/docs\/guide\/additional_configuration\/#adding-user-metadata-to-the-results-of-specific-checks\">metadata to checks<\/a>, and using the <a href=\"https:\/\/databrickslabs.github.io\/dqx\/docs\/reference\/profiler\/\">profiler<\/a> to generate rule recommendations automatically. In the end, you will probably want to persist your data quality runs and build filters and custom monitoring that fit your needs.<\/p>\n<hr \/>\n<h2>Final Thoughts: Choosing Your Data Quality Framework<\/h2>\n<p data-path-to-node=\"2,0\">Before committing to a costly enterprise platform or spending weeks customizing a big framework, step back and evaluate your team&#8217;s actual needs. Often, a fast, integrated solution is the better choice.<\/p>\n<p data-path-to-node=\"2,1\">For data quality in Databricks, DQX stands out by offering high performance with low overhead. It strips away the complexity of heavier frameworks, leaving you with a tool that is fast, flexible, and easy to manage.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Data quality (DQ) is a critical concern in today&#8217;s data engineering. Poor data quality directly impacts the reliability of models, reports, and overall trust in data products (garbage in, garbage out). 
Consequently, the conversation shifts from the necessity of DQ to the specific frameworks required to implement it effectively. Many teams initially adopt established frameworks. [&hellip;]<\/p>\n","protected":false},"author":396,"featured_media":67640,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"inline_featured_image":false,"ep_exclude_from_search":false,"footnotes":""},"tags":[77,385,179,784],"service":[411],"coauthors":[{"id":396,"display_name":"Joshua Finger","user_nicename":"jfinger"}],"class_list":["post-64774","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","tag-big-data","tag-data-engineering","tag-data-products","tag-databricks","service-data-engineering"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.6 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Lightweight Data Quality Frameworks: DQX for Apache Spark<\/title>\n<meta name=\"description\" content=\"This post explores how lightweight data quality frameworks offer an alternative to established tools like Soda and Great Expectations.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.inovex.de\/de\/blog\/lightweight-data-quality-frameworks-dqx-for-apache-spark\/\" \/>\n<meta property=\"og:locale\" content=\"de_DE\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Lightweight Data Quality Frameworks: DQX for Apache Spark\" \/>\n<meta property=\"og:description\" content=\"This post explores how lightweight data quality frameworks offer an alternative to established tools like Soda and Great Expectations.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.inovex.de\/de\/blog\/lightweight-data-quality-frameworks-dqx-for-apache-spark\/\" \/>\n<meta property=\"og:site_name\" 
content=\"inovex GmbH\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/inovexde\" \/>\n<meta property=\"article:published_time\" content=\"2026-05-22T10:52:11+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2026-05-22T16:11:25+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.inovex.de\/wp-content\/uploads\/light_framework.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1500\" \/>\n\t<meta property=\"og:image:height\" content=\"880\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Joshua Finger\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:image\" content=\"https:\/\/www.inovex.de\/wp-content\/uploads\/light_framework-1024x601.png\" \/>\n<meta name=\"twitter:creator\" content=\"@inovexgmbh\" \/>\n<meta name=\"twitter:site\" content=\"@inovexgmbh\" \/>\n<meta name=\"twitter:label1\" content=\"Verfasst von\" \/>\n\t<meta name=\"twitter:data1\" content=\"Joshua Finger\" \/>\n\t<meta name=\"twitter:label2\" content=\"Gesch\u00e4tzte Lesezeit\" \/>\n\t<meta name=\"twitter:data2\" content=\"7\u00a0Minuten\" \/>\n\t<meta name=\"twitter:label3\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data3\" content=\"Joshua Finger\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/lightweight-data-quality-frameworks-dqx-for-apache-spark\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/lightweight-data-quality-frameworks-dqx-for-apache-spark\\\/\"},\"author\":{\"name\":\"Joshua Finger\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#\\\/schema\\\/person\\\/addeaf6bc1d22a095fe7abc1036b0388\"},\"headline\":\"Lightweight Data Quality Frameworks: DQX for Apache 
Spark\",\"datePublished\":\"2026-05-22T10:52:11+00:00\",\"dateModified\":\"2026-05-22T16:11:25+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/lightweight-data-quality-frameworks-dqx-for-apache-spark\\\/\"},\"wordCount\":1242,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/lightweight-data-quality-frameworks-dqx-for-apache-spark\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/light_framework.png\",\"keywords\":[\"Big Data\",\"Data Engineering\",\"Data Products\",\"Databricks\"],\"inLanguage\":\"de\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/lightweight-data-quality-frameworks-dqx-for-apache-spark\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/lightweight-data-quality-frameworks-dqx-for-apache-spark\\\/\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/lightweight-data-quality-frameworks-dqx-for-apache-spark\\\/\",\"name\":\"Lightweight Data Quality Frameworks: DQX for Apache Spark\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/lightweight-data-quality-frameworks-dqx-for-apache-spark\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/lightweight-data-quality-frameworks-dqx-for-apache-spark\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/light_framework.png\",\"datePublished\":\"2026-05-22T10:52:11+00:00\",\"dateModified\":\"2026-05-22T16:11:25+00:00\",\"description\":\"This post explores how lightweight data quality frameworks offer an alternative to established tools like Soda and Great 
Expectations.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/lightweight-data-quality-frameworks-dqx-for-apache-spark\\\/#breadcrumb\"},\"inLanguage\":\"de\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/lightweight-data-quality-frameworks-dqx-for-apache-spark\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/lightweight-data-quality-frameworks-dqx-for-apache-spark\\\/#primaryimage\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/light_framework.png\",\"contentUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/light_framework.png\",\"width\":1500,\"height\":880},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/lightweight-data-quality-frameworks-dqx-for-apache-spark\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Lightweight Data Quality Frameworks: DQX for Apache Spark\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#website\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/\",\"name\":\"inovex GmbH\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"de\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#organization\",\"name\":\"inovex 
GmbH\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/2021\\\/03\\\/inovex-logo-16-9-1.png\",\"contentUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/2021\\\/03\\\/inovex-logo-16-9-1.png\",\"width\":1921,\"height\":1081,\"caption\":\"inovex GmbH\"},\"image\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/inovexde\",\"https:\\\/\\\/x.com\\\/inovexgmbh\",\"https:\\\/\\\/www.instagram.com\\\/inovexlife\\\/\",\"https:\\\/\\\/www.linkedin.com\\\/company\\\/inovex\",\"https:\\\/\\\/www.youtube.com\\\/channel\\\/UC7r66GT14hROB_RQsQBAQUQ\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#\\\/schema\\\/person\\\/addeaf6bc1d22a095fe7abc1036b0388\",\"name\":\"Joshua Finger\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/cropped-cropped-profil_joshua-2-96x96.jpg78e34f5348e3c2385d71fbb2b18d9b2d\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/cropped-cropped-profil_joshua-2-96x96.jpg\",\"contentUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/cropped-cropped-profil_joshua-2-96x96.jpg\",\"caption\":\"Joshua Finger\"},\"url\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/author\\\/jfinger\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Lightweight Data Quality Frameworks: DQX for Apache Spark","description":"This post explores how lightweight data quality frameworks offer an alternative to established tools like Soda and Great Expectations.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.inovex.de\/de\/blog\/lightweight-data-quality-frameworks-dqx-for-apache-spark\/","og_locale":"de_DE","og_type":"article","og_title":"Lightweight Data Quality Frameworks: DQX for Apache Spark","og_description":"This post explores how lightweight data quality frameworks offer an alternative to established tools like Soda and Great Expectations.","og_url":"https:\/\/www.inovex.de\/de\/blog\/lightweight-data-quality-frameworks-dqx-for-apache-spark\/","og_site_name":"inovex GmbH","article_publisher":"https:\/\/www.facebook.com\/inovexde","article_published_time":"2026-05-22T10:52:11+00:00","article_modified_time":"2026-05-22T16:11:25+00:00","og_image":[{"width":1500,"height":880,"url":"https:\/\/www.inovex.de\/wp-content\/uploads\/light_framework.png","type":"image\/png"}],"author":"Joshua Finger","twitter_card":"summary_large_image","twitter_image":"https:\/\/www.inovex.de\/wp-content\/uploads\/light_framework-1024x601.png","twitter_creator":"@inovexgmbh","twitter_site":"@inovexgmbh","twitter_misc":{"Verfasst von":"Joshua Finger","Gesch\u00e4tzte Lesezeit":"7\u00a0Minuten","Written by":"Joshua Finger"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.inovex.de\/de\/blog\/lightweight-data-quality-frameworks-dqx-for-apache-spark\/#article","isPartOf":{"@id":"https:\/\/www.inovex.de\/de\/blog\/lightweight-data-quality-frameworks-dqx-for-apache-spark\/"},"author":{"name":"Joshua Finger","@id":"https:\/\/www.inovex.de\/de\/#\/schema\/person\/addeaf6bc1d22a095fe7abc1036b0388"},"headline":"Lightweight 
Data Quality Frameworks: DQX for Apache Spark","datePublished":"2026-05-22T10:52:11+00:00","dateModified":"2026-05-22T16:11:25+00:00","mainEntityOfPage":{"@id":"https:\/\/www.inovex.de\/de\/blog\/lightweight-data-quality-frameworks-dqx-for-apache-spark\/"},"wordCount":1242,"commentCount":0,"publisher":{"@id":"https:\/\/www.inovex.de\/de\/#organization"},"image":{"@id":"https:\/\/www.inovex.de\/de\/blog\/lightweight-data-quality-frameworks-dqx-for-apache-spark\/#primaryimage"},"thumbnailUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/light_framework.png","keywords":["Big Data","Data Engineering","Data Products","Databricks"],"inLanguage":"de","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.inovex.de\/de\/blog\/lightweight-data-quality-frameworks-dqx-for-apache-spark\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.inovex.de\/de\/blog\/lightweight-data-quality-frameworks-dqx-for-apache-spark\/","url":"https:\/\/www.inovex.de\/de\/blog\/lightweight-data-quality-frameworks-dqx-for-apache-spark\/","name":"Lightweight Data Quality Frameworks: DQX for Apache Spark","isPartOf":{"@id":"https:\/\/www.inovex.de\/de\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.inovex.de\/de\/blog\/lightweight-data-quality-frameworks-dqx-for-apache-spark\/#primaryimage"},"image":{"@id":"https:\/\/www.inovex.de\/de\/blog\/lightweight-data-quality-frameworks-dqx-for-apache-spark\/#primaryimage"},"thumbnailUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/light_framework.png","datePublished":"2026-05-22T10:52:11+00:00","dateModified":"2026-05-22T16:11:25+00:00","description":"This post explores how lightweight data quality frameworks offer an alternative to established tools like Soda and Great 
Expectations.","breadcrumb":{"@id":"https:\/\/www.inovex.de\/de\/blog\/lightweight-data-quality-frameworks-dqx-for-apache-spark\/#breadcrumb"},"inLanguage":"de","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.inovex.de\/de\/blog\/lightweight-data-quality-frameworks-dqx-for-apache-spark\/"]}]},{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/www.inovex.de\/de\/blog\/lightweight-data-quality-frameworks-dqx-for-apache-spark\/#primaryimage","url":"https:\/\/www.inovex.de\/wp-content\/uploads\/light_framework.png","contentUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/light_framework.png","width":1500,"height":880},{"@type":"BreadcrumbList","@id":"https:\/\/www.inovex.de\/de\/blog\/lightweight-data-quality-frameworks-dqx-for-apache-spark\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.inovex.de\/de\/"},{"@type":"ListItem","position":2,"name":"Lightweight Data Quality Frameworks: DQX for Apache Spark"}]},{"@type":"WebSite","@id":"https:\/\/www.inovex.de\/de\/#website","url":"https:\/\/www.inovex.de\/de\/","name":"inovex GmbH","description":"","publisher":{"@id":"https:\/\/www.inovex.de\/de\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.inovex.de\/de\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"de"},{"@type":"Organization","@id":"https:\/\/www.inovex.de\/de\/#organization","name":"inovex GmbH","url":"https:\/\/www.inovex.de\/de\/","logo":{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/www.inovex.de\/de\/#\/schema\/logo\/image\/","url":"https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/03\/inovex-logo-16-9-1.png","contentUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/03\/inovex-logo-16-9-1.png","width":1921,"height":1081,"caption":"inovex 
GmbH"},"image":{"@id":"https:\/\/www.inovex.de\/de\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/inovexde","https:\/\/x.com\/inovexgmbh","https:\/\/www.instagram.com\/inovexlife\/","https:\/\/www.linkedin.com\/company\/inovex","https:\/\/www.youtube.com\/channel\/UC7r66GT14hROB_RQsQBAQUQ"]},{"@type":"Person","@id":"https:\/\/www.inovex.de\/de\/#\/schema\/person\/addeaf6bc1d22a095fe7abc1036b0388","name":"Joshua Finger","image":{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/www.inovex.de\/wp-content\/uploads\/cropped-cropped-profil_joshua-2-96x96.jpg78e34f5348e3c2385d71fbb2b18d9b2d","url":"https:\/\/www.inovex.de\/wp-content\/uploads\/cropped-cropped-profil_joshua-2-96x96.jpg","contentUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/cropped-cropped-profil_joshua-2-96x96.jpg","caption":"Joshua Finger"},"url":"https:\/\/www.inovex.de\/de\/blog\/author\/jfinger\/"}]}},"_links":{"self":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts\/64774","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/users\/396"}],"replies":[{"embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/comments?post=64774"}],"version-history":[{"count":5,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts\/64774\/revisions"}],"predecessor-version":[{"id":67642,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts\/64774\/revisions\/67642"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/media\/67640"}],"wp:attachment":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/media?parent=64774"}],"wp:term":[{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/tags?post=64774"},{"taxonomy":"service","embeddable":true,"href"
:"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/service?post=64774"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/coauthors?post=64774"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}