{"id":46916,"date":"2023-09-15T11:27:33","date_gmt":"2023-09-15T09:27:33","guid":{"rendered":"https:\/\/www.inovex.de\/?p=46916"},"modified":"2026-06-23T07:48:17","modified_gmt":"2026-06-23T05:48:17","slug":"data-observability-is-key-a-hands-on-comparison-of-open-source-data-catalog-tools","status":"publish","type":"post","link":"https:\/\/www.inovex.de\/de\/blog\/data-observability-is-key-a-hands-on-comparison-of-open-source-data-catalog-tools\/","title":{"rendered":"Data Observability is Key: A Hands-on Comparison of Open Source Data Catalog Tools"},"content":{"rendered":"<p><span style=\"font-weight: 400;\">Data observability is key in today&#8217;s business world when it comes to digitizing and automating processes and becoming a data-driven company. Data catalogs are the foundation for establishing and improving data observability in a company. In the following, we extensively compare three data catalog tools that are available as open source. We do not compare them based on the polished feature descriptions on their homepages. Instead, we got our hands dirty and dug into them deeply. 
We actually used the tools, ingested data from sources, and discovered the functions ourselves.<\/span><br \/>\n<!--more--><\/p>\n<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_85 counter-hierarchy ez-toc-counter ez-toc-custom ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\"><p class=\"ez-toc-title\" style=\"cursor:inherit\"><\/p>\n<\/div><nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/www.inovex.de\/de\/blog\/data-observability-is-key-a-hands-on-comparison-of-open-source-data-catalog-tools\/#What-is-a-Data-Catalog\" >What is a Data Catalog?<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/www.inovex.de\/de\/blog\/data-observability-is-key-a-hands-on-comparison-of-open-source-data-catalog-tools\/#Comparison-Setup\" >Comparison Setup<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/www.inovex.de\/de\/blog\/data-observability-is-key-a-hands-on-comparison-of-open-source-data-catalog-tools\/#Catalog-Candidates\" >Catalog Candidates<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/www.inovex.de\/de\/blog\/data-observability-is-key-a-hands-on-comparison-of-open-source-data-catalog-tools\/#Amundsen\" >Amundsen<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/www.inovex.de\/de\/blog\/data-observability-is-key-a-hands-on-comparison-of-open-source-data-catalog-tools\/#OpenMetadata\" >OpenMetadata<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/www.inovex.de\/de\/blog\/data-observability-is-key-a-hands-on-comparison-of-open-source-data-catalog-tools\/#DataHub\" 
>DataHub<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-7\" href=\"https:\/\/www.inovex.de\/de\/blog\/data-observability-is-key-a-hands-on-comparison-of-open-source-data-catalog-tools\/#Feature-Comparison\" >Feature Comparison<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-8\" href=\"https:\/\/www.inovex.de\/de\/blog\/data-observability-is-key-a-hands-on-comparison-of-open-source-data-catalog-tools\/#Metadata-Ingestion\" >Metadata Ingestion<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-9\" href=\"https:\/\/www.inovex.de\/de\/blog\/data-observability-is-key-a-hands-on-comparison-of-open-source-data-catalog-tools\/#Lineage-Details\" >Lineage Details<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-10\" href=\"https:\/\/www.inovex.de\/de\/blog\/data-observability-is-key-a-hands-on-comparison-of-open-source-data-catalog-tools\/#ExploringNavigating-in-the-Catalog\" >Exploring\/Navigating in the Catalog<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-11\" href=\"https:\/\/www.inovex.de\/de\/blog\/data-observability-is-key-a-hands-on-comparison-of-open-source-data-catalog-tools\/#Profiling-and-Metadata-Tests\" >Profiling and Metadata Tests<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-12\" href=\"https:\/\/www.inovex.de\/de\/blog\/data-observability-is-key-a-hands-on-comparison-of-open-source-data-catalog-tools\/#Authorization\" >Authorization<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-13\" href=\"https:\/\/www.inovex.de\/de\/blog\/data-observability-is-key-a-hands-on-comparison-of-open-source-data-catalog-tools\/#Upgrade-and-Deployment\" >Upgrade and Deployment<\/a><\/li><li class='ez-toc-page-1 
ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-14\" href=\"https:\/\/www.inovex.de\/de\/blog\/data-observability-is-key-a-hands-on-comparison-of-open-source-data-catalog-tools\/#Roadmap\" >Roadmap<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-15\" href=\"https:\/\/www.inovex.de\/de\/blog\/data-observability-is-key-a-hands-on-comparison-of-open-source-data-catalog-tools\/#Summary\" >Summary<\/a><\/li><\/ul><\/nav><\/div>\n<h2><span class=\"ez-toc-section\" id=\"What-is-a-Data-Catalog\"><\/span>What is a Data Catalog?<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">But as a first step, let\u2019s shed some light on a few questions: What is a data catalog? What can I expect? What are the advantages of starting to use such a tool?<\/span><\/p>\n<p><span style=\"font-weight: 400;\">A data catalog can be seen as a digital equivalent of a library. As a single source of truth, it contains all information about the company\u2019s data stock. Moreover, it helps to find the data already available for usage. A catalog provides an entry point and supports people interested in consuming new data in their selection by providing additional metadata about all data assets. Metadata here can be, for example, the storage location, the data owners, whether the data is considered personally identifiable information (PII), or information on data quality. In addition, features like a search interface or entity tagging can increase the usability for non-technical users. <\/span><span style=\"font-weight: 400;\">In any case, a data catalog serves as the first stop to find all data assets in your company. 
Assets can be of various types like tables, files, processes, metrics, etc.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">With all these features, the whole company benefits from a unified overview of all data, which supports the management of data infrastructure such as Data Lakes or Data Meshes. A data catalog addresses or supports the following topics:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Data Freshness: How up-to-date is the data we want to use?<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Data Security\/Governance: Who do I need to contact to get access to the data? Does the data contain sensitive information?<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Data Redundancy: Is the data we are looking for already provided by another team in the company and already available?<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Data Discrepancy: How can we integrate all of the data across our organization? How can we ensure that the same standards are consistently applied across the organization?<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Data Documentation: What does an attribute mean? What\u2019s the purpose of a dataset?<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Data Quality: Does the data have good quality? Which checks are carried out and according to which schedule? How does the data quality develop over time?<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Data Lineage: Who uses which data and will be affected by schema changes?<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">As the above arguments already make clear, it makes sense for many companies to start building their own catalog. 
But before we continue with our comparison, let&#8217;s take a quick look at the current trend of Data Mesh.<br \/>\n<\/span><span style=\"font-weight: 400;\">One of the core principles of Data Mesh is to move the responsibility away from dedicated data teams to domain or product teams. This means more decentralization and democratization of data ownership. To achieve this, it is essential to have an overview of which data is available and which team maintains it in order to coordinate data sharing, etc. Furthermore, this reduces the risk of duplicating efforts or slowing down processes. <\/span><span style=\"font-weight: 400;\">So the catalog, as a central piece, allows everybody to browse the available data (possibly limited by governance rules) without the need for technical access and without risking governance violations. Without this, the Data Mesh would become a collection of data silos instead.<\/span><\/p>\n<h2><span class=\"ez-toc-section\" id=\"Comparison-Setup\"><\/span>Comparison Setup<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">Before we introduce the tools we chose for this comparison, we would like to shed some light on the deployment setup for our comparison. In order to have a &#8220;production-like&#8221; environment, we decided not to use the local Docker-based quick-start setup. This is provided by many tools for demo and quick trial-and-error purposes. Instead, we deploy the catalogs on Kubernetes. This is a setup we often see with our customers. To make things a little easier, we deploy the required backend databases as deployments with persistent volumes. However, this is not an option for a true production deployment. 
Instead, one should rely on a full database instance or a managed database from the cloud provider to simplify availability and backups.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">In this comparison, we use Google&#8217;s Kubernetes Engine (GKE) on Google Cloud Platform (GCP) and BigQuery as an example ingestion source.<\/span><\/p>\n<h2><span class=\"ez-toc-section\" id=\"Catalog-Candidates\"><\/span>Catalog Candidates<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">There are many commercially licensed catalogs, but we like the idea of open source tools. In particular, we value the flexibility in how you can integrate the tools into your existing environment. By using an open source data catalog, you can easily adapt it to your own needs (if required). In addition, being able to delve into the codebase can help you a lot during the initial deployment in case it does not work straight out of the box. Therefore, we limit this comparison to the three open source catalogs in which we see the most potential.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">After some research, we selected the following:<\/span><\/p>\n<ol>\n<li style=\"list-style-type: none;\">\n<ol>\n<li style=\"font-weight: 400;\"><a href=\"https:\/\/datahubproject.io\/\" target=\"_blank\" rel=\"noopener\"><i><span style=\"font-weight: 400;\">DataHub<\/span><\/i><\/a><span style=\"font-weight: 400;\">, the most prominent open source catalog, originally developed by LinkedIn<\/span><\/li>\n<li style=\"font-weight: 400;\"><a href=\"https:\/\/open-metadata.org\/\" target=\"_blank\" rel=\"noopener\"><i><span style=\"font-weight: 400;\">OpenMetadata<\/span><\/i><\/a><span style=\"font-weight: 400;\">, an up-and-coming catalog with a slightly different approach<\/span><\/li>\n<li><i><span style=\"font-weight: 400;\">Amundsen<\/span><\/i><span style=\"font-weight: 400;\">, a somewhat older and stable catalog solution originally developed by 
Lyft<\/span><\/li>\n<\/ol>\n<\/li>\n<\/ol>\n<p><span style=\"font-weight: 400;\">In the following, we introduce the three catalog tools in detail, talking specifically about their components and how we decided to set them up.<\/span><\/p>\n<h3><span class=\"ez-toc-section\" id=\"Amundsen\"><\/span>Amundsen<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-47384 alignleft\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/Amundsen-logo-300x90.png\" alt=\"\" width=\"300\" height=\"90\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/Amundsen-logo-300x90.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/Amundsen-logo-400x120.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/Amundsen-logo-360x108.png 360w, https:\/\/www.inovex.de\/wp-content\/uploads\/Amundsen-logo.png 409w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">Amundsen went open source in 2019 and was born out of a desire to master data discovery and data governance at <a href=\"https:\/\/www.lyft.com\/\" target=\"_blank\" rel=\"noopener\">Lyft<\/a><\/span><span style=\"font-weight: 400;\">. It is a well-known and established data catalog tool that one should consider when investigating data catalogs. In terms of required components, Amundsen is the simplest tool in our comparison. It only requires two prerequisite services to be available: <a href=\"https:\/\/www.elastic.co\/de\/elasticsearch\/\" target=\"_blank\" rel=\"noopener\">Elasticsearch<\/a> as the search engine and <a href=\"https:\/\/neo4j.com\/\" target=\"_blank\" rel=\"noopener\">Neo4j<\/a> as the graph index. Alternatively, one can use <a href=\"https:\/\/atlas.apache.org\" target=\"_blank\" rel=\"noopener\">Apache Atlas<\/a> instead of Neo4j.<br \/>\nWithin the scope of this evaluation, we decided to go with Neo4j. 
In the medium term, this setup may become simpler as Elasticsearch can also act as a graph index, but this is speculative at this stage. We are not aware of Amundsen considering this change.<\/span><\/p>\n<figure id=\"attachment_47311\" aria-describedby=\"caption-attachment-47311\" style=\"width: 2004px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-47311 \" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/amundsen-architecture.svg\" alt=\"architecture diagram of Amundsen\" width=\"2004\" height=\"1127\" \/><figcaption id=\"caption-attachment-47311\" class=\"wp-caption-text\">Architecture diagram of Amundsen, own depiction inspired by Amundsen\u2019s documentation<\/figcaption><\/figure>\n<p><span style=\"font-weight: 400;\">Building on this foundation, the backend of Amundsen consists of two services: the metadata and the search service. In contrast, the frontend consists only of one service. It communicates via REST APIs with the search service to provide features such as full-text search to end users. For this, the required technical metadata and information are retrieved from the metadata service, which uses the state stored in Neo4j. To ingest metadata, Amundsen provides the <a href=\"https:\/\/pypi.org\/project\/amundsen-databuilder\/\" target=\"_blank\" rel=\"noopener\">databuilder library<\/a><\/span><span style=\"font-weight: 400;\"> which allows users to extract metadata from various source systems and send it to Amundsen in the expected format.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Amundsen sounded really interesting at first sight. Unfortunately, as we got deeper into the research, without finding more recent resources or articles, it started to feel a bit old and stale. <\/span><span style=\"font-weight: 400;\">Nevertheless, we tried to deploy an instance in our Kubernetes cluster to play around and evaluate the system. 
Sadly, the public Helm chart could not be used because many of its sources are outdated. It uses an older Neo4j version than the databuilder library expects, so it is <a href=\"https:\/\/github.com\/amundsen-io\/amundsen\/issues\/2103\" target=\"_blank\" rel=\"noopener\">not really usable<\/a><\/span><span style=\"font-weight: 400;\">. We tried to update the sources, but stopped after a while because the documentation was sparse. Furthermore, we did not want to invest too much time in one catalog when we had other promising catalogs in the pipeline.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">In their Slack, we found a statement from one of the maintainers that Amundsen is more in a maintenance and stability mode. In addition, the maintainers (at Lyft) do not actively develop features for the OSS system and only support community contributions. With this in mind, we decided not to consider Amundsen further in this comparison for the time being (or only without practical experience).<\/span><\/p>\n<h3><span class=\"ez-toc-section\" id=\"OpenMetadata\"><\/span>OpenMetadata<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-47382 alignleft\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/omd-logo.svg\" alt=\"\" width=\"300\" height=\"120\" \/><span style=\"font-weight: 400;\">OpenMetadata started as an open source project in mid-2021 and aims to address the following topics: Discovery, Collaboration, Governance, Data Quality, and Data Insights. It defines itself as an active metadata platform rather than a simple data catalog. This brings real benefits to the entire organization. 
<a href=\"https:\/\/www.getcollate.io\/\" target=\"_blank\" rel=\"noopener\">Collate<\/a><\/span><span style=\"font-weight: 400;\">, which also offers a SaaS version of OpenMetadata, is the primary developer and maintainer of OpenMetadata.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">One special feature OpenMetadata offers is the ability to interact with other catalog users directly. The change feed on the start page shows the latest changes and who made them. One can start conversations on changes\/metadata, request tags and terms for entities, and assign the task to other people in the company who are able\/allowed to verify it (see the figure below). Loosely speaking, one could say that OpenMetadata is the social network among data catalog tools.<\/span><\/p>\n<figure id=\"attachment_47321\" aria-describedby=\"caption-attachment-47321\" style=\"width: 1680px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-47321 size-full\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/openmetadata-feed.png\" alt=\"\" width=\"1680\" height=\"860\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/openmetadata-feed.png 1680w, https:\/\/www.inovex.de\/wp-content\/uploads\/openmetadata-feed-300x154.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/openmetadata-feed-1024x524.png 1024w, https:\/\/www.inovex.de\/wp-content\/uploads\/openmetadata-feed-768x393.png 768w, https:\/\/www.inovex.de\/wp-content\/uploads\/openmetadata-feed-1536x786.png 1536w, https:\/\/www.inovex.de\/wp-content\/uploads\/openmetadata-feed-400x205.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/openmetadata-feed-360x184.png 360w\" sizes=\"auto, (max-width: 1680px) 100vw, 1680px\" \/><figcaption id=\"caption-attachment-47321\" class=\"wp-caption-text\">Screenshot of OpenMetadata demonstrating a user\u2019s activity feed<\/figcaption><\/figure>\n<p><span style=\"font-weight: 400;\">In terms of required components, OpenMetadata is slightly 
more demanding than Amundsen. However, it has a very simple and lean architecture itself. In addition to an instance of Elasticsearch (alternatively you can also use <a href=\"https:\/\/opensearch.org\/\" target=\"_blank\" rel=\"noopener\">OpenSearch<\/a>) as the search engine and graph index, and a <a href=\"https:\/\/www.postgresql.org\/\" target=\"_blank\" rel=\"noopener\">Postgres<\/a> or <a href=\"https:\/\/www.mysql.com\" target=\"_blank\" rel=\"noopener\">MySQL<\/a> database instance, it requires an instance of <a href=\"https:\/\/airflow.apache.org\/\" target=\"_blank\" rel=\"noopener\">Apache Airflow<\/a> to orchestrate the metadata ingestion. However, Airflow is only a recommendation by OpenMetadata; you can also use an orchestrator of your choice. If you are interested in an in-depth comparison of modern orchestration tools, have a look at our blog post <a href=\"https:\/\/www.inovex.de\/de\/blog\/data-orchestration-is-airflow-still-the-best-part-1\/\" target=\"_blank\" rel=\"noopener\">\u201cData Orchestration: Is Airflow Still the Best?\u201d<\/a>.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">OpenMetadata itself has a rather simple architecture: only two services are required, <em>openmetadata-ingestion<\/em> and <em>openmetadata-server<\/em>. The first is responsible for ingesting metadata from various sources. This is orchestrated by the Airflow component. For this, it uses OpenMetadata\u2019s ingestion library (<a href=\"https:\/\/pypi.org\/project\/openmetadata-ingestion\/\" target=\"_blank\" rel=\"noopener\">openmetadata-ingestion<\/a><\/span><span style=\"font-weight: 400;\">). This library can also be used directly if metadata ingestion is not to be configured in the UI. The latter service contains both the actual backend of OpenMetadata and the corresponding frontend. All communication within OpenMetadata happens through REST APIs. 
Please see the figure below to get an overview of all required components and how they interact.<\/span><\/p>\n<figure id=\"attachment_47334\" aria-describedby=\"caption-attachment-47334\" style=\"width: 1629px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/www.inovex.de\/wp-content\/uploads\/openmetadata-architecture.svg\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-47334 \" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/openmetadata-architecture.svg\" alt=\"\" width=\"1629\" height=\"917\" \/><\/a><figcaption id=\"caption-attachment-47334\" class=\"wp-caption-text\">Architecture diagram of OpenMetadata, own depiction inspired by OpenMetadata&#8217;s documentation<\/figcaption><\/figure>\n<p><span style=\"font-weight: 400;\">For our OpenMetadata deployment, we went with the default component choices. We deployed Airflow as the orchestrator, Elasticsearch as the search engine and graph index, and MySQL as the database. Using OpenMetadata&#8217;s official <a href=\"https:\/\/github.com\/open-metadata\/openmetadata-helm-charts\" target=\"_blank\" rel=\"noopener\">helm chart<\/a><\/span><span style=\"font-weight: 400;\"> and their instructions for deploying to GKE<\/span><span style=\"font-weight: 400;\">, the deployment was fairly straightforward and worked out of the box like a charm.<\/span><\/p>\n<h3><span class=\"ez-toc-section\" id=\"DataHub\"><\/span>DataHub<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h3><img loading=\"lazy\" decoding=\"async\" class=\"alignleft wp-image-47379 size-medium\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/DataHub-Logo-300x87.png\" alt=\"\" width=\"300\" height=\"87\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/DataHub-Logo-300x87.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/DataHub-Logo-400x117.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/DataHub-Logo-360x105.png 360w, https:\/\/www.inovex.de\/wp-content\/uploads\/DataHub-Logo.png 415w\" sizes=\"auto, (max-width: 
300px) 100vw, 300px\" \/><\/h3>\n<p><span style=\"font-weight: 400;\">DataHub is an event-based data catalog and &#8211; based on its range of features &#8211; can be considered a metadata platform analogous to OpenMetadata. It was originally developed and used internally by <a href=\"https:\/\/about.linkedin.com\/\" target=\"_blank\" rel=\"noopener\">LinkedIn<\/a>. In early 2020, LinkedIn decided to release it as open source. Since then, its adoption and the community around it have grown rapidly. Today, DataHub is mainly developed and maintained by <a href=\"https:\/\/www.acryldata.io\/\" target=\"_blank\" rel=\"noopener\">Acryl<\/a><\/span><span style=\"font-weight: 400;\">. They also have a SaaS offering for DataHub in their product line. Nevertheless, Acryl is strongly committed to the open source model and promises to remain \u201ctruly open source\u201d. This means that the vast majority of features (if not all) are and will be part of the open source distribution. However, in the open source distribution they may not reach the same level of maturity or sophistication; closing that gap is left to the open source community.<\/span><\/p>\n<figure id=\"attachment_47338\" aria-describedby=\"caption-attachment-47338\" style=\"width: 1535px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/www.inovex.de\/wp-content\/uploads\/datahub-architecture.svg\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-47338\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/datahub-architecture.svg\" alt=\"\" width=\"1535\" height=\"863\" \/><\/a><figcaption id=\"caption-attachment-47338\" class=\"wp-caption-text\">Architecture diagram of DataHub, own depiction inspired by DataHub&#8217;s documentation<\/figcaption><\/figure>\n<p><span style=\"font-weight: 400;\">Looking at the architecture, DataHub is certainly the most complex of all candidates in this comparison. 
The following components are required as prerequisites for DataHub:<\/span><\/p>\n<p style=\"padding-left: 40px;\"><span style=\"font-weight: 400;\">a) a relational database to store metadata, which serves as the source of truth for all information in DataHub. Officially supported are MySQL, Postgres, and <a href=\"https:\/\/mariadb.org\/\" target=\"_blank\" rel=\"noopener\">MariaDB<\/a>.<br \/>\n<\/span><span style=\"font-weight: 400;\">b) Elasticsearch as the search engine<br \/>\n<\/span><span style=\"font-weight: 400;\">c) a graph index, which can be realized either by using Elasticsearch again or by going with Neo4j<br \/>\n<\/span><span style=\"font-weight: 400;\">d) a message broker that enables the event-based communication between DataHub\u2019s internal components. <a href=\"https:\/\/kafka.apache.org\/\" target=\"_blank\" rel=\"noopener\">Apache Kafka<\/a> is the default choice here.<br \/>\n<\/span><span style=\"font-weight: 400;\">e) until version 0.10.3, a schema registry was required as well, but it is now obsolete<\/span><\/p>\n<p><span style=\"font-weight: 400;\">DataHub itself consists of at least two different services: <\/span><span style=\"font-weight: 400;\"><br \/>\n<\/span><span style=\"font-weight: 400;\">a) the backend itself. Due to the event-based nature of DataHub, any interaction in the user interface that affects metadata, or the ingestion of metadata, creates an event in a Kafka topic. That event is picked up by the backend service to update the database. This functionality can be outsourced to two additional services that can be managed individually: the Metadata Change Event (MCE) consumer service and the Metadata Audit Event (MAE) consumer service. The ingestion of metadata takes place in a dedicated container if it has been configured and started in the frontend. 
Alternatively, one can ingest metadata programmatically by using the <a href=\"https:\/\/pypi.org\/project\/acryl-datahub\/\" target=\"_blank\" rel=\"noopener\">Python SDK<\/a><\/span><span style=\"font-weight: 400;\">. <\/span><span style=\"font-weight: 400;\"><br \/>\n<\/span><span style=\"font-weight: 400;\">b) the frontend. The frontend communicates with the backend via its GraphQL interface. This way, the user can search for metadata, add tags, or modify the metadata, which is then communicated to the backend through Kafka.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Of course, this architecture brings more complexity into play than the other tools, which means more effort to implement and maintain. Nonetheless, this is also a differentiator, as it increases flexibility. It allows DataHub&#8217;s components to be scaled individually and provides a lot of powerful tools to build custom use cases on top of it.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Using <a href=\"https:\/\/github.com\/acryldata\/datahub-helm\" target=\"_blank\" rel=\"noopener\">DataHub&#8217;s helm charts<\/a><\/span><span style=\"font-weight: 400;\"> and keeping the recommended default setup (MySQL as database, dedicated Kafka instance, Elasticsearch for search &amp; graph index, MCE &amp; MAE in the backend), deploying DataHub was quite easy for us and worked immediately.<\/span><\/p>\n<h2><span class=\"ez-toc-section\" id=\"Feature-Comparison\"><\/span><strong>Feature Comparison<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">With the above overview of the various candidates and their architectures, it is now time for a detailed comparison between them. First, we begin with a tabular overview of some hard and some easy-to-grasp facts. 
Second, we present a detailed feature comparison based on our own sandbox deployments<\/span><span style=\"font-weight: 400;\">. In this section we refrain from including Amundsen, as we do not feel comfortable judging it without being able to speak from personal experience.<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-48463\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/Blogpost-Tabelle-4.png\" alt=\"Table of Comparison between DataHub and OpenMetadata\" width=\"934\" height=\"1954\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/Blogpost-Tabelle-4.png 934w, https:\/\/www.inovex.de\/wp-content\/uploads\/Blogpost-Tabelle-4-143x300.png 143w, https:\/\/www.inovex.de\/wp-content\/uploads\/Blogpost-Tabelle-4-489x1024.png 489w, https:\/\/www.inovex.de\/wp-content\/uploads\/Blogpost-Tabelle-4-768x1607.png 768w, https:\/\/www.inovex.de\/wp-content\/uploads\/Blogpost-Tabelle-4-734x1536.png 734w, https:\/\/www.inovex.de\/wp-content\/uploads\/Blogpost-Tabelle-4-400x837.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/Blogpost-Tabelle-4-360x753.png 360w\" sizes=\"auto, (max-width: 934px) 100vw, 934px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">So much for our general impression of both tools. Let\u2019s now come to the most interesting part. In the following, we contrast them in detail along some carefully selected aspects that we consider essential for a data catalog tool<\/span><span style=\"font-weight: 400;\">. This comparison is based entirely on the sandbox environment that we created for this blog post.<\/span><\/p>\n<h3><span class=\"ez-toc-section\" id=\"Metadata-Ingestion\"><\/span><strong>Metadata Ingestion<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><span style=\"font-weight: 400;\">In OpenMetadata, the metadata ingestion can be configured via the UI (and we guess this is the preferred way). 
Alternatively, it can be done by writing the configuration in a YAML file and ingesting metadata via the CLI tool or SDK from local\/external systems. <\/span><span style=\"font-weight: 400;\">The setup via the UI feels really easy and is accompanied by interactive, high-quality documentation at the side that really stands out.<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-47349 aligncenter\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/ingestion-docu.png\" alt=\"\" width=\"2496\" height=\"932\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/ingestion-docu.png 2496w, https:\/\/www.inovex.de\/wp-content\/uploads\/ingestion-docu-300x112.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/ingestion-docu-1024x382.png 1024w, https:\/\/www.inovex.de\/wp-content\/uploads\/ingestion-docu-768x287.png 768w, https:\/\/www.inovex.de\/wp-content\/uploads\/ingestion-docu-1536x574.png 1536w, https:\/\/www.inovex.de\/wp-content\/uploads\/ingestion-docu-2048x765.png 2048w, https:\/\/www.inovex.de\/wp-content\/uploads\/ingestion-docu-1920x717.png 1920w, https:\/\/www.inovex.de\/wp-content\/uploads\/ingestion-docu-400x149.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/ingestion-docu-360x134.png 360w\" sizes=\"auto, (max-width: 2496px) 100vw, 2496px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">Additionally, the ingestion can be scheduled by an (internal) deployment to the Airflow instance. You then get the state of the runs directly in the UI and can see logs, etc. from there. OpenMetadata has different ingestion types; the first one is always the ingestion of metadata itself. 
Based on this, one can configure further ingestions for these sources, e.g., for lineage information or data profiling.<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-47351 aligncenter\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/OpenMetadata_ingestion.png\" alt=\"\" width=\"1981\" height=\"1223\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/OpenMetadata_ingestion.png 1981w, https:\/\/www.inovex.de\/wp-content\/uploads\/OpenMetadata_ingestion-300x185.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/OpenMetadata_ingestion-1024x632.png 1024w, https:\/\/www.inovex.de\/wp-content\/uploads\/OpenMetadata_ingestion-768x474.png 768w, https:\/\/www.inovex.de\/wp-content\/uploads\/OpenMetadata_ingestion-1536x948.png 1536w, https:\/\/www.inovex.de\/wp-content\/uploads\/OpenMetadata_ingestion-1920x1185.png 1920w, https:\/\/www.inovex.de\/wp-content\/uploads\/OpenMetadata_ingestion-400x247.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/OpenMetadata_ingestion-360x222.png 360w\" sizes=\"auto, (max-width: 1981px) 100vw, 1981px\" \/><\/p>\n<p><span style=\"font-weight: 400;\"><br \/>\n<\/span><span style=\"font-weight: 400;\">The options DataHub offers are quite similar. It allows you to define the ingestion via the UI. The ingestion is then scheduled internally by a cron mechanism and executed in a separate container. Alternatively, one can define YAML recipes and trigger the ingestion from any supported system by using the CLI or SDK. All ingestion options, even lineage and profiling, are defined and configured in one place rather than separately.<br \/>\n<\/span><span style=\"font-weight: 400;\">In contrast to OpenMetadata, the configuration in the UI is less polished. A lot of config options seem to be missing, and API\/config changes are not sufficiently reflected or documented. 
So you need to keep the documentation open in parallel or, better, stick with the YAML config, which can also be edited in the UI and works better. Regardless, you can view and jump directly into the assets created by each ingestion run.<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-47353 aligncenter\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/DataHub_ingestion_page.png\" alt=\"\" width=\"2548\" height=\"647\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/DataHub_ingestion_page.png 2548w, https:\/\/www.inovex.de\/wp-content\/uploads\/DataHub_ingestion_page-300x76.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/DataHub_ingestion_page-1024x260.png 1024w, https:\/\/www.inovex.de\/wp-content\/uploads\/DataHub_ingestion_page-768x195.png 768w, https:\/\/www.inovex.de\/wp-content\/uploads\/DataHub_ingestion_page-1536x390.png 1536w, https:\/\/www.inovex.de\/wp-content\/uploads\/DataHub_ingestion_page-2048x520.png 2048w, https:\/\/www.inovex.de\/wp-content\/uploads\/DataHub_ingestion_page-1920x488.png 1920w, https:\/\/www.inovex.de\/wp-content\/uploads\/DataHub_ingestion_page-400x102.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/DataHub_ingestion_page-360x91.png 360w\" sizes=\"auto, (max-width: 2548px) 100vw, 2548px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">Be aware that the maturity of the connectors varies in both tools. So depending on your tech stack, one catalog might be preferable to the other just because it better supports your existing system landscape.<\/span><\/p>\n<h3><span class=\"ez-toc-section\" id=\"Lineage-Details\"><\/span><strong>Lineage Details<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><span style=\"font-weight: 400;\">OpenMetadata offers table- and column-level lineage for views out of the box, with a nice visualization in the UI. The required data is extracted by parsing the SQL statement of the underlying asset. 
It also recognizes renamed columns without issues.<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-47366 size-full\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/OpenMetadata_lineage.png\" alt=\"\" width=\"1981\" height=\"1223\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/OpenMetadata_lineage.png 1981w, https:\/\/www.inovex.de\/wp-content\/uploads\/OpenMetadata_lineage-300x185.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/OpenMetadata_lineage-1024x632.png 1024w, https:\/\/www.inovex.de\/wp-content\/uploads\/OpenMetadata_lineage-768x474.png 768w, https:\/\/www.inovex.de\/wp-content\/uploads\/OpenMetadata_lineage-1536x948.png 1536w, https:\/\/www.inovex.de\/wp-content\/uploads\/OpenMetadata_lineage-1920x1185.png 1920w, https:\/\/www.inovex.de\/wp-content\/uploads\/OpenMetadata_lineage-400x247.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/OpenMetadata_lineage-360x222.png 360w\" sizes=\"auto, (max-width: 1981px) 100vw, 1981px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">DataHub, on the other hand, offers table lineage extraction via a SQL parser, DDL statements or (in the case of BigQuery) job logs. 
This also worked flawlessly in our case.<br \/>\n<\/span><span style=\"font-weight: 400;\">At evaluation time, DataHub didn&#8217;t support column-level lineage for BigQuery, but it should be available in the current release.<\/span><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-47368\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/DataHub_lineage.png\" alt=\"\" width=\"1297\" height=\"930\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/DataHub_lineage.png 1297w, https:\/\/www.inovex.de\/wp-content\/uploads\/DataHub_lineage-300x215.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/DataHub_lineage-1024x734.png 1024w, https:\/\/www.inovex.de\/wp-content\/uploads\/DataHub_lineage-768x551.png 768w, https:\/\/www.inovex.de\/wp-content\/uploads\/DataHub_lineage-400x287.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/DataHub_lineage-360x258.png 360w\" sizes=\"auto, (max-width: 1297px) 100vw, 1297px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">Both tools allow you to manually modify the lineage in the web UI. This can be beneficial for lineage that cannot be detected technically but still needs to be maintained for notifications. Moreover, DataHub even allows for file-based lineage definition and programmatic ingestion of lineage information.<\/span><\/p>\n<h3><span class=\"ez-toc-section\" id=\"ExploringNavigating-in-the-Catalog\"><\/span><strong>Exploring\/Navigating in the Catalog<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>At first glance, OpenMetadata&#8217;s Explore section seems a bit unstructured, as it doesn&#8217;t have a natural hierarchical structure. However, assets are easy to find thanks to a variety of filtering options and a reliable search.<\/p>\n<p><span style=\"font-weight: 400;\">DataHub structures its explore navigation a bit differently, e.g. 
for BigQuery tables the navigation is based on <em>environment<\/em> -&gt; <em>GCP project<\/em> -&gt; <em>dataset<\/em> -&gt; <em>table<\/em>. So it&#8217;s a more technical way of navigating through the catalog, which could be counterintuitive for non-technical users. This was the default navigation style until <em>0.10.5<\/em>. Since then, DataHub has offered a new browse and search page. This is now the default page of DataHub, but the old one is still available.<br \/>\n<\/span><\/p>\n<figure id=\"attachment_47370\" aria-describedby=\"caption-attachment-47370\" style=\"width: 2066px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-47370 size-full\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/DataHub_explore_navigation.png\" alt=\"\" width=\"2066\" height=\"881\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/DataHub_explore_navigation.png 2066w, https:\/\/www.inovex.de\/wp-content\/uploads\/DataHub_explore_navigation-300x128.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/DataHub_explore_navigation-1024x437.png 1024w, https:\/\/www.inovex.de\/wp-content\/uploads\/DataHub_explore_navigation-768x327.png 768w, https:\/\/www.inovex.de\/wp-content\/uploads\/DataHub_explore_navigation-1536x655.png 1536w, https:\/\/www.inovex.de\/wp-content\/uploads\/DataHub_explore_navigation-2048x873.png 2048w, https:\/\/www.inovex.de\/wp-content\/uploads\/DataHub_explore_navigation-1920x819.png 1920w, https:\/\/www.inovex.de\/wp-content\/uploads\/DataHub_explore_navigation-400x171.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/DataHub_explore_navigation-360x154.png 360w\" sizes=\"auto, (max-width: 2066px) 100vw, 2066px\" \/><figcaption id=\"caption-attachment-47370\" class=\"wp-caption-text\">The new search and browse view of DataHub<\/figcaption><\/figure>\n<h3><span class=\"ez-toc-section\" id=\"Profiling-and-Metadata-Tests\"><\/span><strong>Profiling and Metadata Tests<\/strong><span 
class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><span style=\"font-weight: 400;\">In OpenMetadata, profiling can easily be added via the UI. It provides convenient parameters, such as the row count or the percentage of the table to use for profiling. In addition, there is an automatic PII tagging functionality which can be applied to columns individually. Test cases for columns can be added per table directly in the profiling tab. Once all cases are configured, a simple click on <em>Create Test Suite<\/em> will create a job. With the <em>deploy<\/em>, <em>schedule<\/em>, or <em>run<\/em> action, the test is deployed and executed as an additional Airflow DAG. Upon completion, you can view the results directly in the data asset. It is also possible to define the profiling or tests via YAML config and submit them via the CLI tool or the SDK. This way, you can generate them from personalized configs if you want to apply the same settings to multiple tables.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">When a metadata test fails, it is marked as <\/span><i><span style=\"font-weight: 400;\">new <\/span><\/i><span style=\"font-weight: 400;\">and a user can <\/span><i><span style=\"font-weight: 400;\">acknowledge<\/span><\/i><span style=\"font-weight: 400;\"> the failure. Later, after investigation, one can <\/span><i><span style=\"font-weight: 400;\">resolve<\/span><\/i><span style=\"font-weight: 400;\"> it with a comment about the root cause. This keeps all discussions and resolutions in one place and visible to everyone interested. 
In the next (v1.2) release, a new page for browsing the history of issues will be added.<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-47372\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/OpenMetadata_profiling.png\" alt=\"\" width=\"2557\" height=\"1223\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/OpenMetadata_profiling.png 2557w, https:\/\/www.inovex.de\/wp-content\/uploads\/OpenMetadata_profiling-300x143.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/OpenMetadata_profiling-1024x490.png 1024w, https:\/\/www.inovex.de\/wp-content\/uploads\/OpenMetadata_profiling-768x367.png 768w, https:\/\/www.inovex.de\/wp-content\/uploads\/OpenMetadata_profiling-1536x735.png 1536w, https:\/\/www.inovex.de\/wp-content\/uploads\/OpenMetadata_profiling-2048x980.png 2048w, https:\/\/www.inovex.de\/wp-content\/uploads\/OpenMetadata_profiling-1920x918.png 1920w, https:\/\/www.inovex.de\/wp-content\/uploads\/OpenMetadata_profiling-400x191.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/OpenMetadata_profiling-360x172.png 360w\" sizes=\"auto, (max-width: 2557px) 100vw, 2557px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">In DataHub, profiling is part of the metadata ingestion recipes. This means that with each ingestion run, jobs are run to profile the tables. As this could be costly, DataHub has built-in functionality to reduce the amount of data to be processed, for example by reading only the latest partition or by profiling a table only if it has changed since the last run. Additionally, the metrics to be calculated are configured per recipe, so either for one table or for all tables in the dataset\/project. 
This can be both an advantage and a disadvantage, depending on the number of tables and the profiling requirements.<br \/>\n<\/span><span style=\"font-weight: 400;\">DataHub also supports testing via <a href=\"https:\/\/greatexpectations.io\/\" target=\"_blank\" rel=\"noopener\">Great Expectations<\/a>, but the tests aren\u2019t managed by DataHub itself. Instead, one needs to run the test suite externally and configure Great Expectations accordingly. The results are then automatically communicated to DataHub through the REST API. For explicit metadata tests, DataHub provides an interface, but from the UI it\u2019s only usable in the SaaS solution. It allows automated asset classification and governance monitoring, like checking if a data asset is properly tagged, has a user or description, and things like that.<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-47374\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/DataHub_profiling.png\" alt=\"\" width=\"1924\" height=\"763\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/DataHub_profiling.png 1924w, https:\/\/www.inovex.de\/wp-content\/uploads\/DataHub_profiling-300x119.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/DataHub_profiling-1024x406.png 1024w, https:\/\/www.inovex.de\/wp-content\/uploads\/DataHub_profiling-768x305.png 768w, https:\/\/www.inovex.de\/wp-content\/uploads\/DataHub_profiling-1536x609.png 1536w, https:\/\/www.inovex.de\/wp-content\/uploads\/DataHub_profiling-1920x761.png 1920w, https:\/\/www.inovex.de\/wp-content\/uploads\/DataHub_profiling-400x159.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/DataHub_profiling-360x143.png 360w\" sizes=\"auto, (max-width: 1924px) 100vw, 1924px\" \/><\/p>\n<h3><span class=\"ez-toc-section\" id=\"Authorization\"><\/span><strong>Authorization<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><span style=\"font-weight: 400;\">One of the main principles of a data catalog 
is to allow people to discover data assets. However, organizations often want to control access at the individual or team level. Therefore, authorization features often play an important role when introducing a new tool. In general, both DataHub and OpenMetadata allow for authorization management via roles and policies.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">In OpenMetadata, you can assign roles to users and teams to control who can do what with the entities. Teams can be managed from within OpenMetadata. Unfortunately, reusing groups defined in the authentication provider is not yet supported. A team can be of type <em>Organization<\/em>, <em>Business Unit<\/em>, <em>Division<\/em>, <em>Department <\/em>or <em>Group<\/em>\u00a0to reflect the hierarchical structure of your organization.<br \/>\n<\/span><span style=\"font-weight: 400;\">A role consists of one or more policies, and policies consist of multiple rules (e.g., <em>Glossary<\/em>, <em>Add Owner<\/em>). Each rule defines which operations the actors are allowed to perform on which resources, based on optional conditions such as <em>isOwner<\/em> or similar.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">By comparison, DataHub lets you use users and groups (e.g. ingested via Azure AD or created in DataHub) to manage asset access and permissions. They suggest using roles and applying them to users, but for advanced use cases you can also directly create policies that are used by the roles under the hood. Policies are separated into platform permissions, like managing users, and metadata permissions, like adding tags or terms to entities.\u00a0<\/span><span style=\"font-weight: 400;\">While platform policies just grant privileges to actors (users, groups), metadata policies grant privileges to actors (users, groups or owners) for specific resources only. 
This allows fine-grained access controls based on teams, domains or other levels which are relevant to you.<\/span><\/p>\n<h3><span class=\"ez-toc-section\" id=\"Upgrade-and-Deployment\"><\/span><strong>Upgrade and Deployment<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><span style=\"font-weight: 400;\">For upgrades of the Helm chart, OpenMetadata provides detailed steps<\/span><span style=\"font-weight: 400;\"> on how to migrate to the next version. They include the backup steps in case of a rollback and the necessary manual steps to migrate schemas. In contrast, the DataHub Helm charts come with four Kubernetes jobs for setting up and migrating Kafka, Elasticsearch, the SQL DB and the system itself, which do everything for you.<\/span><\/p>\n<p>So for the happy path and without a dedicated platform team, DataHub is the more comfortable option for upgrades and deployments. But if something does not work out of the box (and we have already faced that in production), OpenMetadata seems to have the better documentation and level of detail to identify what changed and how to roll back or fix it. However, with version <em>1.1.1<\/em>, OpenMetadata introduced a migration job that performs some migration tasks during the deployment. So let\u2019s see whether our impression holds true for future releases as well.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Roadmap\"><\/span><strong>Roadmap<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><span style=\"font-weight: 400;\">Both catalogs provide a detailed roadmap, at least for the next major release, which sounds promising and indicates an actively developed project.<br \/>\n<\/span><span style=\"font-weight: 400;\">Nevertheless, while DataHub is mostly focused on enhancing the OSS version and (so far) only developing minor SaaS features, on the OpenMetadata side it looks quite the opposite. 
<\/span><span style=\"font-weight: 400;\">Certainly, a lot of OSS features will be added, but some of the interesting features will only be available in the SaaS offering, and it looks like the same will hold for the next releases. <\/span><span style=\"font-weight: 400;\">So maybe they will focus on getting people onto the platform in the future, but that&#8217;s just a feeling from our side. They may also support a lot of community contributions, like DataHub does, so that it remains valuable as an OSS variant as well.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">References:<\/span><br \/>\n<span style=\"font-weight: 400;\">OpenMetadata OSS roadmap<\/span><br \/>\n<a href=\"https:\/\/feature-requests.datahubproject.io\/roadmap\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">DataHub OSS roadmap<\/span><\/a><\/p>\n<p><span style=\"font-weight: 400;\">In summary, we have also condensed our extensive comparison into a visual representation. Please bear in mind that when one bar sticks out, we only want to highlight a slight preference for one of the tools. We are not suggesting that one tool is completely superior to the other. Both do a really good job in all of these categories, as you should have seen in our comparison above.<\/span><\/p>\n<p><a href=\"https:\/\/www.inovex.de\/wp-content\/uploads\/comparison.svg\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-47376\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/comparison.svg\" alt=\"\" \/><\/a><\/p>\n<h2><span class=\"ez-toc-section\" id=\"Summary\"><\/span><strong>Summary<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">In our case, Amundsen didn&#8217;t make the cut. 
Thus, we were left with OpenMetadata and DataHub, but even with that narrow selection, the choice is not easy.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Both authors have gathered a lot of practical experience with DataHub since it was released as a small project in 2021. We like the simplicity and extensibility of the project, which let us adapt it to our needs and processes. In addition, DataHub really feeds our engineering mindset, as many features allow extensive customization for advanced usage.<br \/>\n<\/span><span style=\"font-weight: 400;\">But the capabilities of OpenMetadata, how quickly they&#8217;ve gotten to a state where it&#8217;s a real competitor to DataHub, and the look and feel make it a great candidate. As already mentioned, the interaction-based design is a key differentiator. In addition, governance-related features are better supported. Requesting changes to data assets, assigning tasks, and approval workflows are natively supported, to name a few. In general, OpenMetadata feels cleaner and more stable than DataHub, and we would give it a chance.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">There is one aspect we can only express as a gut feeling (the future will tell). At the moment, OpenMetadata is a great OSS product, but looking at the roadmaps, it seems that in the future the managed service will be promoted more than the OSS project. However, we do not want to put too much weight on this and hope that our impression proves wrong. <\/span><span style=\"font-weight: 400;\">DataHub, on the other hand, is really community-driven (so far). We don&#8217;t have the feeling that many useful features will only be available in the managed version.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">In summary, we are really glad that there are two such great data catalog tools available as open source. Both are solid choices for any organization looking to improve its data management. 
The final choice depends on your preferences and your individual use case and requirements.<\/span><\/p>\n<p>If you would like to discuss data management and our experience with data catalogs, or if you would like a small workshop, please contact us.<\/p>\n<p><span style=\"font-weight: 400;\">How do you like this comparison? Do you disagree at any point? Let us know in the comments.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Data observability is key to today&#8217;s business world when it comes to digitizing and automating processes and being a data-driven company. Data catalogs are the foundation when focusing on establishing and improving data observability in a company. In the following, we will compare three data catalog tools that are available as open source extensively. Thereby, [&hellip;]<\/p>\n","protected":false},"author":231,"featured_media":48546,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"inline_featured_image":false,"ep_exclude_from_search":false,"footnotes":""},"tags":[385,909],"service":[411],"coauthors":[{"id":231,"display_name":"David Schmidt","user_nicename":"dschmidt"},{"id":297,"display_name":"Tim Bossenmaier","user_nicename":"tbossenmaier"}],"class_list":["post-46916","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","tag-data-engineering","tag-open-source","service-data-engineering"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.8 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Data Observability is Key: A Hands-on Comparison of Open Source Data Catalog Tools - inovex GmbH<\/title>\n<meta name=\"description\" content=\"Data observability is key to today&#039;s business world, this blogpost evaluates data catalog tools and shows how you can make most of your data.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, 
max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.inovex.de\/de\/blog\/data-observability-is-key-a-hands-on-comparison-of-open-source-data-catalog-tools\/\" \/>\n<meta property=\"og:locale\" content=\"de_DE\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Data Observability is Key: A Hands-on Comparison of Open Source Data Catalog Tools - inovex GmbH\" \/>\n<meta property=\"og:description\" content=\"Data observability is key to today&#039;s business world, this blogpost evaluates data catalog tools and shows how you can make most of your data.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.inovex.de\/de\/blog\/data-observability-is-key-a-hands-on-comparison-of-open-source-data-catalog-tools\/\" \/>\n<meta property=\"og:site_name\" content=\"inovex GmbH\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/inovexde\" \/>\n<meta property=\"article:published_time\" content=\"2023-09-15T09:27:33+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2026-06-23T05:48:17+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.inovex.de\/wp-content\/uploads\/Data-Observability-1.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1500\" \/>\n\t<meta property=\"og:image:height\" content=\"880\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"David Schmidt, Tim Bossenmaier\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:image\" content=\"https:\/\/www.inovex.de\/wp-content\/uploads\/Data-Observability-1-1024x601.png\" \/>\n<meta name=\"twitter:creator\" content=\"@inovexgmbh\" \/>\n<meta name=\"twitter:site\" content=\"@inovexgmbh\" \/>\n<meta name=\"twitter:label1\" content=\"Verfasst von\" \/>\n\t<meta name=\"twitter:data1\" content=\"David Schmidt\" \/>\n\t<meta name=\"twitter:label2\" content=\"Gesch\u00e4tzte Lesezeit\" \/>\n\t<meta 
name=\"twitter:data2\" content=\"22\u00a0Minuten\" \/>\n\t<meta name=\"twitter:label3\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data3\" content=\"David Schmidt, Tim Bossenmaier\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/data-observability-is-key-a-hands-on-comparison-of-open-source-data-catalog-tools\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/data-observability-is-key-a-hands-on-comparison-of-open-source-data-catalog-tools\\\/\"},\"author\":{\"name\":\"David Schmidt\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#\\\/schema\\\/person\\\/79d9e9d1d9797670905cf9e0844cae57\"},\"headline\":\"Data Observability is Key: A Hands-on Comparison of Open Source Data Catalog Tools\",\"datePublished\":\"2023-09-15T09:27:33+00:00\",\"dateModified\":\"2026-06-23T05:48:17+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/data-observability-is-key-a-hands-on-comparison-of-open-source-data-catalog-tools\\\/\"},\"wordCount\":4432,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/data-observability-is-key-a-hands-on-comparison-of-open-source-data-catalog-tools\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/Data-Observability-1.png\",\"keywords\":[\"Data Engineering\",\"Open 
Source\"],\"articleSection\":[\"General\"],\"inLanguage\":\"de\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/data-observability-is-key-a-hands-on-comparison-of-open-source-data-catalog-tools\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/data-observability-is-key-a-hands-on-comparison-of-open-source-data-catalog-tools\\\/\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/data-observability-is-key-a-hands-on-comparison-of-open-source-data-catalog-tools\\\/\",\"name\":\"Data Observability is Key: A Hands-on Comparison of Open Source Data Catalog Tools - inovex GmbH\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/data-observability-is-key-a-hands-on-comparison-of-open-source-data-catalog-tools\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/data-observability-is-key-a-hands-on-comparison-of-open-source-data-catalog-tools\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/Data-Observability-1.png\",\"datePublished\":\"2023-09-15T09:27:33+00:00\",\"dateModified\":\"2026-06-23T05:48:17+00:00\",\"description\":\"Data observability is key to today's business world, this blogpost evaluates data catalog tools and shows how you can make most of your 
data.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/data-observability-is-key-a-hands-on-comparison-of-open-source-data-catalog-tools\\\/#breadcrumb\"},\"inLanguage\":\"de\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/data-observability-is-key-a-hands-on-comparison-of-open-source-data-catalog-tools\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/data-observability-is-key-a-hands-on-comparison-of-open-source-data-catalog-tools\\\/#primaryimage\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/Data-Observability-1.png\",\"contentUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/Data-Observability-1.png\",\"width\":1500,\"height\":880,\"caption\":\"Grafik: Mann steht vor einem visuellen B\u00fccherregal mit Datenbanken\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/data-observability-is-key-a-hands-on-comparison-of-open-source-data-catalog-tools\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Data Observability is Key: A Hands-on Comparison of Open Source Data Catalog Tools\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#website\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/\",\"name\":\"inovex 
GmbH\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"de\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#organization\",\"name\":\"inovex GmbH\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/2021\\\/03\\\/inovex-logo-16-9-1.png\",\"contentUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/2021\\\/03\\\/inovex-logo-16-9-1.png\",\"width\":1921,\"height\":1081,\"caption\":\"inovex GmbH\"},\"image\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/inovexde\",\"https:\\\/\\\/x.com\\\/inovexgmbh\",\"https:\\\/\\\/www.instagram.com\\\/inovexlife\\\/\",\"https:\\\/\\\/www.linkedin.com\\\/company\\\/inovex\",\"https:\\\/\\\/www.youtube.com\\\/channel\\\/UC7r66GT14hROB_RQsQBAQUQ\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#\\\/schema\\\/person\\\/79d9e9d1d9797670905cf9e0844cae57\",\"name\":\"David Schmidt\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/Profilbild-2023-96x96.jpg4abd07b0a406a2db61f7c03f3ae19d0b\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/Profilbild-2023-96x96.jpg\",\"contentUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/Profilbild-2023-96x96.jpg\",\"caption\":\"David 
Schmidt\"},\"sameAs\":[\"https:\\\/\\\/www.linkedin.com\\\/in\\\/david-schmidt-de\\\/\"],\"url\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/author\\\/dschmidt\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Data Observability is Key: A Hands-on Comparison of Open Source Data Catalog Tools - inovex GmbH","description":"Data observability is key to today's business world, this blogpost evaluates data catalog tools and shows how you can make most of your data.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.inovex.de\/de\/blog\/data-observability-is-key-a-hands-on-comparison-of-open-source-data-catalog-tools\/","og_locale":"de_DE","og_type":"article","og_title":"Data Observability is Key: A Hands-on Comparison of Open Source Data Catalog Tools - inovex GmbH","og_description":"Data observability is key to today's business world, this blogpost evaluates data catalog tools and shows how you can make most of your data.","og_url":"https:\/\/www.inovex.de\/de\/blog\/data-observability-is-key-a-hands-on-comparison-of-open-source-data-catalog-tools\/","og_site_name":"inovex GmbH","article_publisher":"https:\/\/www.facebook.com\/inovexde","article_published_time":"2023-09-15T09:27:33+00:00","article_modified_time":"2026-06-23T05:48:17+00:00","og_image":[{"width":1500,"height":880,"url":"https:\/\/www.inovex.de\/wp-content\/uploads\/Data-Observability-1.png","type":"image\/png"}],"author":"David Schmidt, Tim Bossenmaier","twitter_card":"summary_large_image","twitter_image":"https:\/\/www.inovex.de\/wp-content\/uploads\/Data-Observability-1-1024x601.png","twitter_creator":"@inovexgmbh","twitter_site":"@inovexgmbh","twitter_misc":{"Verfasst von":"David Schmidt","Gesch\u00e4tzte Lesezeit":"22\u00a0Minuten","Written by":"David Schmidt, Tim 
Bossenmaier"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.inovex.de\/de\/blog\/data-observability-is-key-a-hands-on-comparison-of-open-source-data-catalog-tools\/#article","isPartOf":{"@id":"https:\/\/www.inovex.de\/de\/blog\/data-observability-is-key-a-hands-on-comparison-of-open-source-data-catalog-tools\/"},"author":{"name":"David Schmidt","@id":"https:\/\/www.inovex.de\/de\/#\/schema\/person\/79d9e9d1d9797670905cf9e0844cae57"},"headline":"Data Observability is Key: A Hands-on Comparison of Open Source Data Catalog Tools","datePublished":"2023-09-15T09:27:33+00:00","dateModified":"2026-06-23T05:48:17+00:00","mainEntityOfPage":{"@id":"https:\/\/www.inovex.de\/de\/blog\/data-observability-is-key-a-hands-on-comparison-of-open-source-data-catalog-tools\/"},"wordCount":4432,"commentCount":0,"publisher":{"@id":"https:\/\/www.inovex.de\/de\/#organization"},"image":{"@id":"https:\/\/www.inovex.de\/de\/blog\/data-observability-is-key-a-hands-on-comparison-of-open-source-data-catalog-tools\/#primaryimage"},"thumbnailUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/Data-Observability-1.png","keywords":["Data Engineering","Open Source"],"articleSection":["General"],"inLanguage":"de","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.inovex.de\/de\/blog\/data-observability-is-key-a-hands-on-comparison-of-open-source-data-catalog-tools\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.inovex.de\/de\/blog\/data-observability-is-key-a-hands-on-comparison-of-open-source-data-catalog-tools\/","url":"https:\/\/www.inovex.de\/de\/blog\/data-observability-is-key-a-hands-on-comparison-of-open-source-data-catalog-tools\/","name":"Data Observability is Key: A Hands-on Comparison of Open Source Data Catalog Tools - inovex 
GmbH","isPartOf":{"@id":"https:\/\/www.inovex.de\/de\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.inovex.de\/de\/blog\/data-observability-is-key-a-hands-on-comparison-of-open-source-data-catalog-tools\/#primaryimage"},"image":{"@id":"https:\/\/www.inovex.de\/de\/blog\/data-observability-is-key-a-hands-on-comparison-of-open-source-data-catalog-tools\/#primaryimage"},"thumbnailUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/Data-Observability-1.png","datePublished":"2023-09-15T09:27:33+00:00","dateModified":"2026-06-23T05:48:17+00:00","description":"Data observability is key to today's business world, this blogpost evaluates data catalog tools and shows how you can make most of your data.","breadcrumb":{"@id":"https:\/\/www.inovex.de\/de\/blog\/data-observability-is-key-a-hands-on-comparison-of-open-source-data-catalog-tools\/#breadcrumb"},"inLanguage":"de","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.inovex.de\/de\/blog\/data-observability-is-key-a-hands-on-comparison-of-open-source-data-catalog-tools\/"]}]},{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/www.inovex.de\/de\/blog\/data-observability-is-key-a-hands-on-comparison-of-open-source-data-catalog-tools\/#primaryimage","url":"https:\/\/www.inovex.de\/wp-content\/uploads\/Data-Observability-1.png","contentUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/Data-Observability-1.png","width":1500,"height":880,"caption":"Grafik: Mann steht vor einem visuellen B\u00fccherregal mit Datenbanken"},{"@type":"BreadcrumbList","@id":"https:\/\/www.inovex.de\/de\/blog\/data-observability-is-key-a-hands-on-comparison-of-open-source-data-catalog-tools\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.inovex.de\/de\/"},{"@type":"ListItem","position":2,"name":"Data Observability is Key: A Hands-on Comparison of Open Source Data Catalog 
Tools"}]},{"@type":"WebSite","@id":"https:\/\/www.inovex.de\/de\/#website","url":"https:\/\/www.inovex.de\/de\/","name":"inovex GmbH","description":"","publisher":{"@id":"https:\/\/www.inovex.de\/de\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.inovex.de\/de\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"de"},{"@type":"Organization","@id":"https:\/\/www.inovex.de\/de\/#organization","name":"inovex GmbH","url":"https:\/\/www.inovex.de\/de\/","logo":{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/www.inovex.de\/de\/#\/schema\/logo\/image\/","url":"https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/03\/inovex-logo-16-9-1.png","contentUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/03\/inovex-logo-16-9-1.png","width":1921,"height":1081,"caption":"inovex GmbH"},"image":{"@id":"https:\/\/www.inovex.de\/de\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/inovexde","https:\/\/x.com\/inovexgmbh","https:\/\/www.instagram.com\/inovexlife\/","https:\/\/www.linkedin.com\/company\/inovex","https:\/\/www.youtube.com\/channel\/UC7r66GT14hROB_RQsQBAQUQ"]},{"@type":"Person","@id":"https:\/\/www.inovex.de\/de\/#\/schema\/person\/79d9e9d1d9797670905cf9e0844cae57","name":"David Schmidt","image":{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/www.inovex.de\/wp-content\/uploads\/Profilbild-2023-96x96.jpg4abd07b0a406a2db61f7c03f3ae19d0b","url":"https:\/\/www.inovex.de\/wp-content\/uploads\/Profilbild-2023-96x96.jpg","contentUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/Profilbild-2023-96x96.jpg","caption":"David 
Schmidt"},"sameAs":["https:\/\/www.linkedin.com\/in\/david-schmidt-de\/"],"url":"https:\/\/www.inovex.de\/de\/blog\/author\/dschmidt\/"}]}},"_links":{"self":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts\/46916","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/users\/231"}],"replies":[{"embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/comments?post=46916"}],"version-history":[{"count":9,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts\/46916\/revisions"}],"predecessor-version":[{"id":68084,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts\/46916\/revisions\/68084"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/media\/48546"}],"wp:attachment":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/media?parent=46916"}],"wp:term":[{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/tags?post=46916"},{"taxonomy":"service","embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/service?post=46916"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/coauthors?post=46916"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}