{"id":41261,"date":"2023-06-06T15:26:05","date_gmt":"2023-06-06T13:26:05","guid":{"rendered":"https:\/\/www.inovex.de\/?p=41261"},"modified":"2023-06-06T15:26:05","modified_gmt":"2023-06-06T13:26:05","slug":"data-orchestration-is-airflow-still-the-best-part-3","status":"publish","type":"post","link":"https:\/\/www.inovex.de\/de\/blog\/data-orchestration-is-airflow-still-the-best-part-3\/","title":{"rendered":"Data Orchestration: Is Airflow Still the Best? (Part 3)"},"content":{"rendered":"<p>Welcome again, to part 3 of this article series about data orchestration. In this part, we want to implement our beloved pipeline from <a href=\"https:\/\/www.inovex.de\/de\/blog\/data-orchestration-is-airflow-still-the-best-part-1\/\">part 1<\/a> once again, but this time in Dagster. In <a href=\"https:\/\/www.inovex.de\/de\/blog\/data-orchestration-is-airflow-still-the-best-part-2\/\">part 2<\/a> we implemented this pipeline in Prefect and could see that although Prefect has some differences from Airflow, the task implementation was quite similar. Dagster has a completely new approach to data orchestration and so we will learn a lot of new concepts. 
So tune in!<!--more--><\/p>\n<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_85 counter-hierarchy ez-toc-counter ez-toc-custom ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\"><p class=\"ez-toc-title\" style=\"cursor:inherit\"><\/p>\n<\/div><nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/www.inovex.de\/de\/blog\/data-orchestration-is-airflow-still-the-best-part-3\/#Dagster\" >Dagster<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/www.inovex.de\/de\/blog\/data-orchestration-is-airflow-still-the-best-part-3\/#Prerequisites\" >Prerequisites<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/www.inovex.de\/de\/blog\/data-orchestration-is-airflow-still-the-best-part-3\/#Resources\" >Resources<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/www.inovex.de\/de\/blog\/data-orchestration-is-airflow-still-the-best-part-3\/#Asset-postgres-ingestion\" >Asset: postgres ingestion<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/www.inovex.de\/de\/blog\/data-orchestration-is-airflow-still-the-best-part-3\/#IO-Manager-LocalPostgresIOManager\" >IO Manager: LocalPostgresIOManager<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/www.inovex.de\/de\/blog\/data-orchestration-is-airflow-still-the-best-part-3\/#Repository\" >Repository<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-7\" href=\"https:\/\/www.inovex.de\/de\/blog\/data-orchestration-is-airflow-still-the-best-part-3\/#Asset-revenue-per-day-per-manager-plot\" >Asset: revenue per day per manager 
plot<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-8\" href=\"https:\/\/www.inovex.de\/de\/blog\/data-orchestration-is-airflow-still-the-best-part-3\/#Asset-average-revenue-per-manager-aggregation\" >Asset: average revenue per manager aggregation<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-9\" href=\"https:\/\/www.inovex.de\/de\/blog\/data-orchestration-is-airflow-still-the-best-part-3\/#Asset-average-revenue-per-manager-plot\" >Asset: average revenue per manager plot<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-10\" href=\"https:\/\/www.inovex.de\/de\/blog\/data-orchestration-is-airflow-still-the-best-part-3\/#Dagsters-web-UI-pipeline-run\" >Dagster&#8217;s web UI &amp; pipeline run<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-11\" href=\"https:\/\/www.inovex.de\/de\/blog\/data-orchestration-is-airflow-still-the-best-part-3\/#Final-remarks\" >Final remarks<\/a><\/li><\/ul><\/li><\/ul><\/nav><\/div>\n<h2><span class=\"ez-toc-section\" id=\"Dagster\"><\/span>Dagster<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><a href=\"https:\/\/www.elementl.com\/\" target=\"_blank\" rel=\"noopener\">Elementl<\/a>, the company behind <a href=\"https:\/\/dagster.io\/\" target=\"_blank\" rel=\"noopener\">Dagster<\/a>, was also founded in 2018 and is headquartered in the San Francisco Bay Area. What does <a href=\"https:\/\/dagster.io\/\" target=\"_blank\" rel=\"noopener\">Dagster<\/a> say about itself?<\/p>\n<blockquote><p>\u201cThe cloud-native orchestrator for the whole development lifecycle, with integrated lineage and observability, a declarative programming model, and best-in-class testability.\u201d<\/p><\/blockquote>\n<p>Wow, what a statement. 
Let&#8217;s see whether Dagster can live up to this statement or whether it is just heavy marketing!<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Prerequisites\"><\/span>Prerequisites<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Before diving into Dagster, we have to install Dagster and a few dependencies. Open up a terminal and run:<\/p>\n<pre class=\"lang:zsh decode:true\">pip install dagster dagit\r\npip install plotly pandas pytest python-dotenv psycopg2<\/pre>\n<p>Installing Dagster is as simple as installing Prefect. By the way, I use Dagster version 1.1.5, and we will focus only on the open-source version of Dagster in this experiment.<\/p>\n<p>What I also like about Dagster is how easy it is to bootstrap a project structure. To get a default project skeleton, we just have to run:<\/p>\n<pre class=\"lang:zsh decode:true\">dagster project scaffold --name franchise-blog<\/pre>\n<p>Your project structure should look similar to the one shown in Figure 1.<\/p>\n<figure style=\"width: 259px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/blogpost_dataorchestration_dagster_projstruct_11.png\" alt=\"Dagster Project Structure\" width=\"259\" height=\"339\" \/><figcaption class=\"wp-caption-text\"><strong>Figure 1: Dagster&#8217;s project structure<\/strong><\/figcaption><\/figure>\n<p>We can ignore most files for now. What is interesting, though, is that the project structure resembles the setup you usually encounter when creating your own Python libraries. The reason is that this is actually a fully functioning Python package! 
You can try this out by completing the <em>setup.py<\/em> file:<\/p>\n<pre class=\"lang:python decode:true \">from setuptools import find_packages, setup\r\n\r\nsetup(\r\n    name=\"franchise_blog\",\r\n    packages=find_packages(exclude=[\"franchise_blog_tests\"]),\r\n    install_requires=[\r\n        \"dagster\",\r\n        \"pandas\",\r\n        \"plotly\",\r\n        \"psycopg2\",\r\n        \"python-dotenv\"\r\n    ],\r\n    extras_require={\"dev\": [\"dagit\", \"pytest\"]},\r\n)<\/pre>\n<p>Afterward, you can install the package via pip:<\/p>\n<pre class=\"lang:zsh decode:true\">pip install -e \".[dev]\"<\/pre>\n<p>The <em>-e<\/em> flag installs the package in editable mode, so local code changes take effect without reinstalling.<\/p>\n<p>This approach is quite favorable since every pipeline can be put into its own isolated package. Prefect isolates the pipeline code by using Deployments. So both tools make it easy for us to do local development. Anyway, let&#8217;s start implementing our pipeline in Dagster!<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Resources\"><\/span>Resources<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>The first concept we want to dig into is <em>resources<\/em>. In contrast to Prefect and Airflow, Dagster follows an asset-centric paradigm. Thus, Dagster focuses on files, tables, machine learning models, and so forth. These are our assets. This paradigm shift is very interesting: when using data orchestration tools, we very often deal with technicalities, although we should actually focus more on our assets. We will see at a later point how Dagster handles these technicalities.<\/p>\n<p>First of all, we need to define two resources: a path to our base directory where we will store our assets, and a Postgres API resource. 
Thus, we create a <em>resources.py<\/em> file in the <em>franchise_blog<\/em> directory and insert the following code:<\/p>\n<pre class=\"lang:default decode:true\">import psycopg2\r\nfrom dagster import resource, InitResourceContext\r\n  \r\n@resource(config_schema={\"host\": str, \"port\": str, \"database\": str, \"user\": str, \"password\": str})\r\ndef postgres_api(init_context: InitResourceContext):\r\n    database_connection = {\r\n        'host': init_context.resource_config[\"host\"],\r\n        'port': init_context.resource_config[\"port\"],\r\n        'database': init_context.resource_config[\"database\"],\r\n        'user': init_context.resource_config[\"user\"],\r\n        'password': init_context.resource_config[\"password\"]\r\n    }\r\n\r\n    # The returned connection object is the resource itself\r\n    return psycopg2.connect(**database_connection)\r\n\r\n@resource(config_schema={\"base_dir\": str})\r\ndef base_dir(init_context: InitResourceContext):\r\n    return init_context.resource_config[\"base_dir\"]<\/pre>\n<p>We can declare a resource with the <em>resource<\/em> decorator and define a configuration schema. This schema defines the shape of the resource&#8217;s configuration. Also, a resource accepts a context object as an argument. We do not have to worry about this at the moment. Just know that these context objects are enriched with metadata and encapsulate important functionality for configuration. Our Postgres resource itself is quite simple: we fetch the database details and return a connection object. Note that the <strong>return value<\/strong> actually represents our resource. We should not forget about our second resource, so we also set up the base directory resource.<\/p>\n<p>So our resources are ready to go but we still have to configure them. 
For that purpose, create a <em>configurations.py<\/em> file inside the <em>franchise_blog<\/em> directory and copy and paste the following code:<\/p>\n<pre class=\"lang:python decode:true \">import os\r\nfrom dagster import ResourceDefinition\r\nfrom dotenv import load_dotenv\r\nfrom franchise_blog.resources import postgres_api, base_dir\r\n  \r\nload_dotenv()\r\n  \r\ndef get_configured_postgres_api() -&gt; ResourceDefinition:\r\n    return postgres_api.configured(\r\n        {\r\n            'host': os.environ[\"POSTGRES_HOST\"],\r\n            'port': os.environ[\"POSTGRES_PORT\"],\r\n            'database': os.environ[\"POSTGRES_DB\"],\r\n            'user': os.environ[\"POSTGRES_USER\"],\r\n            'password': os.environ[\"POSTGRES_PW\"]\r\n        }\r\n    )\r\n\r\n  \r\ndef get_configured_base_dir() -&gt; ResourceDefinition:\r\n    return base_dir.configured({'base_dir': os.environ[\"BASE_DIR\"]})<\/pre>\n<p>Essentially, we are storing our resource-related data in an environment file. Therefore, we have to load our environment file and configure our resources. In other words, we have separated the resource configuration from the resource declaration. Configuring the resources is simple: we just call the <em>configured<\/em> method, which returns a new <a href=\"https:\/\/docs.dagster.io\/concepts\/resources\" target=\"_blank\" rel=\"noopener\"><em>ResourceDefinition<\/em><\/a> with the configuration values baked in. Do not forget to create an appropriate <em>.env<\/em> file in the root of the workspace with the following content:<\/p>\n<pre class=\"lang:default decode:true \">POSTGRES_HOST=&lt;postgres host, e.g. localhost&gt;\r\nPOSTGRES_PORT=&lt;postgres port, e.g. 
5432&gt;\r\nPOSTGRES_DB=&lt;postgres database&gt;\r\nPOSTGRES_USER=&lt;postgres user&gt;\r\nPOSTGRES_PW=&lt;postgres pw&gt;\r\n  \r\nBASE_DIR=.\/data #you can also choose another directory if you want<\/pre>\n<p>You might suspect that Dagster requires too much boilerplate code, given that we haven&#8217;t even started writing our tasks yet! Well, this is the price to pay for the separation of concerns. Furthermore, we want to manage our resources and assets appropriately, and this will pay off in the end since we only have to define them once. Let&#8217;s define our first asset!<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Asset-postgres-ingestion\"><\/span>Asset: postgres ingestion<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>We will once again create three directories: <em>ingestion<\/em>, <em>plotting<\/em>, and <em>transformation<\/em>. Create them inside the <em>assets<\/em> directory. Afterward, we create our first asset file <em>postgres.py<\/em> in the <em>ingestion<\/em> directory. Look at the following code, which defines our asset &#8211; do you notice any differences compared to Prefect and Airflow?<\/p>\n<pre class=\"lang:python decode:true \">from dagster import asset\r\n\r\n@asset(required_resource_keys={\"postgres_api\"}, group_name=\"franchise\", io_manager_key=\"local_postgres_io_manager\")\r\ndef ingest_store_data_from_psql(context):\r\n    sql_statement = \"\"\"\r\n        select id, manager, city, street, street_number, revenue, day from stores;\r\n    \"\"\"\r\n    with context.resources.postgres_api.cursor() as cursor:\r\n        cursor.execute(sql_statement)\r\n        result = cursor.fetchall()\r\n  \r\n    return {'result': result}<\/pre>\n<p>Well, of course, we use the <em>asset<\/em> decorator instead of the <em>task<\/em> decorator. 
The business logic also looks very similar &#8211; but wait, where is the logic that stores our raw data in a CSV file? We will explore this in a minute. But this is the beauty of Dagster. Our assets focus only on the business logic and don&#8217;t care where our data is going or what happens to it afterward. It is much cleaner this way &#8211; but as we will see, we have to pay a price, again! When we take a closer look at our <em>asset<\/em> decorator, we can see that we pass three arguments to it:<\/p>\n<ul>\n<li>required_resource_keys: A set of resource references that are required by the <a href=\"https:\/\/docs.dagster.io\/concepts\/ops-jobs-graphs\/ops\" target=\"_blank\" rel=\"noopener\">op<\/a>. You have to know at this point that an asset consists of the following three parts: an asset key, a function that computes the content of the asset, and a set of upstream assets that are provided as inputs. The function that computes the asset is basically an op, which is the core unit of computation in Dagster.<\/li>\n<li>group_name: Simply a string that groups assets that are semantically related.<\/li>\n<li>io_manager_key: A reference to the IO manager that the asset should use. We will learn in the next section what an IO manager does.<\/li>\n<\/ul>\n<p>The concept of resource keys and IO manager keys is actually quite useful since it relies on dependency injection. Thus, we can easily swap out resources and IO managers if we want to use different ones. We will also see that this comes in quite handy in testing. Moreover, do not confuse Dagster&#8217;s assets with the notion of tasks: the asset in the ingestion step is actually the CSV file that contains the <em>result<\/em>. That asset is what we care about.<\/p>\n<p>Next, we will investigate the concept of IO managers. The IO manager is the reason why we can separate the business logic from the rather task-oriented logic. 
So let&#8217;s define our own custom IO manager!<\/p>\n<h3><span class=\"ez-toc-section\" id=\"IO-Manager-LocalPostgresIOManager\"><\/span>IO Manager: LocalPostgresIOManager<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Let&#8217;s create another directory called <em>resources<\/em> inside our <em>assets<\/em> directory and create a Python file named <em>local_postgres_io_manager.py<\/em> in it. Copy and paste the following lines of code into it:<\/p>\n<pre class=\"lang:python decode:true \">import os\r\nimport csv\r\nfrom dagster import (\r\n    IOManager,\r\n    OutputContext,\r\n    InputContext,\r\n    InitResourceContext,\r\n    build_init_resource_context,\r\n    io_manager\r\n)\r\n  \r\nclass LocalPostgresIOManager(IOManager):\r\n    def __init__(self, target_file: str) -&gt; None:\r\n        self.target_file = target_file\r\n  \r\n    def handle_output(self, context: OutputContext, obj: dict) -&gt; None:\r\n        # Persist the rows returned by the upstream asset as a CSV file\r\n        os.makedirs(os.path.dirname(self.target_file), exist_ok=True)\r\n  \r\n        result = obj[\"result\"]\r\n        with open(self.target_file, 'w', newline='') as file:\r\n            target_csv = csv.writer(file)\r\n            target_csv.writerow(['id', 'manager', 'city', 'street', 'street_number', 'revenue', 'day'])\r\n            for row in result:\r\n                target_csv.writerow(row)\r\n  \r\n    def load_input(self, context: InputContext) -&gt; str:\r\n        # Downstream assets receive the path to the CSV file\r\n        return self.target_file\r\n  \r\n@io_manager(required_resource_keys={\"base_dir\"})\r\ndef local_postgres_io_manager(init_context: InitResourceContext) -&gt; LocalPostgresIOManager:\r\n    target_file = 
os.path.join(init_context.resources.base_dir, \"storage\/stores.csv\")\r\n    return LocalPostgresIOManager(target_file=target_file)\r\n  \r\n@io_manager(required_resource_keys={\"base_dir\"})\r\ndef postgres_io_manager(init_context: InitResourceContext) -&gt; LocalPostgresIOManager:\r\n    return local_postgres_io_manager(\r\n        build_init_resource_context(\r\n            resources={\"base_dir\": init_context.resources.base_dir},\r\n        )\r\n    )<\/pre>\n<p>Wow, this looks overwhelming, but don&#8217;t panic: it actually looks worse than it is. But this is the price we have to pay! We outsourced the logic of writing raw data to a CSV file to a dedicated <a href=\"https:\/\/docs.dagster.io\/concepts\/io-management\/io-managers\" target=\"_blank\" rel=\"noopener\">IO Manager<\/a>. The primary task of an IO manager in Dagster is to store the outputs of assets and to load them as inputs into downstream assets. In our case, the IO manager should just manage the output of our Postgres asset.<\/p>\n<p>In order for our <em>LocalPostgresIOManager<\/em> to work, we only have to implement two abstract methods: <em>handle_output<\/em> and <em>load_input<\/em>.<\/p>\n<ul>\n<li>handle_output: Receives the output of our upstream asset and is responsible for handling it appropriately, e.g. storing it on a local file system. In our case, we are storing the data that we fetched from our Postgres database on our local file system.<\/li>\n<li>load_input: Responsible for loading the correct object into the downstream asset. In our example, it just passes the path to the CSV file to the next downstream asset.<\/li>\n<\/ul>\n<p>This approach increases code complexity but simplifies our assets and decouples dependencies. We are decoupling our asset, the raw data which we query from the Postgres database, from the storage. 
For instance, imagine we no longer want to store our data on a local file system but on S3: we could simply swap out our IO manager and our pipeline would still be good to go.<\/p>\n<p>Note that the output of <em>ingest_store_data_from_psql<\/em> is a dictionary with one key named <em>result<\/em>, which holds the raw data as its value. The argument <em>obj<\/em> in the <em>handle_output<\/em> method will actually hold a reference to this dictionary.<\/p>\n<p>Furthermore, in order to use our IO manager, we have to construct an IO manager definition that returns an instance of our IO manager class. Therefore, we define the function <em>postgres_io_manager<\/em>, which builds a context for us. This context object holds a resource, namely the <em>base_dir<\/em> resource, and is passed as an argument to the function <em>local_postgres_io_manager<\/em>. If you do not need to parameterize your IO manager, you do not need to implement the function <em>postgres_io_manager<\/em>, since the IO manager can be initialized directly in <em>local_postgres_io_manager<\/em>.<\/p>\n<p>If this is still too complicated for you, you can leave the IO manager out and put the logic directly into your assets. This also works fine, but the separation will be gone.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Repository\"><\/span>Repository<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Before coding the other assets, let&#8217;s have a look at how we wire up all of our resources, IO managers, and assets. To do so, create a file <em>repository.py<\/em> inside the <em>franchise_blog<\/em> directory. 
The skeleton for our repository should look like this:<\/p>\n<pre class=\"lang:python decode:true \">from dagster import (\r\n\tload_assets_from_package_module,\r\n\trepository\r\n)\r\nfrom franchise_blog import assets\r\n\r\n@repository\r\ndef franchise_blog():\r\n\treturn [\r\n\t\tload_assets_from_package_module(assets)\r\n\t]<\/pre>\n<p>The function <em>load_assets_from_package_module<\/em> loads all assets inside the <em>assets<\/em> directory into this repository. This is how Dagster determines which assets to include in our pipeline. We will see later how we can manage the dependencies between the assets. Moreover, we can already add our scheduling information by modifying our code:<\/p>\n<pre class=\"lang:python decode:true \">from dagster import (\r\n\tload_assets_from_package_module,\r\n\trepository,\r\n\tdefine_asset_job,\r\n\tScheduleDefinition\r\n)\r\nfrom franchise_blog import assets\r\n\r\ndaily_job = define_asset_job(name=\"daily_franchise_update\", selection=\"*\")\r\ndaily_schedule = ScheduleDefinition(\r\n\tjob=daily_job,\r\n\tcron_schedule=\"0 7 * * *\"\r\n)\r\n\r\n@repository\r\ndef franchise_blog():\r\n\treturn [\r\n\t\tdaily_job,\r\n\t\tdaily_schedule,\r\n\t\tload_assets_from_package_module(assets)\r\n\t]\r\n<\/pre>\n<p>Next, we should register our resources and IO managers in our repository because our asset <em>ingest_store_data_from_psql<\/em> requires the IO manager key <em>local_postgres_io_manager<\/em>. This only takes a slight change in our code:<\/p>\n<pre class=\"lang:python decode:true \">from dagster import (\r\n\tload_assets_from_package_module,\r\n\trepository,\r\n\tdefine_asset_job,\r\n\twith_resources,\r\n\tScheduleDefinition\r\n)\r\nfrom franchise_blog import assets\r\nfrom franchise_blog.configurations import get_configured_base_dir, get_configured_postgres_api\r\nfrom franchise_blog.assets.resources.local_postgres_io_manager import postgres_io_manager\r\n\r\n...\r\n\r\n@repository\r\ndef franchise_blog():\r\n\treturn 
[\r\n\t\tdaily_job,\r\n\t\tdaily_schedule,\r\n\t\twith_resources(\r\n\t\t\tload_assets_from_package_module(assets),\r\n\t\t\t{\r\n\t\t\t\t\"postgres_api\": get_configured_postgres_api(),\r\n\t\t\t\t\"base_dir\": get_configured_base_dir(),\r\n\t\t\t\t\"local_postgres_io_manager\": postgres_io_manager\r\n\t\t\t}\r\n\t\t)\r\n\t]<\/pre>\n<p>Dagster really has a rather steep learning curve, but once you are familiar with how it works, it really improves the experience of creating data pipelines. In the beginning, it can be quite difficult to understand what a repository represents. Think about it in the following way: a repository is a collection of assets, jobs, and whatever else we use in our package. This collection represents a unit that is later used by dagit, Dagster&#8217;s CLI, or the dagster-daemon.<\/p>\n<p>Note that the concept of repositories in Dagster is largely deprecated since version 1.1.6. If you are using version 1.1.6 or higher, you should use Dagster&#8217;s concept of Definitions instead.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Asset-revenue-per-day-per-manager-plot\"><\/span>Asset: revenue per day per manager plot<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Next, create the file <em>series.py<\/em> inside the <em>plotting<\/em> directory with the following content:<\/p>\n<pre class=\"lang:python decode:true\">import os\r\nimport pandas as pd\r\nimport plotly.express as px\r\nfrom dagster import asset\r\n  \r\n@asset(required_resource_keys={\"base_dir\"}, group_name=\"franchise\")\r\ndef plot_revenue_per_day_per_manager(context, ingest_store_data_from_psql: str):\r\n    base_dir = context.resources.base_dir\r\n    target_file = f\"{base_dir}\/plots\/revenue_per_day_per_manager.html\"\r\n    os.makedirs(os.path.dirname(target_file), exist_ok=True)\r\n  \r\n    df = pd.read_csv(ingest_store_data_from_psql)\r\n    fig = px.line(df, x='day', y='revenue', color='manager', 
symbol=\"manager\")\r\n    fig.update_layout(\r\n        font=dict(\r\n            size=20\r\n        )\r\n    )\r\n    fig.write_html(target_file)<\/pre>\n<p>Note that we pass the argument <em>ingest_store_data_from_psql<\/em> to our asset <em>plot_revenue_per_day_per_manager<\/em>. This is how we define dependencies in Dagster! Furthermore, since we associated an IO manager with our asset <em>ingest_store_data_from_psql<\/em>, the IO manager&#8217;s <em>load_input<\/em> method injects its return value into our plot asset.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Asset-average-revenue-per-manager-aggregation\"><\/span>Asset: average revenue per manager aggregation<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Next, we create the file <em>aggregation.py<\/em> inside the <em>transformation<\/em> directory with the following lines of code:<\/p>\n<pre class=\"lang:python decode:true \">import os\r\nimport pandas as pd\r\nfrom dagster import asset\r\n  \r\n@asset(group_name=\"franchise\", io_manager_key=\"local_transformation_io_manager\")\r\ndef aggregate_avg_revenue_per_manager(context, ingest_store_data_from_psql: str):\r\n    df = pd.read_csv(ingest_store_data_from_psql)\r\n    result = df[[\"manager\", \"city\", \"street\", \"street_number\", \"revenue\"]].groupby([\"manager\", \"city\", \"street\", \"street_number\"]).aggregate('mean')\r\n    result[\"average_revenue\"] = result[\"revenue\"]\r\n    result = result.drop(columns=[\"revenue\"]).reset_index()\r\n  \r\n    return {'result': result}<\/pre>\n<p>Once again, notice how concise our asset is. That is because we can again define a custom IO manager. By the way, you do not have to write a custom IO manager; Dagster offers several built-in ones. But this way we learn more about them and how they work. 
As an exercise for the reader, try to implement the IO manager yourself and see what it feels like to handle one.<\/p>\n<p>But do not worry, here is the code for the IO manager, in case you have trouble implementing it yourself or do not want to do it:<\/p>\n<pre class=\"lang:python decode:true \">import os\r\nfrom dagster import (\r\n    IOManager,\r\n    OutputContext,\r\n    InputContext,\r\n    InitResourceContext,\r\n    io_manager,\r\n    build_init_resource_context\r\n)\r\n  \r\nclass LocalTransformationIOManager(IOManager):\r\n    def __init__(self, target_file: str, pickle_file: str) -&gt; None:\r\n        self.target_file = target_file\r\n        self.pickle_file = pickle_file\r\n  \r\n    def handle_output(self, context: OutputContext, obj: dict) -&gt; None:\r\n        os.makedirs(os.path.dirname(self.target_file), exist_ok=True)\r\n        result = obj['result']\r\n        # The pickle is handed to downstream assets, the JSON file is the stored result\r\n        result.to_pickle(self.pickle_file)\r\n        result.to_json(self.target_file, orient=\"records\")\r\n  \r\n    def load_input(self, context: InputContext) -&gt; str:\r\n        return self.pickle_file\r\n\r\n  \r\n@io_manager(required_resource_keys={\"base_dir\"})\r\ndef local_transformation_io_manager(init_context: InitResourceContext) -&gt; LocalTransformationIOManager:\r\n    base_dir = init_context.resources.base_dir\r\n    target_file = os.path.join(base_dir, \"storage\/agg_avg_revenue_manager.json\")\r\n    pickle_file = os.path.join(base_dir, \"storage\/agg_avg_revenue_manager.pkl\")\r\n  \r\n    return LocalTransformationIOManager(target_file=target_file, pickle_file=pickle_file)\r\n  \r\n@io_manager(required_resource_keys={\"base_dir\"})\r\ndef transformation_io_manager(init_context: InitResourceContext) 
-&gt; LocalTransformationIOManager:\r\n    return local_transformation_io_manager(\r\n        build_init_resource_context(\r\n            resources={\"base_dir\": init_context.resources.base_dir}\r\n        )\r\n    )<\/pre>\n<p>The logic is indeed very similar to the IO manager from before. You could even go a step further and try to create a single IO manager that manages both assets! This would require some more work and more abstraction layers, though. For the sake of this experiment, we will not overcomplicate things. Do not forget to add this IO manager to our repository under the key <em>local_transformation_io_manager<\/em>, which is the key our aggregation asset refers to!<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Asset-average-revenue-per-manager-plot\"><\/span>Asset: average revenue per manager plot<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>As a last step, we have to create the file <em>aggregation.py<\/em> inside the <em>plotting<\/em> directory. 
Copy and paste the following lines of code:<\/p>\n<pre class=\"lang:python decode:true \">import os\r\nimport pandas as pd\r\nimport plotly.express as px\r\nfrom dagster import asset\r\n  \r\n@asset(required_resource_keys={\"base_dir\"}, group_name=\"franchise\")\r\ndef plot_avg_revenue_per_manager(context, aggregate_avg_revenue_per_manager: str):\r\n    base_dir = context.resources.base_dir\r\n    target_file = f\"{base_dir}\/plots\/agg_avg_revenue_manager.html\"\r\n    os.makedirs(os.path.dirname(target_file), exist_ok=True)\r\n  \r\n    df = pd.read_pickle(aggregate_avg_revenue_per_manager)\r\n    fig = px.bar(df, x='manager', y='average_revenue',\r\n        hover_data=['city', 'street', 'street_number'],\r\n        labels={'average_revenue': \"Average Revenue by Manager\", 'manager': \"Manager\"})\r\n  \r\n    fig.update_layout(\r\n        font=dict(\r\n            size=20\r\n        )\r\n    )\r\n    fig.write_html(target_file)<\/pre>\n<p>By now, you should probably understand what we did here. If this doesn&#8217;t seem familiar to you, you should check out parts 1 &amp; 2 of this article series.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Dagsters-web-UI-pipeline-run\"><\/span>Dagster&#8217;s web UI &amp; pipeline run<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>To see our pipeline in the UI, execute the following command in a terminal in the workspace directory:<\/p>\n<pre class=\"lang:zsh decode:true \">dagit<\/pre>\n<p>It should display a URL that points to the UI. 
When you open your browser and follow the URL, you should see something like this:<\/p>\n<figure style=\"width: 1180px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/blogpost_dataorchestration_dagster_webui_12.png\" alt=\"Dagster's Web UI\" width=\"1180\" height=\"646\" \/><figcaption class=\"wp-caption-text\"><strong>Figure 2: Dagster&#8217;s Web UI<\/strong><\/figcaption><\/figure>\n<p>You should see your pipeline. If you click on &#8220;Materialize all&#8221; in the top right corner, Dagster will launch a run and a popup appears where we can click on &#8220;View run&#8221;. But before doing this, look around the UI: it shows useful information such as when our pipeline is scheduled and when the last run finished.<\/p>\n<p>When materializing our assets, the execution timeline will look like this:<\/p>\n<figure style=\"width: 1113px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/blogpost_dataorchestration_dagster_timeline_13.png\" alt=\"Dagster's Execution Timeline\" width=\"1113\" height=\"609\" \/><figcaption class=\"wp-caption-text\"><strong>Figure 3: Dagster&#8217;s Execution Timeline<\/strong><\/figcaption><\/figure>\n<p>What I really like about this UI view is that you can follow your pipeline run in real time. You can see when an asset is being processed, which assets are being processed in parallel, and how long each one takes. Furthermore, you have an event display, and if an asset fails, you can partially re-run your pipeline. I invite you to explore the UI further, especially the tabs &#8220;Assets&#8221; and &#8220;Deployment&#8221;.<\/p>\n<p>Moreover, you can even launch a backfill via the UI, whereas in Airflow you can only do this via the CLI. 
And if you want to run your pipeline on a schedule, don&#8217;t forget to launch a <a href=\"https:\/\/docs.dagster.io\/deployment\/dagster-daemon\" target=\"_blank\" rel=\"noopener\">Dagster daemon<\/a>. Open a terminal in your workspace directory, where your <em>.toml<\/em> file is also located, and run:<\/p>\n<pre class=\"lang:zsh decode:true \">dagster-daemon run<\/pre>\n<h3><span class=\"ez-toc-section\" id=\"Final-remarks\"><\/span>Final remarks<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Phew, congratulations, we did it! We implemented the same pipeline in three data orchestration tools: Airflow, Prefect, and Dagster! If you are still motivated, then the next and last part of this article series might interest you: We will see how to do unit testing in each tool, how each tool performs in our rating, and which trends I see emerging in data orchestration!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Welcome again, to part 3 of this article series about data orchestration. In this part, we want to implement our beloved pipeline from part 1 once again, but this time in Dagster. 
In part 2 we implemented this pipeline in Prefect and could see that although Prefect has some differences from Airflow, the task implementation [&hellip;]<\/p>\n","protected":false},"author":318,"featured_media":46057,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"inline_featured_image":false,"ep_exclude_from_search":false,"footnotes":""},"tags":[783,77,385,377],"service":[411],"coauthors":[{"id":318,"display_name":"Raphael Skuza","user_nicename":"rskuza"}],"class_list":["post-41261","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","tag-airflow","tag-big-data","tag-data-engineering","tag-development","service-data-engineering"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.8 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Data Orchestration: Is Airflow Still the Best? (Part 3) - inovex GmbH<\/title>\n<meta name=\"description\" content=\"In the 3rd part of this article series, we are having a look at another competitor of Airflow: Dagster! Can Dagster beat Airflow?\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.inovex.de\/de\/blog\/data-orchestration-is-airflow-still-the-best-part-3\/\" \/>\n<meta property=\"og:locale\" content=\"de_DE\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Data Orchestration: Is Airflow Still the Best? (Part 3) - inovex GmbH\" \/>\n<meta property=\"og:description\" content=\"In the 3rd part of this article series, we are having a look at another competitor of Airflow: Dagster! 
Can Dagster beat Airflow?\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.inovex.de\/de\/blog\/data-orchestration-is-airflow-still-the-best-part-3\/\" \/>\n<meta property=\"og:site_name\" content=\"inovex GmbH\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/inovexde\" \/>\n<meta property=\"article:published_time\" content=\"2023-06-06T13:26:05+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.inovex.de\/wp-content\/uploads\/Data-Orchestration_Header_V3-scaled.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"2560\" \/>\n\t<meta property=\"og:image:height\" content=\"1440\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Raphael Skuza\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:image\" content=\"https:\/\/www.inovex.de\/wp-content\/uploads\/Data-Orchestration_Header_V3-1024x576.jpg\" \/>\n<meta name=\"twitter:creator\" content=\"@inovexgmbh\" \/>\n<meta name=\"twitter:site\" content=\"@inovexgmbh\" \/>\n<meta name=\"twitter:label1\" content=\"Verfasst von\" \/>\n\t<meta name=\"twitter:data1\" content=\"Raphael Skuza\" \/>\n\t<meta name=\"twitter:label2\" content=\"Gesch\u00e4tzte Lesezeit\" \/>\n\t<meta name=\"twitter:data2\" content=\"18\u00a0Minuten\" \/>\n\t<meta name=\"twitter:label3\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data3\" content=\"Raphael Skuza\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/data-orchestration-is-airflow-still-the-best-part-3\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/data-orchestration-is-airflow-still-the-best-part-3\\\/\"},\"author\":{\"name\":\"Raphael 
Skuza\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#\\\/schema\\\/person\\\/126d1a5e3c6b844fa0d4708df41e9cfb\"},\"headline\":\"Data Orchestration: Is Airflow Still the Best? (Part 3)\",\"datePublished\":\"2023-06-06T13:26:05+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/data-orchestration-is-airflow-still-the-best-part-3\\\/\"},\"wordCount\":2554,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/data-orchestration-is-airflow-still-the-best-part-3\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/Data-Orchestration_Header_V3-scaled.jpg\",\"keywords\":[\"Airflow\",\"Big Data\",\"Data Engineering\",\"Development\"],\"articleSection\":[\"Analytics\",\"English Content\",\"General\",\"Infrastructure\"],\"inLanguage\":\"de\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/data-orchestration-is-airflow-still-the-best-part-3\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/data-orchestration-is-airflow-still-the-best-part-3\\\/\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/data-orchestration-is-airflow-still-the-best-part-3\\\/\",\"name\":\"Data Orchestration: Is Airflow Still the Best? 
(Part 3) - inovex GmbH\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/data-orchestration-is-airflow-still-the-best-part-3\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/data-orchestration-is-airflow-still-the-best-part-3\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/Data-Orchestration_Header_V3-scaled.jpg\",\"datePublished\":\"2023-06-06T13:26:05+00:00\",\"description\":\"In the 3rd part of this article series, we are having a look at another competitor of Airflow: Dagster! Can Dagster beat Airflow?\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/data-orchestration-is-airflow-still-the-best-part-3\\\/#breadcrumb\"},\"inLanguage\":\"de\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/data-orchestration-is-airflow-still-the-best-part-3\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/data-orchestration-is-airflow-still-the-best-part-3\\\/#primaryimage\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/Data-Orchestration_Header_V3-scaled.jpg\",\"contentUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/Data-Orchestration_Header_V3-scaled.jpg\",\"width\":2560,\"height\":1440,\"caption\":\"Zwei Personen stehen nebeneinander und interagieren mit einer grafischen Benutzeroberfl\u00e4che, die verschiedene Datenvisualisierungen und Steuerelemente zeigt. 
Die Person auf der linken Seite tr\u00e4gt schwarze Kleidung, w\u00e4hrend die Person auf der rechten Seite ein gelbes Oberteil tr\u00e4gt.\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/data-orchestration-is-airflow-still-the-best-part-3\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Data Orchestration: Is Airflow Still the Best? (Part 3)\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#website\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/\",\"name\":\"inovex GmbH\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"de\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#organization\",\"name\":\"inovex GmbH\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/2021\\\/03\\\/inovex-logo-16-9-1.png\",\"contentUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/2021\\\/03\\\/inovex-logo-16-9-1.png\",\"width\":1921,\"height\":1081,\"caption\":\"inovex 
GmbH\"},\"image\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/inovexde\",\"https:\\\/\\\/x.com\\\/inovexgmbh\",\"https:\\\/\\\/www.instagram.com\\\/inovexlife\\\/\",\"https:\\\/\\\/www.linkedin.com\\\/company\\\/inovex\",\"https:\\\/\\\/www.youtube.com\\\/channel\\\/UC7r66GT14hROB_RQsQBAQUQ\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#\\\/schema\\\/person\\\/126d1a5e3c6b844fa0d4708df41e9cfb\",\"name\":\"Raphael Skuza\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/cropped-Raphael-Skuza-Profilbild-96x96.jpga7fa52311aba9815836e166521178831\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/cropped-Raphael-Skuza-Profilbild-96x96.jpg\",\"contentUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/cropped-Raphael-Skuza-Profilbild-96x96.jpg\",\"caption\":\"Raphael Skuza\"},\"sameAs\":[\"https:\\\/\\\/de.linkedin.com\\\/in\\\/raphael-skuza\"],\"url\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/author\\\/rskuza\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Data Orchestration: Is Airflow Still the Best? (Part 3) - inovex GmbH","description":"In the 3rd part of this article series, we are having a look at another competitor of Airflow: Dagster! Can Dagster beat Airflow?","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.inovex.de\/de\/blog\/data-orchestration-is-airflow-still-the-best-part-3\/","og_locale":"de_DE","og_type":"article","og_title":"Data Orchestration: Is Airflow Still the Best? (Part 3) - inovex GmbH","og_description":"In the 3rd part of this article series, we are having a look at another competitor of Airflow: Dagster! 
Can Dagster beat Airflow?","og_url":"https:\/\/www.inovex.de\/de\/blog\/data-orchestration-is-airflow-still-the-best-part-3\/","og_site_name":"inovex GmbH","article_publisher":"https:\/\/www.facebook.com\/inovexde","article_published_time":"2023-06-06T13:26:05+00:00","og_image":[{"width":2560,"height":1440,"url":"https:\/\/www.inovex.de\/wp-content\/uploads\/Data-Orchestration_Header_V3-scaled.jpg","type":"image\/jpeg"}],"author":"Raphael Skuza","twitter_card":"summary_large_image","twitter_image":"https:\/\/www.inovex.de\/wp-content\/uploads\/Data-Orchestration_Header_V3-1024x576.jpg","twitter_creator":"@inovexgmbh","twitter_site":"@inovexgmbh","twitter_misc":{"Verfasst von":"Raphael Skuza","Gesch\u00e4tzte Lesezeit":"18\u00a0Minuten","Written by":"Raphael Skuza"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.inovex.de\/de\/blog\/data-orchestration-is-airflow-still-the-best-part-3\/#article","isPartOf":{"@id":"https:\/\/www.inovex.de\/de\/blog\/data-orchestration-is-airflow-still-the-best-part-3\/"},"author":{"name":"Raphael Skuza","@id":"https:\/\/www.inovex.de\/de\/#\/schema\/person\/126d1a5e3c6b844fa0d4708df41e9cfb"},"headline":"Data Orchestration: Is Airflow Still the Best? 
(Part 3)","datePublished":"2023-06-06T13:26:05+00:00","mainEntityOfPage":{"@id":"https:\/\/www.inovex.de\/de\/blog\/data-orchestration-is-airflow-still-the-best-part-3\/"},"wordCount":2554,"commentCount":0,"publisher":{"@id":"https:\/\/www.inovex.de\/de\/#organization"},"image":{"@id":"https:\/\/www.inovex.de\/de\/blog\/data-orchestration-is-airflow-still-the-best-part-3\/#primaryimage"},"thumbnailUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/Data-Orchestration_Header_V3-scaled.jpg","keywords":["Airflow","Big Data","Data Engineering","Development"],"articleSection":["Analytics","English Content","General","Infrastructure"],"inLanguage":"de","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.inovex.de\/de\/blog\/data-orchestration-is-airflow-still-the-best-part-3\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.inovex.de\/de\/blog\/data-orchestration-is-airflow-still-the-best-part-3\/","url":"https:\/\/www.inovex.de\/de\/blog\/data-orchestration-is-airflow-still-the-best-part-3\/","name":"Data Orchestration: Is Airflow Still the Best? (Part 3) - inovex GmbH","isPartOf":{"@id":"https:\/\/www.inovex.de\/de\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.inovex.de\/de\/blog\/data-orchestration-is-airflow-still-the-best-part-3\/#primaryimage"},"image":{"@id":"https:\/\/www.inovex.de\/de\/blog\/data-orchestration-is-airflow-still-the-best-part-3\/#primaryimage"},"thumbnailUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/Data-Orchestration_Header_V3-scaled.jpg","datePublished":"2023-06-06T13:26:05+00:00","description":"In the 3rd part of this article series, we are having a look at another competitor of Airflow: Dagster! 
Can Dagster beat Airflow?","breadcrumb":{"@id":"https:\/\/www.inovex.de\/de\/blog\/data-orchestration-is-airflow-still-the-best-part-3\/#breadcrumb"},"inLanguage":"de","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.inovex.de\/de\/blog\/data-orchestration-is-airflow-still-the-best-part-3\/"]}]},{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/www.inovex.de\/de\/blog\/data-orchestration-is-airflow-still-the-best-part-3\/#primaryimage","url":"https:\/\/www.inovex.de\/wp-content\/uploads\/Data-Orchestration_Header_V3-scaled.jpg","contentUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/Data-Orchestration_Header_V3-scaled.jpg","width":2560,"height":1440,"caption":"Zwei Personen stehen nebeneinander und interagieren mit einer grafischen Benutzeroberfl\u00e4che, die verschiedene Datenvisualisierungen und Steuerelemente zeigt. Die Person auf der linken Seite tr\u00e4gt schwarze Kleidung, w\u00e4hrend die Person auf der rechten Seite ein gelbes Oberteil tr\u00e4gt."},{"@type":"BreadcrumbList","@id":"https:\/\/www.inovex.de\/de\/blog\/data-orchestration-is-airflow-still-the-best-part-3\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.inovex.de\/de\/"},{"@type":"ListItem","position":2,"name":"Data Orchestration: Is Airflow Still the Best? 
(Part 3)"}]},{"@type":"WebSite","@id":"https:\/\/www.inovex.de\/de\/#website","url":"https:\/\/www.inovex.de\/de\/","name":"inovex GmbH","description":"","publisher":{"@id":"https:\/\/www.inovex.de\/de\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.inovex.de\/de\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"de"},{"@type":"Organization","@id":"https:\/\/www.inovex.de\/de\/#organization","name":"inovex GmbH","url":"https:\/\/www.inovex.de\/de\/","logo":{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/www.inovex.de\/de\/#\/schema\/logo\/image\/","url":"https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/03\/inovex-logo-16-9-1.png","contentUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/03\/inovex-logo-16-9-1.png","width":1921,"height":1081,"caption":"inovex GmbH"},"image":{"@id":"https:\/\/www.inovex.de\/de\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/inovexde","https:\/\/x.com\/inovexgmbh","https:\/\/www.instagram.com\/inovexlife\/","https:\/\/www.linkedin.com\/company\/inovex","https:\/\/www.youtube.com\/channel\/UC7r66GT14hROB_RQsQBAQUQ"]},{"@type":"Person","@id":"https:\/\/www.inovex.de\/de\/#\/schema\/person\/126d1a5e3c6b844fa0d4708df41e9cfb","name":"Raphael Skuza","image":{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/www.inovex.de\/wp-content\/uploads\/cropped-Raphael-Skuza-Profilbild-96x96.jpga7fa52311aba9815836e166521178831","url":"https:\/\/www.inovex.de\/wp-content\/uploads\/cropped-Raphael-Skuza-Profilbild-96x96.jpg","contentUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/cropped-Raphael-Skuza-Profilbild-96x96.jpg","caption":"Raphael 
Skuza"},"sameAs":["https:\/\/de.linkedin.com\/in\/raphael-skuza"],"url":"https:\/\/www.inovex.de\/de\/blog\/author\/rskuza\/"}]}},"_links":{"self":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts\/41261","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/users\/318"}],"replies":[{"embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/comments?post=41261"}],"version-history":[{"count":6,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts\/41261\/revisions"}],"predecessor-version":[{"id":44823,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts\/41261\/revisions\/44823"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/media\/46057"}],"wp:attachment":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/media?parent=41261"}],"wp:term":[{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/tags?post=41261"},{"taxonomy":"service","embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/service?post=41261"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/coauthors?post=41261"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}