
Data Orchestration: Is Airflow Still the Best? (Part 4)


Welcome to the last part of this article series! In this article, we want to take a look at unit testing in Airflow, Prefect, and Dagster. It can be helpful to read the other parts of this series for more context: Part 1, Part 2, and Part 3.

In software engineering, testing is incredibly important; one cannot stress enough HOW important it is. There is nothing worse than deploying your code into production without testing it beforehand. Yet tests are often skipped, and this can have several reasons: writing tests is tedious, and it is not always easy to write tests for pipelines. Sometimes there is simply not enough time, and developers are under pressure to deliver pipelines on time in order to meet deadlines. Additionally, developers do not always have quality test data at hand, due to PII or other constraints, and creating test datasets can be very time-consuming.

Nevertheless, in the space of data orchestration, testing is often neglected. In software engineering, writing tests is natural, and this has to become established in the data orchestration space too.

In the following, I want to show you how one could start writing unit tests for tasks and assets in Prefect, Airflow, and Dagster. Let’s start with Airflow.

Unit testing

Airflow

First of all, we create a directory test inside of the dags directory, in which we create a file test_ingestion.py. In order to test our ingestion task, we have to mock out the Postgres connection, which is realized by a PostgresHook. This can be done in the following way:
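A minimal, self-contained sketch of this pattern could look as follows. The task body, the query, and the CSV format here are illustrative assumptions, not the project's actual code; in the real project, ingest_store_data_from_psql would be an Airflow @task, you would patch PostgresHook where it is imported (e.g. mock.patch("dags.ingestion.PostgresHook")), and you would invoke the task via its function attribute.

```python
# dags/test/test_ingestion.py -- illustrative sketch; in the real project
# `ingest_store_data_from_psql` is an Airflow @task and is called via
# `ingest_store_data_from_psql.function(...)` after patching the hook.
import csv
import os
import tempfile
from unittest import mock

# Stand-in for the task body: read rows via a PostgresHook and write a CSV.
def ingest_store_data_from_psql(hook, target_path):
    records = hook.get_records("SELECT store_id, revenue FROM stores;")
    with open(target_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["store_id", "revenue"])
        writer.writerows(records)

def test_ingest_store_data_from_psql():
    # Fixture-style mock of the PostgresHook (in pytest this would live in
    # a `mock_postgres_connection` fixture built with mock.patch).
    mock_hook = mock.Mock()
    mock_hook.get_records.return_value = [(1, 100.0), (2, 250.5)]

    target = os.path.join(tempfile.mkdtemp(), "stores.csv")
    ingest_store_data_from_psql(mock_hook, target)

    # Check that the target file exists and the written data is as expected.
    assert os.path.exists(target)
    with open(target, newline="") as f:
        rows = list(csv.reader(f))
    assert rows == [["store_id", "revenue"], ["1", "100.0"], ["2", "250.5"]]

    os.remove(target)  # clean up the test file

test_ingest_store_data_from_psql()
```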

First, we create a fixture mock_postgres_connection which mocks the PostgresHook. Mocking in Airflow can be quite difficult, but it does not have to be; in this case we are lucky, and mocking is quite easy. Do not forget to call the function attribute of your task, otherwise your test will not work. Afterward, we can test whether the target file exists and whether the data written to it matches our expectations. Then you can clean up the test files, and we have tested our task!

But testing in Airflow is not always easy: many users complain that testing in Airflow can become quite difficult, and most of the time this is true. To be effective at unit testing, you have to understand a lot of internal details of how Airflow works; otherwise, the testing experience can become quite frustrating. This is suboptimal, since testing should be easy: the more difficult testing is for a developer, the more they will tend to leave tests out. So what are the cases in which testing in Airflow can become bothersome or confusing?

One source of confusion is testing operators in Airflow. Normally, you call the execute method on them and check whether the result matches your expectations. But if you use templated arguments, you cannot use execute anymore; you have to use the run method instead. This has to do with Airflow’s internal structure and the point at which Airflow renders template arguments. You will also need a metastore in this case.

Speaking of the metastore, this can also be quite frustrating if you do not know about it. By default, it is a simple SQLite database. The metastore is needed because it contains information about your DAGs, environment configuration, metadata, and so on. You can either create the metastore manually via airflow db init, or you can set up a conftest.py file which could contain code like this:

This will do the job for you, resetting the metastore for every test session. Do not forget to clean up temporary files after running your tests, so you probably want to implement clean-up logic after the yield statement. To finish this off: if you use run in your tests, you will not obtain a return value, so you will have to work around this.

To be fair, Airflow encourages developers to separate orchestration logic from data processing logic. The idea is that you only have to test the DAG structure and DAG/task-related rules in Airflow, while the data processing logic is outsourced and tested elsewhere. But this has the disadvantage that integration testing becomes very difficult, since the logic of different tasks can reside in totally different code locations.

Furthermore, it is common practice to create testing environments, and this is also very tedious to set up with Airflow.

My opinion on this is that Airflow makes it too difficult for users to test their pipelines, especially if you want to test them locally! Let’s see if Prefect performs better!

Prefect

As usual, we add a test file test_ingestion.py to our project structure and paste the following code into it:
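A self-contained sketch of the idea follows; the task body, query, and CSV format are illustrative assumptions. In the real project, ingest_store_data_from_psql is a Prefect @task that opens the psycopg2 connection itself, so you would patch psycopg2.connect in the task's module and call the task via its fn attribute.

```python
# tests/test_ingestion.py -- illustrative sketch; in the real project the
# function is a Prefect @task, invoked as `ingest_store_data_from_psql.fn(...)`
# after patching "pipelines.ingestion.psycopg2.connect".
import csv
import os
import tempfile
from unittest import mock

# Stand-in for the task body: query Postgres and store the rows as CSV.
def ingest_store_data_from_psql(connection, target_path):
    with connection.cursor() as cur:
        cur.execute("SELECT store_id, revenue FROM stores;")
        records = cur.fetchall()
    with open(target_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["store_id", "revenue"])
        writer.writerows(records)

def test_ingest_store_data_from_psql():
    # Mock the psycopg2 connection: `with conn.cursor() as cur` must yield
    # a cursor whose fetchall() returns our test records.
    mock_conn = mock.MagicMock()
    mock_cursor = mock_conn.cursor.return_value.__enter__.return_value
    mock_cursor.fetchall.return_value = [(1, 100.0), (2, 250.5)]

    target = os.path.join(tempfile.mkdtemp(), "stores.csv")
    ingest_store_data_from_psql(mock_conn, target)

    with open(target, newline="") as f:
        rows = list(csv.reader(f))
    assert rows == [["store_id", "revenue"], ["1", "100.0"], ["2", "250.5"]]
    os.remove(target)  # clean up the test file

test_ingest_store_data_from_psql()
```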

We simply have to mock out our psycopg2 connection and execute our task ingest_store_data_from_psql. Note that we have to use the fn attribute when we want to execute a task outside of a flow. This is similar to Airflow, where we had to use the function attribute. But in contrast to Airflow, we do not have to prepare anything; we can simply run the tests in our local dev environment. It is very easy to write tests in Prefect! Next comes Dagster.

Dagster

Dagster has already created a test directory for us, so let’s create a directory ingestion inside of it, as well as a file named test_postgres.py inside the newly created directory. Note that Dagster has already created a file test_assets.py for us, but I usually do not use it except for quick smoke tests, so you can ignore it for the moment.

Unit testing in Dagster is slightly more involved than in Prefect due to its more complex code structure. To start, we will write the fixtures which we will utilize:

The first fixture, mock_data, simply defines our test data, which consists of two records. The second fixture, get_contexts, initializes context objects for us. If you remember, the IO manager accepts context objects as arguments, so these will be used for it.

In our case, we segment our test into three steps. First, we test whether the asset runs without problems and returns a success state. Afterward, we check whether our IO manager correctly stores the data. Finally, we test whether our IO manager loads the correct value for the downstream assets.

Testing the state of an asset can be done in the following way:

We simply mock our Postgres database connection and return our mock data. Then we use the function materialize_to_memory in order to materialize our asset ingest_store_data_from_psql. Note that we can pass a mock object as our postgres resource; this is called dependency injection and is incredibly useful in testing.

As a second step, we test our IO manager output handling via the following code:

We have to pass an InitResourceContext to our IO manager postgres_io_manager. Afterward, we can call the handle_output function with two arguments: an OutputContext and an object, which is our mock data. Then we can check whether the target file exists and whether the data matches our expectations. Finally, we clean up the test directory and test file. That’s it.

As our last step, we want to check whether our IO manager loads the right input to our downstream assets. This is a short and quick test, here we go:

Final remarks about unit testing

Dagster and Prefect do an excellent job when it comes to testing: it is really easy to test your pipelines, and you don’t have to prepare anything for your tests. Airflow, on the contrary, can be very difficult and tedious to test. Airflow is a master of confusion and can steal a lot of time from you when you do not know why it is not behaving as you would expect.

Rating: the time has come

We are at the end of our pipeline adventure, and, if you remember, I wanted to rate Airflow, Prefect, and Dagster in these categories: Setup/Installation, Features, UI/UX, Pipeline Development Experience, Unit Testing, Documentation, and Deployment. Do not forget that we only rate the open-source version of each tool.

Setup/installation

Several options exist for setting up any of the three tools. For Airflow, for example, we can use Docker, pip with constraint files, Helm charts, or an installation from source.

Dagster can be set up with Docker, pip, or Helm charts.

Prefect can be set up with Docker or pip/conda. Additionally, Dagster and Prefect each have a cloud offering, while companies like AWS, Google, and Astronomer offer managed Airflow instances. Dagster and Prefect are easier to set up and install, but Airflow offers more options.

Thus, I would rate all of them 4 out of 5 stars.

Features

Airflow allows users to code their workflows programmatically and also offers a web UI. Additionally, the community has contributed a lot of plugins and provider packages. Airflow also comes with access control and connection management.

Dagster, on the other hand, offers asset management and asset lineage and is more innovative. A lot of new concepts are at our disposal, like ops, graphs, jobs, and IO management. We have also seen the power behind resources. Dagster also comes with a web UI whose loading time and user experience are amazing. Furthermore, we can backfill via the UI and can utilize Dagster Types for type validation. But Dagster is not perfect; it lacks features like access control or secret management in the open-source version.

Prefect offers the concepts of tasks and flows, and we can even create sub-flows if we want. Moreover, Prefect follows a Pydantic-first approach: we can easily validate all of our inputs and outputs with Pydantic. We can choose between different task runners, have the Orion web server at our disposal, and get agents & queues for deployments as well as deployment configuration. Plus, we can use blocks in order to store configurations or secrets on external storage systems. But there is no integrated secret management in the open-source version.

In my opinion, Dagster has the most innovative concepts, and I think that these concepts will indeed become established in the long term. It has the potential to surpass all other tools. That is why I rate Dagster 5 out of 5 stars, although it is lacking some features. Airflow is the long-established champion with a lot of features under its belt, but Airflow’s abstractions and features are rather old-fashioned. That is why I rate Airflow 4 out of 5 stars. Prefect comes with fewer features and is also not the most innovative tool, but it is still a great tool since it is the easiest one to use. So I rate it 3 out of 5 stars.

UI/UX

Airflow’s UI looks old-fashioned by now and lacks a cleaner design. Prefect’s UI is modern, clean, and minimalistic. Dagster’s is similar to Prefect’s: clean, modern, and very responsive.

The list of DAGs in Airflow is okay, but since all DAGs are thrown into the same list, I wish one could assign them to groups and sort by group for a better overview. It would also be great if Airflow changed the execution timeline to a more granular view. If you click on the Gantt tab, you are able to see when each task started and ended; this is similar to Prefect’s flow overview, where one can see approximately how long each run took and whether there are clear outliers. Furthermore, the graph view is good enough, and we can inspect the graph structure very quickly. I also like that you get more information on the runs when you hover over the graph nodes. In general, Airflow would just need to modernize its UI and re-organize the structure of its UI elements.

Prefect’s UI is really good-looking and clean. You can navigate quickly through a flow and inspect the task runs. What I would change, though, is the position of the radar diagram. When you are new to Prefect, you might not click on it, because it is so small that it looks relatively unimportant. I would create a dedicated sub-tab for the radar diagram.

When it comes down to the web UI, I would always choose Dagster’s, since it is the most intuitive one to use.

In the end, I would rate Airflow’s UI 3 out of 5 stars, Dagster 5 out of 5 stars, and Prefect 4 out of 5 stars.

Pipeline development experience

The learning curve in Airflow is very steep. You have to get familiar with hooks and operators, and you need to understand how to structure your code and what a DAG definition should look like. There are many pitfalls you can fall into, e.g., top-level imports, exchanging larger data via XComs, or placing variables in the wrong position.

Dagster’s learning curve is also steep. You need to learn a lot of concepts like assets, resources, and the IO manager. But once I had understood how these concepts work, I felt that I became more productive. I wrote my pipeline two to three times faster than in Airflow, because debugging is so much easier and there are not as many pitfalls. Furthermore, I like that every pipeline in Dagster is packed into a separate package and is thus more isolated.

Prefect is also great; I call it a lunch tool, since you could learn it over lunch and start implementing your pipeline right after. If you already know Airflow, the transition should be quite easy. Also, since Prefect makes local development easy, debugging is really easy too. Creating a deployment configuration in Prefect is also a breeze.

Thus, I rate Prefect 5 out of 5 stars, Airflow 3 out of 5 stars, and Dagster 5 out of 5 stars.

Unit testing

Unit testing in Airflow can become quite difficult. Often you need to know how Airflow works internally in order to mock out certain functionalities. It might also confuse you that you have to initialize a metastore for some tests but not for others. Furthermore, inconsistencies like using the operator’s execute method in some cases and the run method in others degrade the testing experience.

Since local development is really easy with Dagster and Prefect, testing is also easier. Dagster’s resource dependency injection makes testing particularly easy, but Prefect is also easy to test, since you can structure your code in a way that makes it testable.

Thus, I rate Dagster and Prefect with 5 out of 5 stars and Airflow with 3 out of 5 stars.

Documentation

Airflow’s documentation already contains a lot of information, but it is lacking in detail. For instance, the section about testing is really short, but testing definitely deserves more space. However, since Airflow has the biggest community, you can almost always find the necessary information on StackOverflow. By the way, I bet you have never seen StackOverflow’s actual homepage!

Dagster’s and Prefect’s communities are also fairly large by now, and you can find a lot of information as well, though less than for Airflow. Moreover, Dagster’s documentation is quite verbose, but it lacks comprehensive examples of how to effectively utilize Dagster’s new concepts.

Prefect’s documentation is almost ideal: I didn’t need any sources other than Prefect’s documentation. Since Prefect doesn’t offer too many functionalities, it is also easier to write its documentation. Thus, it is understandable that Dagster’s and Airflow’s documentation still has to improve, since they offer a lot of features.

Thus, I rate Prefect with 5 out of 5 stars, Airflow with 3 out of 5 stars, and Dagster with 4 out of 5 stars.

Deployment

If you want to use managed services, Airflow has some to offer, like AWS MWAA, Google Cloud Composer, or Astronomer. It is really easy to deploy Airflow with these services, although they can cost you a lot! If you decide to deploy Airflow yourself, it is more difficult: you can either use the Helm chart or do it completely yourself.

Dagster also offers a Helm chart, or you can do the setup completely yourself. You can also use Dagster’s cloud offering if you want a managed service; recently, they launched a serverless service. But whenever you choose a managed service, cost will always be a factor to think about.

If you want to deploy Prefect’s open-source version, you have to take care of it yourself. But Prefect also has a cloud offering, which they portray as “Coordination as a Service”.

In my opinion, the deployment is not a reason to choose any tool over the other and thus I would rate all of them with 5 out of 5 stars.

Rating results

Figure 1: Final rating of Dagster (32), Prefect (31), and Airflow (25)

And the winner is… Dagster! In my opinion, Dagster has the highest chance to surpass the two competitors, but Dagster has to watch out: Prefect is a serious competitor, and the outcome will depend on the future development of both tools. Depending on the team expertise and the environment you are working in, Airflow might still be a good choice! And we should also not forget that there are far more tools out there, like Argo, Flyte, Apache NiFi, mage-ai, and so forth!

Future trends and final remarks

What comes next? I think the following four factors will determine which data orchestration tool establishes itself in the long term: productivity, observability, data governance, and deployability. In my opinion, Dagster and Prefect are on a good track when it comes to productivity. Their open-source versions still lack some data governance features, but maybe we will see more in the future; Airflow will also need to continue improving its data governance. Observability will also be very important in order to guarantee a high-quality data infrastructure.

If you have your own opinion about any of these or other tools, do not shy away from reaching out!

I hope you really liked this article and that you could learn something new!
