A year ago, I finished my master’s thesis on Zircon, the microkernel Google develops for its mysterious new open source operating system, Fuchsia.
Today, still not much is known about why Google is developing Fuchsia and Zircon, nor where the system will be used in the wild. But as we can see, it is still actively developed and has just opened up to third-party contributions.
As I missed writing a blog post right after my thesis, I felt now is a good time to catch up and have a look at how things have changed since then. And at first sight, a lot has changed. When I started, there was a GitHub mirror of the project, which I cloned to work on a stable code version during my thesis. Today that clone is obviously outdated, but it is a very nice starting point.
As I’m an embedded and kernel dev, my work focused on Zircon, Fuchsia’s microkernel: the way it works and how it influences driver development in contrast to Linux. Accordingly, I will focus on the same main topics today. So let’s start with a general overview of the concept of the Zircon microkernel and its drivers.
A Brief Introduction to Operating System Designs and History
As already mentioned, Zircon is a completely new approach to operating system development. Not so much in terms of the concepts used, but rather in their realization. The Zircon core team is made up of developers who were already involved in other new operating system designs, most notably BeOS, and who are experienced with Linux as well. That they started a new kernel from scratch, with all the lessons learned in the past, shows clearly in the resulting design and makes it definitely worth a deeper look inside.
Zircon is, in contrast to Linux, a microkernel design. Or at least a kind of one. A new entry on the project overview page denies this, while other places speak of a microkernel nevertheless. In fact, Zircon is mostly a microkernel, but not a strict textbook design. It mixes concepts and lessons learned to create a very interesting operating system kernel concept which has enough microkernel parts to treat it like one here.
That means the actual operating system kernel (the real Zircon part of Fuchsia) is lightweight. It contains only a reduced set of privileged core functionality to make the system work. In a textbook microkernel, these parts are at least a scheduler, inter-process communication (IPC) and a synchronization mechanism. Depending on the literature and the actual system design, some microkernels treat memory management and maybe other components as core features, too. Everything else, like networking, file systems and the different driver types, is part of user space in a microkernel design and runs as unprivileged, self-contained modules. That’s the actual Fuchsia part of this system, together with common user applications.
As mentioned, this division applies to privilege levels as well. Zircon, the actual kernel, runs in so-called kernel mode, which on x86 corresponds to CPU ring 0, the highest CPU-internal privilege level. Some operations, for example accessing physical hardware, are only possible in this mode. Fuchsia is the user space, which runs in user mode, ring 3 on x86. To make a working operating system from these components, an efficient IPC mechanism is needed to connect them, and well-defined system calls must let a userland driver control hardware through the kernel, which is in fact the component performing the action. An efficient IPC mechanism is one key point of a microkernel operating system design, and its cost is one reason why a lot of well-known older operating systems like Linux are based on the opposite, the monolithic kernel design.
The Linux kernel bundles nearly all components needed for an operating system in the kernel itself. Process and memory management, file systems, networking and device drivers are as much a part of it as scheduling and inter-process communication. This brings some advantages but also disadvantages compared to the microkernel design. A long-standing and big advantage of this design is performance. As the Linux kernel bundles all its components in one kernel and thus one address space, communication within the kernel is super fast and cheap in terms of resources. The components in the kernel can use each other without additional effort, and drivers can access hardware directly. That’s a big plus in terms of performance, but not from a security point of view. Components in a monolithic kernel are not protected from each other; sophisticated process isolation concepts do not apply within it.
A microkernel needs expensive IPC between components where a monolithic kernel uses direct memory access, and even an efficient implementation is slower. Zircon defuses this situation to a certain degree, as it bundles groups of components into one process each to reduce the cost of communication within that group. The bundles are isolated from each other. This adds an additional security layer, as a corrupted component cannot compromise one in another bundle. Rating this aspect as more important than performance is not that widespread. When this architectural decision was made for Linux and most other major general purpose operating systems we know today, the loss of performance was a much more significant factor than the gain in security and the other positive aspects of a microkernel architecture. Most existing microkernel systems, QNX for example, have a less complex architecture, which makes the design easier to maintain and less error-prone, while the remaining errors are less serious because components are isolated from each other. By now, we have much more powerful machines and more sophisticated concepts to implement a microkernel architecture with a significantly smaller performance drawback, while security and the other advantages become more and more important and move the focus for new kernels clearly towards the microkernel architecture.
Zircon’s Driver Concepts
A device driver is a special piece of software that provides an abstraction layer between a device, mostly some kind of external hardware, and the system. It should provide a standardized interface so that different hardware implementations of a device type can be accessed in the same way. A user should not need any additional knowledge of the specific device. The driver must initialize the device, interpret user requests and convert them into more complex device instructions. Drivers are commonly written as so-called "stacked drivers": there is a device-specific part which must be written per device or device family, and a more general part which implements the high-level driver side once, e.g. for networking or bus protocols. This approach avoids code duplication and errors.
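To make the stacked-driver idea a bit more tangible, here is a minimal sketch in Python. It is not taken from Linux or Fuchsia; the class names (UartHardware, FakeChipA, LineProtocol) are made up for illustration. The lower layer is the device-specific part, the upper layer is written once and works with any chip that implements the lower interface.

```python
from abc import ABC, abstractmethod


class UartHardware(ABC):
    """Device-specific lower layer (hypothetical): one implementation per chip."""

    @abstractmethod
    def write_byte(self, value: int) -> None:
        ...


class FakeChipA(UartHardware):
    """Stand-in for one concrete chip; records bytes instead of touching registers."""

    def __init__(self) -> None:
        self.sent = bytearray()

    def write_byte(self, value: int) -> None:
        self.sent.append(value)


class LineProtocol:
    """Generic upper layer, written once and reused for every chip."""

    def __init__(self, hw: UartHardware) -> None:
        self.hw = hw

    def send_line(self, text: str) -> int:
        # Terminate the line and push it through the device-specific layer.
        data = text.encode("ascii") + b"\r\n"
        for byte in data:
            self.hw.write_byte(byte)
        return len(data)


chip = FakeChipA()
driver = LineProtocol(chip)
count = driver.send_line("hi")
```

Supporting a second chip would only mean adding another UartHardware subclass; LineProtocol stays untouched.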
What makes drivers so special is the kind of actions they perform. Accessing physical devices requires accessing physical memory addresses instead of the virtual ones used within normal applications, as well as address mapping and further actions from a CPU’s privileged instruction set. Thus, in classical monolithic architectures, drivers are part of the kernel and run in this privileged mode. This makes the concept of drivers a bit more complicated in a microkernel, where drivers are part of user space. Of course, all privileged actions such as I/O operations or accessing physical addresses stay privileged. The way they are handled within the CPU does not change. That means at least performing these particular actions must stay part of the microkernel, even if the driver itself resides in user space. Instead of doing these things directly as part of the driver, a microkernel architecture provides a defined set of system calls to trigger them inside kernel space. The reduced kernel implementation carries out the requested action and hands control back to the user-space driver. Of course, this leads to much more communication effort and is way less efficient, because each transition between user space and kernel needs an expensive mode switch. On the plus side, the kernel is able to validate each request and check whether the action is legitimate for the calling instance, which is a gain in security.
The well-defined interface provided by the system calls has another interesting consequence. It decouples the programming languages used within the kernel from the ones used in user space, and in particular from those used to write drivers. Linux, as a typical monolith, is very restrictive on this point. The kernel and everything within it are written in plain old C using a custom LibC implementation, mostly for licensing reasons. Zircon, on the other hand, provides more language freedom. At the time of my thesis, most drivers were written in C, but an increasing number were in C++ and the transition was ongoing. Today, nearly all drivers are in C++, which is clearly the preferred language by now. But at least the possibility of using (currently) two languages, and a design that can support more in the future, is worth mentioning in comparison to Linux.
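The trade-off described here, more mode switches in exchange for a validation point, can be sketched as a toy model. This is Python, not kernel code, and ToyKernel with its sys_* methods is purely hypothetical; it does not mirror Zircon’s real system calls. The point is only that every privileged access passes through one place that counts the switch and checks the caller’s permission.

```python
class PermissionDenied(Exception):
    """Raised when a driver asks for an action it was not granted."""


class ToyKernel:
    """Hypothetical stand-in for the privileged side; not Zircon's syscall API."""

    def __init__(self):
        self._registers = {}      # physical address -> value
        self._grants = {}         # driver name -> set of allowed addresses
        self.mode_switches = 0    # every call into the kernel costs one switch

    def grant(self, driver, addresses):
        self._grants.setdefault(driver, set()).update(addresses)

    def _enter_kernel(self, driver, address):
        # Count the switch first, then validate the request.
        self.mode_switches += 1
        if address not in self._grants.get(driver, set()):
            raise PermissionDenied(f"{driver} may not access {address:#x}")

    def sys_write_register(self, driver, address, value):
        self._enter_kernel(driver, address)
        self._registers[address] = value

    def sys_read_register(self, driver, address):
        self._enter_kernel(driver, address)
        return self._registers.get(address, 0)


kernel = ToyKernel()
kernel.grant("gpio-driver", {0x1000})
kernel.sys_write_register("gpio-driver", 0x1000, 1)
value = kernel.sys_read_register("gpio-driver", 0x1000)
```

A driver that tries to touch an address it was never granted, say 0x2000, gets a PermissionDenied instead of silently writing to hardware.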
All in all, the way drivers are written does not differ that much between Linux and Zircon. Driver life cycles and thus much of the implementation are similar between both worlds.
There are new names, and one has to deal with sparse documentation on things like how the binding between a device and its driver is done. This complicated getting started with Zircon drivers a bit back then and probably still does, but that’s not much better in parts of Linux. Except for one thing: FIDL. Instead of ioctl commands as on the Linux side, drivers on the Fuchsia/Zircon side receive their commands through a mechanism called FIDL, which stands for Fuchsia Interface Definition Language. Let’s have a closer look!
AIDL, HIDL, and now FIDL? No worries! It is not as bad as it is with Google and build systems. Communication between userland and kernel/drivers has always been a pain point on Linux. File operations, and also procfs and sysfs, are not flexible enough for more complex use cases, but ioctl has a lot of disadvantages in use. For example, its arguments are not checked at all. That means if the size of the data transferred between user and driver does not match the definition, there is no built-in mechanism to avoid the resulting issues. The developer needs to check for it. That’s very insecure and error-prone, and exactly the point where FIDL slides in.
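The size mismatch problem can be illustrated with a small toy model. This is Python using the struct module, not the real ioctl API; the layouts and handler names are assumptions made up for the example. The "driver" expects two 32-bit fields, while an outdated caller still sends two 16-bit fields. The unchecked handler decodes whatever sits at the given address, including neighbouring bytes that were never part of the request.

```python
import struct

# Hypothetical layouts, for illustration only.
DRIVER_FORMAT = "<II"        # what the driver expects: pin, value (8 bytes)
STALE_USER_FORMAT = "<HH"    # what an outdated caller sends (4 bytes)


def handle_unchecked(memory, offset):
    """Trust the caller: decode the expected layout from the raw address."""
    return struct.unpack_from(DRIVER_FORMAT, memory, offset)


def handle_checked(memory, offset, declared_size):
    """The manual check a developer has to remember to write."""
    expected = struct.calcsize(DRIVER_FORMAT)
    if declared_size != expected:
        raise ValueError(f"expected {expected} bytes, got {declared_size}")
    return struct.unpack_from(DRIVER_FORMAT, memory, offset)


payload = struct.pack(STALE_USER_FORMAT, 7, 1)    # caller means pin=7, value=1
memory = payload + b"\xAA\xBB\xCC\xDD"            # unrelated neighbouring bytes
garbage = handle_unchecked(memory, 0)             # decodes, but not to (7, 1)
```

Nothing fails in the unchecked path; the driver just works with wrong values. Only handle_checked, called with the caller’s real size of 4, rejects the request.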
Unlike other approaches, Fuchsia/Zircon does not use FIDL to extend the ways userland and drivers communicate; it completely replaces the ioctl we are used to from Linux. FIDL means Fuchsia Interface Definition Language and of course, interface definition languages are not new at all. As hinted above, Google makes heavy use of them in Android on different levels. Fuchsia takes the use of an IDL to another level, as FIDL is used for every kind of inter-process communication. That means each time communication between two processes is needed (and that’s a lot within a microkernel), the interface is specified using FIDL.
There are a lot of good reasons to do so, like efficiency, determinism, robustness and so on, but from a Linux kernel developer’s point of view, the true stroke of genius is using FIDL to replace the ioctl hell completely. It enables us to generate a strongly typed, well-defined and very specific interface between user applications and the driver. And as a bonus of FIDL and the microkernel design, a bunch of programming languages become conceivable for driver development on top. FIDL supports bindings for different languages such as C, C++, Dart, Go and Rust. They are all suitable for userland already and, in contrast to Linux, drivers can be written in C and C++, and Rust is an option for, currently at least, high-level drivers.
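What a strongly typed, generated interface buys you can be sketched in a few lines. The following Python is only an illustration of the general idea; it is neither FIDL syntax nor FIDL’s wire format, and the SetPin type, its ordinal and its byte layout are invented for the example. Imagine the class as the output of an IDL compiler: both sides share it, so size and method checks happen in the generated decode step instead of in hand-written driver code.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class SetPin:
    """Hypothetical request type, as an IDL compiler might generate it."""

    pin: int
    value: int

    # Unannotated class attributes are not dataclass fields.
    ORDINAL = 1                          # assumed method identifier
    _LAYOUT = struct.Struct("<HII")      # ordinal, pin, value (10 bytes)

    def encode(self) -> bytes:
        return self._LAYOUT.pack(self.ORDINAL, self.pin, self.value)

    @classmethod
    def decode(cls, data: bytes) -> "SetPin":
        # The checks live in generated code, not in every driver.
        if len(data) != cls._LAYOUT.size:
            raise ValueError("wrong message size")
        ordinal, pin, value = cls._LAYOUT.unpack(data)
        if ordinal != cls.ORDINAL:
            raise ValueError("unknown method ordinal")
        return cls(pin, value)


wire = SetPin(pin=7, value=1).encode()
decoded = SetPin.decode(wire)
```

A truncated message or one carrying a different ordinal is rejected in decode, before any driver logic runs.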
Considering the number of projects Google started and buried after a rather short time, it is nice to see Fuchsia and Zircon still alive and in active development. There is still no official statement on the plans Google has for Fuchsia; instead, they just changed their open-source model from read-only to accepting third-party input and published a roadmap. That’s not what a dying project looks like. And even if Google still does not tell us their exact plans for Fuchsia, we can find some hints in the sources. The most interesting is the Fuchsia Products page, which contains information about product configurations that can be built. It mentions workstation, speaker and router. Maybe we will have Google/Nest smart speakers with Fuchsia in the future.
Of course, there are a lot more changes under the hood of Fuchsia. When I came back to it a year after finishing my thesis, my first thoughts were "Where did the source go?" and "Where did the drivers go?". And yes, the whole structure changed. The Fuchsia code moved completely to Google Source, with a more Android-like repo structure and a general refactoring of the code layout. Back then, during my thesis, drivers were located within Zircon, that is, within the kernel tree. But in the microkernel design, they were not actually part of the kernel. Today, this logical inconsistency is resolved by moving them into the top-level user space, Fuchsia.
A nice example is the GPIO code. Along with other system/driver code, it is located under fuchsia/src/devices/gpio and further divided into a drivers directory, bundling GPIO implementations for different chipsets and boards, and a bin directory, which contains a user-facing gpioutil. This seems to be a tool to control GPIO from user space and is, of course, based on FIDL. Other device classes within the directory use a similar structure depending on their type and use. Mostly, instead of a binary application such as gpioutil, there is a lib directory which contains a developer-facing library. All in all, this structural change is a logical step to make the microkernel structure more visible in the source.
Furthermore, when looking around and searching for the drivers, I found hints of device tree and dtb files within the Zircon sources, and at first I was pleased. Device Tree is a data structure which is (mostly) used in Linux to describe the hardware structure of single-board computers based on ARM and other architectures. Fuchsia used so-called board files for this task, just like Linux did before switching to device tree, so my first thought was "Oh, they learned from Linux and how board files failed there". But unfortunately, a closer look into the available device trees made me skeptical. The files were way too small and not informative at all. Searching further, the board files showed up again in a new location within the Fuchsia part, just as the drivers did.
Nevertheless, to be honest, at the current state of the project board files are completely fine. Fuchsia currently supports around 15 different boards. For this number, board files are easier to handle, easier to implement and do not run into the scaling issues that made Linux switch to device trees. The reason for adding the device tree blobs anyway is as annoying as it is simple. Some bootloaders look for DTBs in order to boot a system image and, according to hints within the code, that’s the only reason for them to exist currently. But maybe Fuchsia will switch as soon as the project grows and runs into similar issues as Linux did.
To stay with rather annoying news, remember the build system chaos Google had with Android? They changed the build system repeatedly over the last Android versions, mostly with two of them used in parallel, which led to some issues as their feature sets were not necessarily identical and converting build files is a lot of effort. Surprise, they do it with Fuchsia, too. And to make it even funnier, it is yet another build system, named GN, which is a meta-build system on top of Ninja. To be fair, GN was already there during my thesis, but only within the Fuchsia part. In the meantime, they moved completely to GN, including for Zircon. At least there is only one build system left in Fuchsia. Hopefully it stays that way …
To proceed with the nicer sides, let’s have a look at the emulator. There is visible progress here as well, in the form of a working emulator with a very rudimentary graphical interface. When I started my thesis, the emulator had only a rendered terminal, and the user interface running on real hardware (in this case the Pixelbook) was indeed prettier than the new one in the emulator, but less functional.
Changes in driver development
Driver development in Fuchsia, or rather Zircon, was the main topic I focused on during my thesis, and it should not be neglected here. In general, a lot has changed in the year since my thesis, but the changes for driver development are really neat, starting with the documentation. Today, there is so much more and better documentation available, which should make getting started with driver development enormously easier.
Besides the documentation, the fx create tool has become very helpful, as it creates scaffolding for new software projects and now even for drivers. It supports user space components in C++ and Rust as well as drivers in C++. This gives a developer a helpful basis, while the Fuchsia team incidentally ensures that new drivers use the currently preferred language and structure. That’s huge progress, as during my thesis drivers grew in a rather uncontrolled way. There were C and C++ drivers and wild mixtures of both, and no documentation on which of the already available driver structures and styles was preferred. Today, most drivers seem to have been migrated to the new style as predetermined by fx create, including the switch to GN build files, pure C++ and a reworked style of driver binding. Additionally, fx create puts a file for unit tests in place, and most available drivers take advantage of this. That’s new as well, and another contrast to Linux, where no general concept for driver unit tests exists.
Using fx create and the move of all drivers to the Fuchsia side bring developers to a reworked style of driver binding. That’s the mechanism which matches a device or device class to a specific driver. During my thesis, this was done with a C-style macro, but now that the migration to Fuchsia seems to be complete, the new style uses newly introduced .bind files with their own, easy-to-use description language. Within the actual driver code, the only thing that remains of the binding mechanism is the inclusion of the .bind file. Previously needed boilerplate code is gone, and the actual mechanism which decides whether a driver matches a device is even more hidden than before. It also means that C++ or (later) Rust drivers no longer need a single C file just for binding.
As already suggested, C++ is the preferred language for writing device drivers in Fuchsia right now. Older C-style drivers seem to have been completely replaced, and fx create does not support creating new ones, either. In the future, Rust will probably be supported by fx create and drivers might prefer it over C++, but today only very few stacked high-level drivers are written in Rust.
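Since the matching logic itself is hidden, a rough mental model may help. The sketch below is Python and deliberately does not imitate the syntax of .bind files or Fuchsia’s real matching algorithm; the property names, driver names and the first-match rule are all assumptions. It only shows the general idea: each driver declares which device properties it requires, and a device is bound to a driver whose requirements are all met.

```python
def rules_match(rules, device):
    """True if every property required by the driver equals the device's value."""
    return all(device.get(key) == expected for key, expected in rules.items())


def find_driver(drivers, device):
    """Return the name of the first driver whose rules match, else None."""
    for name, rules in drivers.items():
        if rules_match(rules, device):
            return name
    return None


# Hypothetical drivers and property names, for illustration only.
DRIVERS = {
    "vendor-a-gpio": {"protocol": "gpio", "vendor": "vendor-a"},
    "generic-i2c": {"protocol": "i2c"},
}

device = {"protocol": "gpio", "vendor": "vendor-a", "revision": 2}
chosen = find_driver(DRIVERS, device)
```

Extra device properties such as revision are simply ignored, while a device that no rule set covers ends up without a driver.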
Conclusion and What We Can Learn from Fuchsia
Even today, more than 4 years after Fuchsia appeared in public, we do not know anything concrete about Google’s plans for it. There are some hints and guesses but not a single official statement. That’s tough for a project that has been in active development for such a long time, but even if we do not know when, in which form and for what purpose Fuchsia will come, it is a very interesting new approach for an operating system.
All in all, the concepts behind it are not necessarily new. Microkernel designs for operating systems, for example, have been known at least since Tanenbaum. However, nearly all bigger general purpose operating systems are based on monoliths, mostly for historical reasons. But that’s not the only reason why Fuchsia is worth a deeper look. There are lots of small and clever ideas in Fuchsia which make it absolutely worth looking into the system internals.
And even if we never get to see Fuchsia in a product, it shows what a radically new operating system design based on a microkernel could look like. Some of its ideas are very radical, such as completely dropping POSIX compatibility, and I do not know if they even have a chance to prevail. On the other hand, there are some nice and some really ingenious approaches within Fuchsia that I would appreciate seeing in Linux or similar systems.
My personal top 3 list includes:
- opening driver development to languages other than C
- establishing unit tests in drivers
- and my personal favorite and the idea I would love most to see in Linux: completely replacing ioctl with something like FIDL