In this blog post I would like to talk about Mender, a software solution we at inovex evaluated in the Smart Building project, which allows robust and secure OTA updates for IoT device fleets. I will give you an overview of the software and show you how we integrated it into our own project.
Introduction
In the Smart Building project, IoT edge gateways are used as intermediate devices between microcontrollers and Microsoft Azure. They pre-process incoming data before sending it to the cloud for further processing. If you want to know more about the Smart Building project in general, you can read the introduction article.
Those devices run a custom-built image created with Yocto by our development team. The image contains all components required to operate the edge gateway, for example, Docker, systemd services, configuration files and the mDNS software Avahi. It also includes the Mender client which I will discuss further in a moment.
Bugs, security issues and misbehaving software are common in IT projects, and the ability to fix those problems is a requirement for modern embedded systems. Furthermore, it should be possible to add new features after deployment at an edge location where the operators do not have physical access. To accomplish that, we need to be able to deploy changes via remote software updates, also known as over-the-air (OTA) updates.
The edge gateways are not directly accessible from the Internet but only within the local network. That is why it is not possible to simply connect to the devices from a central system, e.g. via SSH, and install updates for all office locations (an approach that would be very error-prone anyway). The solution, as you might have guessed, is Mender.
About Mender
Mender enables “secure, risk-tolerant and efficient over-the-air updates […]” according to their website. Managed edge devices run the Mender client and connect to a central Mender server. They poll that server at regular intervals and download software updates over HTTPS when available. Thanks to this pull-based principle, the edge devices do not need to be directly reachable from the Internet because the clients initiate the connection themselves.
The clients are authenticated via an asymmetric key pair generated on the first boot of a device. Devices have to be accepted once via the Mender Web UI or Mender REST API (which we automated via the Gateway Provisioning Service) in order to register with the server successfully.
A system update can then be deployed via the Mender UI (or REST API) to specific machines only or to a whole device fleet. The targeted devices will then download and install the update. We will see where these update files come from and how they work later in this article.
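To make the pull-based flow more tangible, here is a minimal in-memory sketch. It is not Mender's API: `ToyServer`, `ToyClient` and all method names are hypothetical and only model the behavior described above, namely one-time acceptance, targeted deployments and devices that initiate every exchange themselves.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class ToyServer:
    """In-memory stand-in for an update server (hypothetical, not Mender's API)."""
    accepted: Set[str] = field(default_factory=set)
    # Maps device ID -> name of the release the device should install next.
    pending: Dict[str, str] = field(default_factory=dict)

    def accept(self, device_id: str) -> None:
        # One-time admission, done by an operator or an automated service.
        self.accepted.add(device_id)

    def deploy(self, release: str, device_ids: Set[str]) -> None:
        # Target a few devices or a whole fleet by passing their IDs.
        for device_id in device_ids:
            self.pending[device_id] = release

    def next_release(self, device_id: str) -> Optional[str]:
        # Called by the device; the server never contacts devices itself.
        if device_id not in self.accepted:
            return None
        return self.pending.pop(device_id, None)


@dataclass
class ToyClient:
    device_id: str
    installed: str = "initial"

    def poll_once(self, server: ToyServer) -> bool:
        """Run one polling cycle; return True if an update was installed."""
        release = server.next_release(self.device_id)
        if release is None:
            return False
        self.installed = release
        return True
```

In this model, a deployment created for a device that has not been accepted yet simply stays pending: `poll_once` returns `False` until `accept` has been called, and the next poll after that installs the release.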
Update mechanism
Mender supports different types of update mechanisms: full system updates which contain the root file system, or just specific application updates. In general, devices have a special partition layout consisting of two main partitions, one being active and one being passive, a bootloader partition and another one for persistent storage. This enables automatic rollbacks on update failure using the A/B update mechanism.
Each main partition contains the whole root file system and the kernel. The active partition is the one the system boots from and has a working file system. When a system update is installed, the new root file system is copied onto the passive partition. The device then reboots and tries to boot from the passive partition with the new, updated files. If that fails, the device simply reboots again and boots from the active partition as before. If the boot succeeds, the pointers for the active and passive partitions are swapped so the updated partition is marked as active.
That way, corrupt or buggy system updates do not render the device unusable, which would otherwise require a manual re-setup. A faulty update just makes the device boot into the last working state again and allows the operators to install another update remotely. In addition, the mechanism prevents incomplete updates, e.g. when the update process is interrupted by a power loss.
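The A/B logic can be sketched as a small state machine. This is a simplified model under stated assumptions, not Mender's implementation: the class `DualRootfsDevice`, the slot names `A` and `B`, and the `boots_ok` callback (a stand-in for the trial reboot) are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class DualRootfsDevice:
    """Toy model of an A/B partition layout (names are illustrative)."""
    slots: Dict[str, str] = field(default_factory=lambda: {"A": "v1", "B": ""})
    active: str = "A"

    @property
    def passive(self) -> str:
        return "B" if self.active == "A" else "A"

    @property
    def running(self) -> str:
        return self.slots[self.active]

    def install(self, image: str, boots_ok: Callable[[str], bool]) -> bool:
        """Write the image to the passive slot and try booting it once.

        Returns True if the update was committed, False if it was rolled back.
        """
        target = self.passive
        self.slots[target] = image   # the active slot is never modified
        if boots_ok(image):          # stand-in for the trial reboot
            self.active = target     # commit: swap active and passive roles
            return True
        return False                 # rollback: keep booting the old slot
```

The important property is that `install` only ever writes to the passive slot, so a failed or interrupted update leaves `running` unchanged.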
Update files
Update files are deployed as Mender-specific file archives, also called Mender artifacts. In our case, they contain the whole root file system, but application updates are possible as well, e.g. by installing Debian packages via dpkg. You can even create your very own update module which executes exactly the code you need to update the system. With that, you could update only containers, just specific files in the file system, or the bootloader. The possibilities are endless.
Mender artifacts are compressed when they are created and extracted on the device after download, which saves bandwidth. They include metadata such as the update version, the designated device type (the hardware platform, e.g. Intel NUC or Raspberry Pi) and an optional cryptographic signature (more about that later).
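The role of the metadata can be illustrated with a short sketch. It does not reproduce the real Mender artifact format: `ToyArtifact`, `build` and `unpack_for` are hypothetical names, gzip is just one possible compression, and the sketch assumes that a client rejects artifacts built for a different device type.

```python
import gzip
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyArtifact:
    """Hypothetical stand-in for an update file; not Mender's artifact format."""
    name: str          # version shown to operators
    device_type: str   # hardware platform the payload was built for
    payload_gz: bytes  # compressed payload, as sent over the network


def build(name: str, device_type: str, rootfs: bytes) -> ToyArtifact:
    """Compress the payload once, at build time."""
    return ToyArtifact(name, device_type, gzip.compress(rootfs))


def unpack_for(artifact: ToyArtifact, local_device_type: str) -> bytes:
    """Decompress on-device, but only if the artifact targets this hardware."""
    if artifact.device_type != local_device_type:
        raise ValueError(
            f"{artifact.name} targets {artifact.device_type}, "
            f"not {local_device_type}"
        )
    return gzip.decompress(artifact.payload_gz)
```

Checking the device type before unpacking means an image built for one hardware platform is never written to a device of another platform.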
Yocto integration
Mender natively supports Yocto, a software project allowing developers to create custom Linux images for any hardware architecture. Yocto is especially popular and known for targeting embedded systems. It allows integrating software into the OS by using layers. They basically contain various software components, similar to a package manager. The Mender project also releases different Yocto layers for easy integration of Mender into the custom OS image.
These layers handle the installation of the Mender client and the correct partitioning as well, reducing the work required from developers. As you might know after reading the previous blog article about the Gateway Provisioning Service, we use a software stack consisting of Yocto and Mender. On the one hand, that is because of the feature set and the robust OTA updates provided by Mender; on the other hand, it is because of the good Yocto integration Mender provides.
Security
Mender also offers the option to cryptographically sign Mender artifacts using RSA or ECDSA key pairs. This enables verification of update files on-device before any update is installed. The option is enabled in the Mender client config by providing a path to a public key which is used to verify signatures. All update files must then be signed with the corresponding private key. This check is built into the OS image; the Mender client refuses to install updates with a missing or invalid signature. As a result, even a compromised Mender server does not allow an attacker to deploy arbitrary update files because, without the private key, they cannot create a valid signature.
As another security feature, you can enable the “read-only file system” feature flag. When building the OS image using Yocto, the Mender layers then configure the currently active main partition to be mounted read-only, so processes and users are unable to tamper with files in the root file system, which helps protect file integrity. Be aware, however, that the root user can still remount the file system as read-write.
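The following toy example shows why only the holder of the private key can produce installable updates. It is deliberately not Mender's implementation: it uses unpadded textbook RSA with tiny parameters built from two Mersenne primes, which is insecure and only serves to keep the sketch self-contained. The function names `sign` and `may_install` are hypothetical.

```python
import hashlib
from typing import Optional

# Toy textbook-RSA parameters (two Mersenne primes). Far too small and
# unpadded for real use; they only keep the example dependency-free.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                          # public modulus, shipped on the device
E = 65537                          # public exponent, shipped on the device
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent, never leaves the signer


def digest(payload: bytes) -> int:
    """Hash the payload and reduce it into the range of the modulus."""
    return int.from_bytes(hashlib.sha256(payload).digest(), "big") % N


def sign(payload: bytes) -> int:
    """Runs where the private key lives, e.g. in a build pipeline."""
    return pow(digest(payload), D, N)


def may_install(payload: bytes, signature: Optional[int]) -> bool:
    """Runs on the device and needs only the public values N and E."""
    if signature is None:
        return False               # unsigned updates are refused
    return pow(signature, E, N) == digest(payload)
```

An attacker who controls the server can replace the payload or the signature, but without `D` they cannot produce a pair for which `may_install` returns `True`.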
Releasing system updates
We have implemented an automated process for releasing system updates and deploying them onto devices. The whole pipeline is defined using GitLab’s CI/CD features in a .gitlab-ci.yml file. The process starts with a push of new commits to our Git repository. GitLab triggers a build of the image with Yocto, which outputs the Linux image itself and the Mender artifact. These files are saved as GitLab artifacts so they are available for manual download afterward.
As an intermediate step in the pipeline, the Mender artifact is cryptographically signed with a private key (mounted via GitLab CI variables). The corresponding public key is embedded into the device’s image from the factory so devices can verify the update’s integrity after downloading the file. That way, we can ensure that only trusted and verified system updates are installed on all devices.
Only tagged commits are considered for deployment onto one or more devices at a later point in time. If a tagged commit triggers the pipeline, the Mender artifact is automatically uploaded to the Mender server so it is ready to be deployed to targeted devices via Mender’s Web UI. The name of the artifact, which is displayed in the UI, is set to the Git tag. Mender artifacts produced by non-tagged commits are not uploaded to Mender and are just stored in GitLab itself.
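The release rule described above boils down to a single decision per push. The sketch below models it in Python rather than in GitLab CI syntax; the function name `pipeline_steps`, the step descriptions and the exact ordering of the signing and storing steps are assumptions made for illustration.

```python
from typing import List, Optional


def pipeline_steps(git_tag: Optional[str]) -> List[str]:
    """Return the ordered steps a push would trigger in this simplified model."""
    steps = ["build image and artifact", "sign artifact", "store in GitLab"]
    if git_tag:
        # Only tagged commits are uploaded to the update server; the tag
        # doubles as the artifact name shown in the UI.
        steps.append(f"upload to Mender as '{git_tag}'")
    return steps
```

For example, `pipeline_steps(None)` stops after storing the files in GitLab, while `pipeline_steps("v1.4.0")` adds the upload step as a fourth entry.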
Infrastructure
The Smart Building project’s Mender instance is hosted on Kubernetes in an Azure Kubernetes Service (AKS) cluster. When we began building the infrastructure in late 2019, the Mender project only distributed Docker Compose files for deploying the Mender server, so we had to build the Kubernetes manifests ourselves from the ground up.
In the meantime, the project started shipping Helm charts for the installation of the Mender server on Kubernetes. Since Mender 3.0 (released in July 2021), the Helm charts are also mentioned in the docs and are now considered the only production-grade installation method for the Mender server (also see this commit).
Since Mender and the Gateway Provisioning Service share the same AKS cluster, the Mender server also benefits from the nginx ingresses with an AKS TCP load balancer in front, automatic TLS certificate management by cert-manager and DNS management via ExternalDNS.
Conclusion
Mender quickly became a central component in our infrastructure because it offers hassle-free, secure and robust OTA updates for different types of device fleets.
Its adoption by companies like NVIDIA and by leading cloud platforms like Microsoft Azure and Google Cloud Platform emphasizes that Mender is able to provide enterprise-grade features for potentially large-scale production setups. At the same time, the software’s footprint is small enough to make it reasonable to maintain for a smaller number of devices as well.
To summarize, if you need a solution for easy and secure OTA updates for a group of (preferably) IoT devices, you should definitely take a look at Mender.