Computer technology progresses rapidly, so there is always room for improvement and optimization. New functionality gets developed and released that, in retrospect, would sometimes have changed earlier design decisions. Today, I would like to look back at an IoT project at inovex in light of later additions to the project’s software stack.
In the first chapter, I will give you a short introduction to the inovex IoT Smart Building project, which will serve as an example. The second chapter then outlines later additions to the project’s software stack that allow us to reconsider some architectural decisions, while the third chapter concludes the article with some ideas on how to embed custom low-level software into commodity hardware.
The architecture of inovex Smart Building
The goal of the Smart Building project at inovex is to develop a room occupancy system showing which meeting rooms are currently in use (not a status in a calendar, but the real state). Additionally, for Covid prevention, the system informs meeting participants when ventilation is required.
There are already a few blog posts about the project, describing the custom-built software easing initial device deployment and the over-the-air update system built with Mender. You should give them a read if you are interested in more details.
But to give you an overview: inovex Smart Building consists of multiple microcontrollers, each equipped with sensors (e.g., a CO2 sensor), and one gateway device per office. The microcontrollers send their data to Azure, but it can also be filtered or analyzed locally by the gateway devices.
The gateways run a custom Linux image developed with Yocto. To enable OTA updates for gateways, Mender is included in the image. Initial device provisioning is handled by our own custom software.
Kubernetes & CI/CD
Our custom provisioning software contains a server and a client, where the latter is run on every gateway. The server component lives in a Kubernetes cluster hosted on Azure Kubernetes Service and includes a TCP load balancer, nginx reverse proxy with TLS termination, cert-manager for automated TLS certificate management, as well as the software itself. All data processed by the software is ephemeral, so no database is needed. It registers the requesting gateway with other services used in the project, namely Vault (for new device certificates used for Azure-related authentication), Azure IoT, and Mender.
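To illustrate the TLS part of this setup, the nginx ingress with cert-manager-managed certificates might look roughly like the following Kubernetes resource. All names, the hostname, and the issuer are made up for illustration and are not the project’s actual configuration:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: provisioning-server            # hypothetical name
  annotations:
    # assumed ClusterIssuer; cert-manager then obtains and renews the cert
    cert-manager.io/cluster-issuer: letsencrypt
spec:
  tls:
    - hosts: [provisioning.example.com]
      secretName: provisioning-tls     # cert-manager stores the cert here
  rules:
    - host: provisioning.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: provisioning-server
                port:
                  number: 8080
```

With this pattern, TLS is terminated at the ingress and the provisioning server itself only speaks plain HTTP inside the cluster.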
New versions of the custom Linux image are built with Yocto in GitLab pipelines on every push to the master branch (which, in the meantime, should be called main). Production and development images also differ in their components, where a development image is built for every push to a branch other than master. Dev images contain additional debugging tools like an SSH server, allow passwordless root logins via SSH, and enable post-install logging.
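In Yocto terms, most of these dev-only additions can be expressed as image features. A local.conf-style fragment for development builds might look roughly like this (a sketch, not the project’s actual configuration; debug-tweaks, among other things, permits passwordless root logins):

```
# Hypothetical fragment applied to development images only.
# debug-tweaks: empty root password, relaxed login restrictions
# ssh-server-openssh: include the OpenSSH server
EXTRA_IMAGE_FEATURES += "debug-tweaks ssh-server-openssh"
```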
As Azure IoT is used, all gateways run the Azure IoT agent. Because there was no real support for Azure IoT in Yocto at the time, we decided to build our own container image including Azure’s IoT agent. The container runtime in the gateway’s Linux image then runs that container image.
Essentially, the container image is based on an Ubuntu image, uses Microsoft’s APT repository to install the Azure IoT agent, and is set up so that configuration options (especially authentication credentials for Azure IoT) can be injected via environment variables at runtime. That way, a gateway fetches its authentication credentials via our provisioning software, stores them on disk, and passes them to the container via environment variables. Thanks to this approach, the container image and the Azure IoT configuration stay generic and are configured on-device when necessary.
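As a sketch of that mechanism, the container entrypoint could render the agent’s configuration from the injected environment at startup. All names below (the variable, the path, and the config layout) are assumptions for illustration, not the project’s actual ones:

```shell
#!/bin/sh
# Render the Azure IoT agent configuration from environment variables that
# the gateway injects at `docker run --env-file ...` time.
# Variable names and the config layout are illustrative assumptions.
render_agent_config() {
    conf="$1"
    # Fail early if the gateway did not inject credentials.
    : "${AZURE_IOT_CONNECTION_STRING:?must be injected via --env-file}"
    mkdir -p "$(dirname "$conf")"
    printf '[provisioning]\nsource = "manual"\nconnection_string = "%s"\n' \
        "$AZURE_IOT_CONNECTION_STRING" > "$conf"
}

# In the entrypoint, one would then render the config and hand over to the
# agent process, e.g.:
#   render_agent_config /etc/aziot/config.toml
#   exec /usr/bin/the-agent-binary
```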
Introduction of new features in Mender
The previous part of this article described the implementations in the inovex Smart Building project. Since then, the integrated software components have evolved.
In this part, I would like to revisit our design choices, with a focus on Mender, as it has introduced new features that directly impact the solutions available to our project. Work on the Smart Building project began in 2019; since then, Mender has gained multiple features which now allow reconsidering some design decisions. I would like to cover two of them in particular: the IoT Hub integration and the Remote terminal.
Azure IoT Hub integration
First, Mender version 3.2, released in February 2022, added integration for Azure IoT Hub. As you might know, Microsoft’s Azure IoT Hub service enables managing your IoT device fleets. For example, one can pre-process data sets before they are sent to the Azure cloud for more in-depth analysis and long-term storage. IoT Hub also allows managing IoT device states using the so-called Device Twins.
To add a new device to IoT Hub, the device needs to be configured and authenticated with a symmetric key. While this process can be done manually, that is not feasible, especially if you need to maintain hundreds or thousands of devices. To automate the initial provisioning, we implemented the required steps (i.e., generating two symmetric keys and deploying them onto the device) in our custom provisioning software, which of course added some complexity to the overall architecture. Besides that, the software also registers a new device with the Mender service and fetches some required certificates from our Vault server.
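The key-generation step itself is small. Sketched in shell, it could look like this (the key length is an assumption: Azure accepts base64-encoded symmetric keys, and 32 random bytes is a common choice):

```shell
# Generate the primary and secondary symmetric key for a new IoT Hub
# device identity. 32 random bytes, base64-encoded, is an assumed choice.
generate_symmetric_key() {
    openssl rand -base64 32
}

primary_key=$(generate_symmetric_key)
secondary_key=$(generate_symmetric_key)
```

Our provisioning software additionally deploys these keys onto the device and registers the identity with IoT Hub, which is exactly the part Mender’s integration can now take over.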
The new Azure IoT Hub integration of Mender takes over (most of) this process. The registration of devices in Azure IoT Hub can now be done via Mender itself. After setting up your IoT Hub in Mender, it takes care of the communication with Azure and registers the device with the configured IoT Hub (if requested). To configure the device for usage with IoT Hub, Mender uses the already-established secure channel to the device.
Mender also synchronizes the device’s lifecycle, meaning if a device gets rejected in Mender, this state will also be reflected in IoT Hub, where the device will then be disabled.
Not only are provisioning and lifecycle management handled by Mender, but the Device Twins can also be managed within Mender. Changes are propagated to IoT Hub and thus reach the device.
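For context, a Device Twin change in IoT Hub is just a JSON document of desired properties. The ventilation warning mentioned earlier could hypothetically be rolled out fleet-wide as something like the following (the property name is made up for illustration):

```json
{
  "properties": {
    "desired": {
      "co2WarnThresholdPpm": 1000
    }
  }
}
```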
In summary, Mender can now handle most of the process for which we built custom software to initially provision new gateway devices. This would allow us to migrate that process from our own software (which requires regular maintenance, both development-wise and infrastructure-wise) to software we use anyway.
Remote terminal
Mender devices from inovex Smart Building include an OpenSSH server to allow remote connections. While the general idea was to remove that component once the hardware and software were stable enough, being able to inspect the current device state can ease troubleshooting, especially in the early development stages.
However, adding another component broadens the potential attack surface because an additional server is run which opens another port on the device. In addition, there is also an overhead of maintaining the authorized keys (which would again require system updates when keys get added/removed/updated).
So this is where the new Remote terminal feature of Mender comes into play. Well, the feature was introduced with Mender 2.7 which was released in April 2021, but allow me to call this a new feature since it was not present when we designed the project’s architecture in 2019.
What the Remote terminal allows you to do is open a shell on the device itself via the Mender interface. So no matter where you are physically, as long as you have access to the Mender UI, you can open a shell on any Mender-connected device.
The advantage is that there is no need for an SSH server anymore – no more management of authorized keys, and, most notably, no open ports! The device securely communicates with the Mender server via a TLS-encrypted connection initiated by the device itself. Instead of managing SSH keys on all devices, you only need to manage access to the Mender server.
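On the device side, the Remote terminal is provided by the mender-connect add-on, which is configured through a small JSON file, /etc/mender/mender-connect.conf. A minimal configuration might look roughly like this (the user and shell choices here are assumptions, not our project’s values):

```json
{
  "User": "root",
  "ShellCommand": "/bin/bash"
}
```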
Obviously, the Mender server then acts as a central gate to all your precious IoT devices, so you really want to make sure that you have strict access control and other security measures in place.
Remote terminal would be a welcome change to the architecture because it minimizes both the possible attack surface of the IoT devices and the management overhead for us.
Getting the system image onto hardware
Now, I would like to talk about the problem of getting the final system image onto the target device, which turns out to be a pretty tricky task.
In inovex Smart Building, we went with the most straightforward solution: simply copying the final image onto the target disk via USB. This is probably the easiest approach, and because it is a development project, also an appropriate one.
But thinking of a bigger IoT project, this method can quickly lead to bottlenecks. It should be obvious that manually attaching every new device to power and USB in order to copy the image does not scale well with hundreds or thousands of devices.
So let us take a moment to think about this. We would like to avoid manually copying the software onto every device. It would be optimal if we could flash the software directly onto the storage medium before it is installed in the device. However, since we use a plain hard drive for storage, we would still need to attach it via SATA/USB.
Installation via network
One approach would be to use firmware pre-installed by the device manufacturer which allows downloading and installing a specific device image. This would still need manual work (power and network connections), but would be considerably more automated. However, at least the Intel NUC (used in Smart Building as gateway hardware) does not come with such firmware.
Another possibility would be to perform a PXE boot into a minimal Linux system which then executes something like curl -L https://www.example.com/our-system-software.iso | dd of=/dev/sda. This is similar to the previous approach but does not require pre-installed firmware (except PXE/BIOS/UEFI, of course). It would actually be an improvement over manual copying, but it still requires powering up every device once as well as a special provisioning network (PXE, TFTP server, VLAN management, etc.). So at least some degree of manual work is still needed.
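In practice, one would not pipe curl straight into dd but verify the download first. A hedged sketch of such a PXE-booted installer step (the URL, paths, and target disk are placeholders; a real script would add partition checks and retries):

```shell
# Minimal installer step for a PXE-booted rescue system: fetch the image,
# verify its checksum, and only then write it to the target disk.
# The URL and the target disk are placeholders.
install_image() {
    image_url="$1"
    target_disk="$2"

    curl -fsSL "$image_url" -o /tmp/system.img
    # Assumes the server publishes a .sha256 file next to the image.
    expected=$(curl -fsSL "${image_url}.sha256" | awk '{print $1}')
    actual=$(sha256sum /tmp/system.img | awk '{print $1}')
    # Abort before touching the disk if the download is corrupted.
    [ "$expected" = "$actual" ] || { echo "checksum mismatch" >&2; return 1; }

    dd if=/tmp/system.img of="$target_disk" bs=4M conv=fsync
}

# e.g.: install_image "https://www.example.com/our-system-software.img" /dev/sda
```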
Hardware vendor assistance
For this problem to be solved properly, i.e., to gain a maximum degree of automation, we would require control over the hardware manufacturing process itself or provisioning assistance from the hardware vendor.
With the help of the hardware manufacturer, preloading our final image becomes possible, or other assistance may be offered, e.g., an automatable provisioning tool or the aforementioned pre-installed “factory mode” firmware. Big vendors like Apple use such processes to produce millions of devices and get them straight into the hands of their customers.
Mender developer Northern.tech provided us with an example of such an integration, where one of their customers has set up Zero Touch Provisioning using NXP EdgeLock, Mender, and Azure Device Provisioning Service. Each device has its own mTLS key-pair embedded into the EdgeLock secure element for authentication with Mender. An initial image is contained on the device storage from the factory.
In conclusion, to solve this problem properly, cooperation with the hardware vendor is essential. Their service offerings can be used to provide them with a custom image, which is then flashed onto the device during manufacturing.
However, for IoT projects where the device fleet is manageable, a simple PXE environment might be the best and most cost-effective option.
Conclusion
We have taken a look at the architecture of the Smart Building project at inovex, considering the general purpose as well as the client and server deployments and their rough interaction.
Knowing the overall structure, we then observed possible optimizations enabled by new features in the software stack, namely the removal of complexity in our custom provisioning software, and the replacement of the gateway’s SSH server for debugging with a smaller attack surface and less management overhead.
Lastly, we thought about how to optimize the process of installing the final system image onto target hardware and considered one real-world example used in production.