The introduction of cloud computing has changed a lot about how we develop and operate software. The API-driven nature of the cloud has enabled developer teams to deploy and operate applications themselves, which supports the DevOps movement. Architecture patterns like microservices try to solve the socio-technical problems of modern software development. In this blog post, I want to describe some best practices for designing the software architecture of a cloud-native application for a small team with little Ops know-how. The best practices focus mainly on classic B2B applications or company-internal backends without any hypergrowth expectations in the next five years. For every practice given in this article, I will lay out the tradeoffs of that decision.
To give valid best practices, I will focus on specific preconditions, as there are no silver bullets in our IT industry. The team developing the application has five to ten members, commonly known as the two-pizza-sized agile team. The expected load our application will handle is in the range of one event per minute up to around a thousand events per second. An event could either be an incoming HTTP request or an event ingested via some sort of queue. These numbers may increase in the next five years by a factor of five to ten, but there is no hypergrowth expected. These are typical numbers for internal business applications, or for products targeting a well-known business-to-business market with a limited number of potential customers. Your team has little operational capacity since you want to focus on features instead of building and maintaining infrastructure. Furthermore, many studies have shown that while developers enjoy the freedom given by the DevOps movement, many do not want to be the jack of all trades required to operate complex software systems.
As the last precondition, we want to keep the system as vendor-neutral as possible. This could be the case because our organization is not yet committed to a cloud, runs a multi-cloud strategy, or wants to be able to change from a hyperscaler to a European provider, for example due to recent GDPR rulings in Germany.
Serverless and Microservices
In recent years, microservices have been promoted as a solution to build scalable and loosely coupled systems. Many older software systems are built in a monolithic architecture, accumulating technical debt by coupling subsystems tightly together. Microservices, on the other hand, allow us to develop, deploy, and scale subsystems independently of each other. This allows you to cut the greater organization into smaller teams, each responsible for a handful of independent microservices.
To keep the cost per service low and solve cross-cutting concerns, container orchestrators like Kubernetes have emerged. They help to deploy multiple instances of a service easily, have native tooling for autoscaling, and offer basic DNS-based service discovery. Kubernetes itself is a complex system with many moving parts, thus most cloud providers, not only the hyperscalers, have a Kubernetes-as-a-Service offering. However, Kubernetes by itself is only a building block for the internal application platform required to operate microservices efficiently. To solve problems like circuit breaking, transport encryption, and service-to-service authentication, a service mesh like Istio or Linkerd is often deployed on top of Kubernetes, adding additional complexity to the stack.
Since microservices form a distributed system, each microservice must be able to deal with problems introduced by network failures, and since every communication between subsystems happens via the network, the likelihood of a network failure happening during an operation increases. The classic approach of starting a database transaction at the beginning of a business transaction and either committing it or rolling it back at the end is often not possible, because a business transaction may span multiple services. By the book, every microservice should have its own independent data store. Thus, the complete system must be programmed as an eventually consistent system. This programming paradigm also influences product design and requires a more experienced team.
Since a set of microservices forms a distributed system, solutions like distributed tracing are required to observe the actual system behavior. While helpful, they need some investment in both the code base and the infrastructure before they pay off.
To roll out microservices independently and not build a distributed monolith, every API change must be compatible with existing client systems. Thus, making a breaking change to a microservice requires you to orchestrate the changes in multiple services. This is more time-consuming than changing the codebase within a more monolithic system. Moreover, your codebase must be able to support old and new clients for a certain period of time. The cleanup of old code paths in particular is often neglected since it consumes time without adding new features. This technical debt, however, increases the complexity of the system in the long run.
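To make the cost of supporting old and new clients concrete, here is a minimal sketch in Python. The function name and the payload fields (`name`, `first_name`, `last_name`) are hypothetical and only serve as an illustration of a service that has to accept two request shapes during a migration.

```python
def parse_customer_request(payload: dict) -> dict:
    """Accept both the old and the new request shape (hypothetical fields)."""
    if "first_name" in payload and "last_name" in payload:
        # New clients already send the split name fields.
        return {
            "first_name": payload["first_name"],
            "last_name": payload["last_name"],
        }
    if "name" in payload:
        # Old clients still send a single field. This code path has to stay
        # until every caller has migrated, and it must be deleted afterwards.
        first, _, last = payload["name"].partition(" ")
        return {"first_name": first, "last_name": last}
    raise ValueError("payload needs either name or first_name and last_name")
```

Every such compatibility branch is code that has to be tested, documented, and eventually removed, which is exactly the cleanup work that tends to be postponed.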
Last but not least, it is often hard to spawn a complete local instance of a microservice architecture on a developer’s machine. Strategies like feature deployments are often used here, where a new instance of a service is deployed to a pre-production system for every pull request. However, it is often very hard to reproduce the exact configuration and versions of all services running in production in order to reproduce a problem. End-to-end test results from pre-production systems are also hard to extrapolate to production, since pre-production usually runs a completely different set of versions than production. Thus, to safely operate a microservice architecture, you need testing-in-production capabilities, where you can deploy a canary of your microservice into production and test it with a subset of the workload, rolling back in case of failures.
All the above-mentioned costs of microservices are often addressed with additional infrastructure and code paths in your application. This pays off if you have a larger organization with many independent teams but, as described earlier, we have only a small team at hand without a supporting platform team building those cross-cutting capabilities.
Serverless to the Rescue!
A frequently proposed solution for this problem is a Serverless or Function-as-a-Service (FaaS) architecture. Serverless is a very broad and vague term, so we will focus on the function-as-a-service interpretation of Serverless, in the style of AWS Lambda, in the next paragraphs. Serverless services reduce the operational burden: developers only have to push their code into the FaaS service, and the FaaS service handles updates of the runtime, scheduling, and auto-scaling of your instances. FaaS often even supports scaling to zero instances to save money. However, scaling to zero instances leads to latency spikes when a request comes in and the first instance must be launched. Thus, your functions must be small and have short startup times to mitigate latency spikes. Hence, an application will be built from many functions that often call each other, and you have to solve similar cross-cutting concerns in your FaaS architecture as in a microservice architecture. You need service-to-service authentication, transport encryption, and circuit breaking, and your system must be built around the fact that a network failure will happen with a rather high probability. Moreover, you must still roll out API changes between functions in a non-breaking way. Otherwise, you produce a distributed monolith, often called a deployment monolith because you must deploy every function in lockstep, which defeats the benefit of having independent software units. Last but not least, it is often impossible to run a local replica of all your functions interacting with each other.
Start with a Modulith
The best practice advice from this is to start with an architecture that is as monolithic as possible. You can apply modern practices from, e.g., domain-driven design to structure your monolithic application well. This architectural pattern is often known as a modulith. You design the components of your software as independent, loosely coupled subsystems, like microservices, but still bundle them together within the same application instead of coupling them via the network. A well-designed modulith requires much less supporting infrastructure. You do not need a container orchestrator to deploy a set of microservices. Instead, a well-crafted docker-compose file deployed on a handful of virtual machines will give you a reliable and flexible deployment environment that you can still understand. You can also run your modulithic application on container-as-a-service platforms like AWS Fargate, Google Cloud Run, or Azure Container Services. You do not have to build infrastructure for service-to-service authentication, and you do not have to design your system to be eventually consistent. What you test in pre-production is the same code you will deploy to production, so you do not have to invest in canary deployment infrastructure. You still can, but you are not forced to do so.
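As an illustration of the modulith idea, the following Python sketch shows two modules that are loosely coupled through an interface but run in the same process. The module names (`BillingModule`, `OrderModule`) and their methods are hypothetical and not taken from any specific framework.

```python
from typing import Protocol


class BillingApi(Protocol):
    """Public interface of the (hypothetical) billing module."""

    def create_invoice(self, customer_id: str, amount_cents: int) -> str: ...


class BillingModule:
    """Billing keeps its state private; other modules only see BillingApi."""

    def __init__(self) -> None:
        self._invoices: dict = {}

    def create_invoice(self, customer_id: str, amount_cents: int) -> str:
        invoice_id = f"inv-{len(self._invoices) + 1}"
        self._invoices[invoice_id] = (customer_id, amount_cents)
        return invoice_id


class OrderModule:
    """Orders depend on the billing interface, not on its implementation."""

    def __init__(self, billing: BillingApi) -> None:
        self._billing = billing

    def place_order(self, customer_id: str, amount_cents: int) -> str:
        # A plain in-process call: no network hop, no service discovery,
        # and no service-to-service authentication.
        return self._billing.create_invoice(customer_id, amount_cents)


def build_application() -> OrderModule:
    """Wire all modules together inside one deployable application."""
    return OrderModule(BillingModule())
```

Because `OrderModule` only knows the `BillingApi` interface, the billing module could later be moved behind a network boundary without touching the order logic, should the need ever arise.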
In the last section, we focused on the global architecture of our service. In this section, we want to focus on the technical components of our application. As stated in our scenario description, we want to keep our application as vendor-neutral as possible. A hexagonal architecture supports that goal. In short, it uses ports and adapters to abstract your core application logic from external systems like the datastore you use and the API your service exposes.
As a datastore, you should prefer widely used SQL databases like MySQL or PostgreSQL. Every major cloud provider offers them as a service, and smaller cloud providers like IONOS offer PostgreSQL as a service as well, which reduces the operational overhead for your small team while still maintaining vendor neutrality. As a further bonus, Docker containers to start a local MySQL or PostgreSQL instance are widely available and enable developers to use the same stack locally as in production. If you use an SQL database, enable your application to use different database connections for read and write queries. This allows you to scale the read path of your application horizontally by adding read replicas. For object storage, S3 is technically an AWS-specific protocol, but most other object store systems offer an S3-compatible API, so it is a sustainable choice for an object store protocol.
As mentioned earlier, ports abstract the actual implementation of an external system, so when adding a new port, try to leak as few implementation details as possible. In theory, you could then also use vendor-specific services like AWS DynamoDB, since switching vendors only requires you to write a new adapter for the port formerly backed by AWS DynamoDB. In practice, however, the port often encodes some behavior of AWS DynamoDB that is very hard to replicate with other storage systems. The initial pick of your storage system should therefore be a very deliberate choice.
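The ports-and-adapters idea can be sketched in a few lines of Python. The port `CustomerStore`, both adapters, and the use case `rename_customer` are hypothetical names chosen for this example; SQLite stands in for a real MySQL or PostgreSQL instance only because it ships with the standard library.

```python
import sqlite3
from typing import Optional, Protocol


class CustomerStore(Protocol):
    """Port: what the core logic needs, free of storage details."""

    def save(self, customer_id: str, name: str) -> None: ...

    def find_name(self, customer_id: str) -> Optional[str]: ...


class InMemoryCustomerStore:
    """Adapter for unit tests and local experiments."""

    def __init__(self) -> None:
        self._rows: dict = {}

    def save(self, customer_id: str, name: str) -> None:
        self._rows[customer_id] = name

    def find_name(self, customer_id: str) -> Optional[str]:
        return self._rows.get(customer_id)


class SqliteCustomerStore:
    """Adapter backed by an SQL database (SQLite as a stand-in)."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self._conn = connection
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS customers "
            "(id TEXT PRIMARY KEY, name TEXT NOT NULL)"
        )

    def save(self, customer_id: str, name: str) -> None:
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(
                "INSERT OR REPLACE INTO customers (id, name) VALUES (?, ?)",
                (customer_id, name),
            )

    def find_name(self, customer_id: str) -> Optional[str]:
        row = self._conn.execute(
            "SELECT name FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        return row[0] if row else None


def rename_customer(store: CustomerStore, customer_id: str, new_name: str) -> bool:
    """Core logic: written against the port only, never against an adapter."""
    if store.find_name(customer_id) is None:
        return False
    store.save(customer_id, new_name)
    return True
```

The port deliberately exposes only two narrow operations. The more storage-specific behavior (query languages, consistency modes, pagination tokens) leaks into the port, the harder it becomes to write a second adapter.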
Your application itself must be stateless to enable horizontal scaling, so all state should be stored in a dedicated datastore as discussed above. It must be irrelevant which instance of your software a user reaches with a request. This allows you to scale even a monolithic application horizontally. You may add caches and use some sort of user stickiness to increase cache hit rates, but it must not be a hard requirement that two consecutive events or HTTP requests are processed by the same instance.
Making Your Software Health Visible
OpenMetrics is a standard describing the metrics format introduced by the Prometheus monitoring system. This format is understood by a wide range of open-source and closed-source monitoring systems like VictoriaMetrics, Datadog, and AWS CloudWatch, which allows you to change the monitoring system depending on your needs without changing the metrics generation in your code. Generate both technical metrics, like HTTP status codes, number of errors, and histograms of latencies, and business metrics, like the number of successful logins or the number of business actions executed and their results.
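To give an impression of how simple the format is, the following Python sketch renders a counter in the Prometheus text exposition format, on which OpenMetrics is based. The class `CounterMetric` and the metric names are hypothetical, and the sketch is deliberately simplified: it supports a single label and does no escaping of label values. In a real project, you would most likely use an existing client library instead.

```python
from collections import Counter


class CounterMetric:
    """A counter with exactly one label (simplified, no value escaping)."""

    def __init__(self, name: str, help_text: str, label: str) -> None:
        self.name = name
        self.help_text = help_text
        self.label = label
        self._values = Counter()

    def inc(self, label_value: str, amount: int = 1) -> None:
        self._values[label_value] += amount

    def render(self) -> str:
        """Render the counter in the Prometheus text exposition format."""
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} counter",
        ]
        for label_value, value in sorted(self._values.items()):
            lines.append(f'{self.name}{{{self.label}="{label_value}"}} {value}')
        return "\n".join(lines) + "\n"


# A technical metric and a business metric, both with assumed names.
http_requests = CounterMetric(
    "http_requests_total", "Handled HTTP requests.", "status"
)
logins = CounterMetric("logins_total", "Login attempts by result.", "result")
```

After calling `http_requests.inc("200")`, the text returned by `http_requests.render()` can be served on an HTTP endpoint and scraped by any monitoring system that understands the format.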
OpenTelemetry allows you to get even more insights into your application by generating traces in addition to metrics. The metrics can be exposed in an OpenMetrics-compatible format. While OpenTelemetry offers more observability insights, it is a rather young ecosystem with a vast number of implementations, available both as a service and as self-hosted open-source versions.
The deployment artifact should be an Open Container Initiative (OCI) container image. OCI is a standard that evolved from the proprietary Docker container images. As of today, there are multiple runtimes to execute OCI containers, including Docker. There are many cloud provider services like Google Cloud Run, AWS Fargate, or Azure Container Services that are able to run a given OCI image. If you want to build the runtime infrastructure yourself, you have many valid execution environments to choose from, like any Kubernetes distribution, HashiCorp Nomad, or just a bunch of VMs with docker-compose. If you follow the spirit of the 12-factor app, you will be able to migrate your service with minimal effort from AWS Fargate to an on-premise Kubernetes and back.
Often you have both synchronous tasks and asynchronous background tasks in an application, e.g., sending notifications, or jobs that run periodically, like computing reports. You should run those background tasks on dedicated instances. Splitting frontend instances and background workers helps you to keep the latency and the latency jitter of synchronous requests low because the load on an instance is only affected by a single source of load. Moreover, it helps you to scale your application instances automatically based on a single signal: either the number of incoming HTTP requests or the number of tasks in the task queue.
For ease of development, you should use the same code base for the frontend and background worker instances. You should even use the same OCI image for both and use environment variables to configure the instances as background workers or frontend instances. This eases the development, and you can deploy both instance types at the same time, making it easy to execute schema changes. Your background worker should fetch jobs from some sort of queue. This allows you to scale them horizontally. If you need to run some jobs based on some sort of timer, create a single instance publishing jobs to the queue, so you do not need any leader election in the background worker instances.
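The split between frontend and worker instances from one code base can be sketched as follows. The environment variable `APP_ROLE`, the role names, and the job layout are assumptions made for this example. The in-process `queue.Queue` only stands in for an external queue service, since a real deployment needs a queue that is shared between instances.

```python
import os
import queue


def select_role(environ=os.environ) -> str:
    """Pick the instance role from a (hypothetical) APP_ROLE variable."""
    role = environ.get("APP_ROLE", "frontend")
    if role not in ("frontend", "worker", "scheduler"):
        raise ValueError(f"unknown APP_ROLE: {role!r}")
    return role


def handle_http_request(jobs: queue.Queue, report_name: str) -> str:
    """Frontend path: enqueue the slow work and answer immediately."""
    jobs.put({"type": "compute_report", "name": report_name})
    return "accepted"


def publish_timed_jobs(jobs: queue.Queue) -> None:
    """Scheduler path: run on exactly one instance, only publishes jobs."""
    jobs.put({"type": "compute_report", "name": "nightly"})


def run_worker_once(jobs: queue.Queue) -> list:
    """Worker path: process every job currently waiting in the queue."""
    done = []
    while True:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            return done
        done.append(f"{job['type']}:{job['name']}")
```

Because the scheduler only publishes jobs and never executes them, the workers stay interchangeable and can be scaled on queue length alone.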
Finally, you should use whatever programming language your team feels comfortable with, as long as it allows modern software development. The main requirement for modern software development is good support for unit tests, so you can test your codebase extensively and automatically. The language should also have good support for running in Linux OCI containers.
You should set up automatic CI pipelines that test every code change before it is merged to the main branch or, if you do trunk-based development, give you fast feedback if the last commit broke the trunk. Your deployments should be done automatically from a source of truth. Even a semi-automated deployment process, where a human has to trigger a pipeline with a single click, will help you to reduce errors during deployment a lot. A good strategy I saw in the past was automatic deployments to pre-prod environments followed by a manually triggered but automated deployment to production. The automatic deployments to pre-prod help you to build trust in the deployment process while keeping the upfront investment in the CI pipeline low. In the long run, however, you should aim for fully automated release pipelines where a change to the main branch is rolled out to production automatically after the test suites have passed. A good strategy to support that is to use feature flags to separate code rollouts from feature activation. With a feature flag, you can roll out the code to production but activate the new feature or behavior at another point in time. Most feature flag toolkits support experiments where you can test a new feature with a subset of users.
The tooling you use here should be chosen based on the knowledge of the team and your local talent pool. Using a boring language like Python, C#, Kotlin, Golang, or Java may help you to hire more easily than using the newest hyped functional language very few people have heard of.
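The feature flag experiments mentioned above, where a feature is activated for a subset of users, can be sketched with a deterministic hash. The function names and the flag name are hypothetical, and this is only one possible approach; an existing feature flag toolkit may do this differently.

```python
import hashlib


def bucket_for(flag_name: str, user_id: str) -> int:
    """Map a user to a stable bucket between 0 and 99 for a given flag."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def is_enabled(flag_name: str, user_id: str, rollout_percent: int) -> bool:
    """Enable the flag for roughly rollout_percent percent of all users."""
    return bucket_for(flag_name, user_id) < rollout_percent


def checkout_variant(user_id: str, rollout_percent: int) -> str:
    # The new code path is already deployed, but only active for some users.
    if is_enabled("new-checkout", user_id, rollout_percent):
        return "new"
    return "old"
```

Hashing the flag name together with the user ID keeps the decision stable across stateless instances without storing anything per user, and raising `rollout_percent` only ever adds users to the enabled group.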
To wrap this article up, I outlined how to architect your modern cloud-native application if you have only the resources of a small team at hand. Using a modulithic overall architecture combined with a hexagonal architecture within the components of your modulith enables you to react flexibly to changing requirements. The architectural guidelines given in this article should allow you to leverage the benefits of modern cloud-native software while keeping infrastructure costs for small and mid-scale applications low.