In this post we will show you how we use Terraform to manage HashiCorp Vault in a complex environment with many secrets, policies and Public Key Infrastructures (PKIs). Join us on our journey from barely maintainable Bash scripts to Terraform and learn from our experience.
If you are mainly interested in the code, you can jump directly to the demo on GitHub.
In a recent project, we used HashiCorp Vault as the central secret store for our clusters. This included:
- Creating PKIs for our bare-metal Kubernetes clusters.
- Storing secrets such as wildcard certificates and keys for our ingress.
- Storing credentials required for interacting with various systems in our deployment.
- Creating the roles, policies and AppRoles that allow our systems to interact with HashiCorp Vault.
Initially, we populated Vault with a set of Bash scripts that read various input files for the many kinds of secrets and policies we needed. There were many scripts and config files, and both contained a lot of duplicated code.
These scripts only created the Vault mounts and policies, though; the actual credentials had to be added manually from our gopass team password store after running the scripts. With a growing number of clusters and managed credentials, this quickly became incomprehensible and unmaintainable. It also often left stale secrets behind when we renamed them, as our tooling didn't really handle renames.
We quickly settled on Terraform, based on three assumptions:
- Terraform code should be more readable than the bash scripts with the various input files we had.
- Terraform’s state handling and declarative nature should eliminate the problem of duplicated, unused secrets after refactoring.
- Both tools are from HashiCorp, so we hoped they would play well together.
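The second assumption is the key difference from our scripts. A declarative tool compares what is declared with what it recorded in its state and derives the necessary changes, so a rename becomes "create the new object, delete the old one" without any extra logic. The following Python sketch illustrates that principle in simplified form. It keys objects by a plain identifier, and the identifiers and settings are hypothetical; this is not how Terraform is actually implemented.

```python
"""Simplified sketch of declarative reconciliation (the idea behind a Terraform plan).

Objects are keyed by a plain identifier and described by a dict of settings.
Illustrates the principle only; it is not Terraform's implementation.
"""
from dataclasses import dataclass, field


@dataclass
class Plan:
    create: list[str] = field(default_factory=list)
    update: list[str] = field(default_factory=list)
    delete: list[str] = field(default_factory=list)


def plan(desired: dict[str, dict], recorded: dict[str, dict]) -> Plan:
    """Diff the declared configuration against what the state recorded last time."""
    result = Plan()
    for key in sorted(desired):
        if key not in recorded:
            result.create.append(key)  # declared but not yet created
        elif desired[key] != recorded[key]:
            result.update.append(key)  # exists, but its settings changed
    # Tracked in state but no longer declared: remove it. This is what
    # cleans up the old object after a rename, with no extra logic.
    result.delete = sorted(set(recorded) - set(desired))
    return result


# Hypothetical rename: the wildcard secret moves to a new key,
# and the policy referencing it changes accordingly.
recorded = {
    "secret/ingress/wildcard": {"source": "gopass"},
    "policy/ingress-reader": {"paths": ["secret/ingress/wildcard"]},
}
desired = {
    "secret/ingress/wildcard-cert": {"source": "gopass"},
    "policy/ingress-reader": {"paths": ["secret/ingress/wildcard-cert"]},
}
print(plan(desired, recorded))
# Plan(create=['secret/ingress/wildcard-cert'], update=['policy/ingress-reader'], delete=['secret/ingress/wildcard'])
```

Note that with an empty recorded state every declared object lands in create, which is why a tool that loses its state cannot clean up and will recreate everything it manages.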
We validated these assumptions with a quick-and-dirty proof of concept (PoC) and found that the Terraform code was definitely more readable than our previous construct, and that the Vault provider worked well and handled renames gracefully.
We also tried Ansible in another PoC, as we already used it to install our Vault instances. While it would have been an improvement over our old setup, we found Ansible to be less of a natural fit: the procedural nature of the hashivault module forced us to implement the logic for renaming and removing configuration ourselves.
Once we were happy with the concept of terraforming Vault, it was time to implement our whole setup this way. We created one module for cluster-specific values, which mainly consist of the PKIs and their associated roles and policies, and one for tenant-specific values, which in our case were mostly certificates. Global AppRoles and policies live in the main part of our code.
This means that adding a new cluster to our setup only requires adding the corresponding values to the Terraform variables. For the tenant-specific values, we also used the Terraform Provider Pass, which allowed us to copy the certificates and keys that already existed in our password store into Vault in the same run. As with clusters, adding a tenant only requires adding the team's name to our Terraform variables, and our Terraform code creates everything else.
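The fan-out described above can be pictured with a small Python sketch: each cluster or tenant name in the variables expands into the set of Vault objects its module manages, much like instantiating a Terraform module once per entry with for_each. The CA names, the naming scheme and the object kinds below are hypothetical placeholders, not the layout of our actual modules.

```python
"""Hypothetical sketch: how cluster and tenant variables fan out into Vault objects.

CA names, naming scheme and object kinds are illustrative assumptions only.
"""

# Hypothetical CAs per cluster, loosely following the Kubernetes certificate
# best practices (cluster CA, etcd CA, front-proxy CA).
CAS = ("kubernetes", "etcd", "front-proxy")


def cluster_objects(cluster: str) -> list[tuple[str, str]]:
    """Objects the (hypothetical) cluster module manages: one PKI per CA."""
    objects = []
    for ca in CAS:
        mount = f"pki-{cluster}-{ca}"
        objects.append(("pki_mount", mount))
        objects.append(("pki_role", f"{mount}/master"))
        objects.append(("policy", f"{mount}-master"))
    return objects


def tenant_objects(team: str) -> list[tuple[str, str]]:
    """Objects the (hypothetical) tenant module manages: a certificate copied from pass."""
    return [("kv_secret", f"tenants/{team}/ingress-wildcard")]


def expand(clusters: list[str], tenants: list[str]) -> list[tuple[str, str]]:
    """Derive all objects from the variables; adding a name adds exactly its objects."""
    objects = []
    for cluster in clusters:
        objects.extend(cluster_objects(cluster))
    for team in tenants:
        objects.extend(tenant_objects(team))
    return objects


for kind, name in expand(["prod"], ["team-a"]):
    print(kind, name)
```

The point of the design is that the variables stay a short list of names, while everything per cluster or per tenant is derived in one place and therefore cannot drift between entries.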
A minimal version of this concept is implemented in our demo, which showcases the PKI part of our setup in reduced form. It mainly consists of a Terraform module that creates these PKIs, one for each certificate authority (CA) listed in the Kubernetes certificate best practices. For each PKI, the module creates the CA alongside a role for Kubernetes master nodes that allows them to issue the certificates they need. Access to this role is granted to a Vault AppRole through a policy. The AppRole can then be used to log in to Vault and generate a certificate; the demo shows this process for a single certificate. To use this module in practice, you would also need a role for worker nodes and tooling that lets the nodes issue their certificates automatically.
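The login-and-issue flow can be sketched against Vault's HTTP API using only the Python standard library. The code below renders a policy that allows issuing certificates from one PKI role, logs in with an AppRole via POST /v1/auth/approle/login, and requests a certificate via POST /v1/&lt;mount&gt;/issue/&lt;role&gt;. The mount name pki-kubernetes, the role name master, the common name and the 24h TTL are assumptions for illustration and may differ from the demo; in the setup described here, Terraform would write the policy rather than a script.

```python
"""Sketch of the AppRole login and certificate issuance flow via Vault's HTTP API.

Mount, role, common name and TTL values are hypothetical; adjust them to your PKI layout.
"""
import json
import urllib.request


def master_policy(mount: str, role: str = "master") -> str:
    """Render a Vault policy that allows issuing certificates from one PKI role."""
    return (
        f'path "{mount}/issue/{role}" {{\n'
        '  capabilities = ["create", "update"]\n'
        "}\n"
    )


def login_request(vault_addr: str, role_id: str, secret_id: str) -> urllib.request.Request:
    """Build the AppRole login request; the response carries auth.client_token."""
    body = json.dumps({"role_id": role_id, "secret_id": secret_id}).encode()
    return urllib.request.Request(
        f"{vault_addr}/v1/auth/approle/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def issue_request(vault_addr: str, token: str, mount: str, role: str,
                  common_name: str, ttl: str = "24h") -> urllib.request.Request:
    """Build the request that asks a PKI role to issue a certificate."""
    body = json.dumps({"common_name": common_name, "ttl": ttl}).encode()
    return urllib.request.Request(
        f"{vault_addr}/v1/{mount}/issue/{role}",
        data=body,
        headers={"X-Vault-Token": token, "Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def issue_certificate(vault_addr: str, role_id: str, secret_id: str,
                      mount: str, role: str, common_name: str) -> dict:
    """Log in with the AppRole, then issue a certificate; returns Vault's data block."""
    login = send(login_request(vault_addr, role_id, secret_id))
    token = login["auth"]["client_token"]
    issued = send(issue_request(vault_addr, token, mount, role, common_name))
    # Includes fields such as certificate, private_key and issuing_ca.
    return issued["data"]
```

Calling issue_certificate requires a reachable Vault with the assumed mount and role; the request-building helpers can be inspected without a server, which keeps the sketch easy to test.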
One final problem to solve is where to store the Terraform state. As the Terraform Vault provider documentation points out, the state files created by terraform apply contain the secrets written to or read from Vault. Ideally, we want to apply our latest configuration automatically from a pipeline, but to avoid leaking our secrets, we have to handle the state in one of the following ways:
- Throw away the state: This would work if we only bootstrapped policies and secrets from gopass, but it would make Terraform create a new CA for our Kubernetes PKIs on every run. It would also mean that renames would, once again, leave the old secrets in place.
- Keep the state on the runner: If you operate your own CI runners, you can dedicate one to running this Terraform apply job. This runner stores the state file locally and can be given special protection against abuse. The drawback is that losing the runner means losing the state.
- Store the state externally: With a remote backend such as S3, the Terraform state lives outside the runner, so you only have to secure the remote storage. As an added benefit, you can use the same backend configuration to run Terraform locally as well.
We recommend the third option, storing the state externally, wherever possible, but if one of the other two works for you, that is fine as well.
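Whichever option you choose, it helps to know which Vault-related objects a state file actually tracks. The Python sketch below assumes the JSON layout of Terraform's state format version 4 (a top-level resources list whose entries carry mode, type, name, an optional module address and a list of instances) and prints the address of every resource whose type starts with vault_. The prefix filter is our assumption; you might extend it to other providers whose data ends up in state, such as the pass provider. Keep in mind that Terraform stores sensitive values in plain text in the state; marking them as sensitive only hides them from CLI output.

```python
"""List the Vault-related resources tracked in a Terraform state file.

Assumes the JSON layout of state format version 4. The type prefix filter
is an assumption; extend it for other providers whose data lands in state.
"""
import json
import sys


def resource_addresses(state: dict, prefixes: tuple[str, ...] = ("vault_",)) -> list[str]:
    """Return addresses of all instances whose resource type matches a prefix."""
    addresses = []
    for resource in state.get("resources", []):
        if not resource.get("type", "").startswith(prefixes):
            continue
        # Module address such as module.cluster["prod"], if the resource has one.
        prefix = f'{resource["module"]}.' if "module" in resource else ""
        if resource.get("mode") == "data":
            prefix += "data."
        base = f'{prefix}{resource["type"]}.{resource["name"]}'
        for instance in resource.get("instances", []):
            key = instance.get("index_key")
            if key is None:
                addresses.append(base)  # single instance
            elif isinstance(key, int):
                addresses.append(f"{base}[{key}]")  # created with count
            else:
                addresses.append(f'{base}["{key}"]')  # created with for_each
    return addresses


def main(path: str) -> None:
    """Print the matching addresses from a state file on disk."""
    with open(path, encoding="utf-8") as handle:
        state = json.load(handle)
    for address in resource_addresses(state):
        print(address)


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

For a remote backend, terraform state pull prints the current state as JSON, which you could save and pass to this script; remember that the saved file then contains the secrets too.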
We found Terraform to be a good fit for our needs: it greatly improved the structure of the code that describes our Vault layout. What experience do you have with populating your Vault? Which tools did you use? We would love to hear about your experience, and we invite you to try our approach with Terraform.