
Certificate Management

Objective

Deploy the foundational components for managing X.509 certificate infrastructure.

Background

In this and the following sections we will be deploying various systems for managing your network's cryptography. For a detailed discussion of the fundamental concepts and motivations, we strongly recommend reviewing our concept docs on network cryptography for the relevant background information.

Deploy cert-manager

cert-manager is the de facto standard tool for managing X.509 certificates in Kubernetes. It handles the entire certificate lifecycle: provisioning, deployment, and renewal. It works with both public certificate authorities (e.g., Let's Encrypt) and private CAs (e.g., Vault).

We provide a module for deploying cert-manager: kube_cert_manager.

Let's deploy it now:

  1. Create a new directory adjacent to your kube_vault module called kube_cert_manager.

  2. Add a terragrunt.hcl to that directory that looks like this.

  3. Set vpa_enabled to false. We will enable it when we deploy the autoscalers.

  4. Set self_generated_certs_enabled to true. We will disable insecure self-signed certificates after we set up certificate issuers in the next section.

  5. Add a module.yaml that enables the aws, kubernetes, helm, and random providers.

  6. Run terragrunt apply.

You should now see the cert-manager pods running:

cert-manager pods are running

As you might notice, cert-manager comes with three deployments:

  • controller (unlabeled): The primary system for managing the certificate lifecycle

  • cainjector: Injects CA data into webhooks to facilitate mTLS

  • webhook: Validates and mutates cert-manager CRDs submitted to the Kubernetes API server

Additionally, notice that several CustomResourceDefinitions (CRDs) were installed in the cluster:

cert-manager custom resource definitions

These CRDs extend the Kubernetes API so that we can create new types of resources, such as Certificates, which cert-manager will handle.
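If you prefer the command line to k9s, the same check can be sketched with kubectl. This is a minimal helper and not part of the Panfactum tooling: the function name check_cert_manager is hypothetical, and it assumes kubectl is already pointed at your cluster and that the module deployed into the cert-manager namespace.

```shell
# Hypothetical helper: list the cert-manager deployments and CRDs.
check_cert_manager() {
  # Expect three deployments: the controller, cainjector, and webhook.
  kubectl get deployments -n cert-manager

  # CRDs installed by cert-manager belong to the cert-manager.io API group.
  kubectl get crds -o name | grep 'cert-manager\.io'
}
```

Run check_cert_manager after terragrunt apply completes. You should see the three deployments followed by the names of the cert-manager CRDs.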

Deploy Issuers

Creating certificates using cert-manager is a two-step process:

  1. Create an "issuer" which instructs cert-manager on how to create a certificate.

  2. Create the certificate by indicating which issuer to use.

You can read more about this process in the cert-manager documentation here.
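To make the two steps concrete, the sketch below writes a pair of illustrative manifests to a scratch file. Every name in it (example-issuer, example-cert, example-cert-tls, the default namespace, and the DNS name) is hypothetical. The selfSigned issuer type is used only because it needs no external configuration; it is not one of the issuers used by the Panfactum stack.

```shell
# Write an illustrative manifest to a scratch file. All names are hypothetical.
manifest="${TMPDIR:-/tmp}/example-certificate.yaml"

cat > "$manifest" <<'EOF'
# Step 1: an issuer that tells cert-manager how to create certificates.
# selfSigned is used here only to keep the sketch short.
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: example-issuer
  namespace: default
spec:
  selfSigned: {}
---
# Step 2: a certificate that indicates which issuer to use.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: example-cert
  namespace: default
spec:
  secretName: example-cert-tls  # Secret that will hold the issued key and certificate
  dnsNames:
    - example.default.svc.cluster.local
  issuerRef:
    name: example-issuer
    kind: Issuer
    group: cert-manager.io
EOF

echo "Wrote $manifest"

# Review the file first, then apply it to a test cluster if desired:
# kubectl apply -f "$manifest"
```

The Certificate refers to its issuer only by name, kind, and group. That indirection is what lets the same Certificate definition be pointed at a different issuer later without other changes.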

We provide a module to set up the issuers required by the Panfactum stack: kube_cert_issuers.

This module creates three different types of issuers:

  • An issuer for public certificates signed by Let's Encrypt used by your public services

  • An issuer for internal certificates signed by your Vault cluster 1 for use in securing internal network traffic such as webhook traffic

  • An issuer for internal intermediate CA certificates signed by your Vault cluster for use by tools like the service mesh (next section)

Let's deploy it now:

  1. Create a new directory adjacent to your kube_cert_manager module called kube_cert_issuers.

  2. Add a terragrunt.hcl to that directory that looks like this.

  3. Note that alert_email will receive notifications if your certificates fail to renew. Set it to an email address that is actively monitored to prevent unexpected service disruptions.

  4. Let's walk through the route53_zones input:

    1. When issuing certificates for particular domains, cert-manager must be able to create DNS records as part of Let's Encrypt's DNS challenge flow. That means cert-manager needs access to the AWS Route53 zones that host the DNS records for the cluster's environment.

    2. In the prior DNS guide, we configured DNS in the aws_registered_domains and aws_delegated_zones modules under production/global. If you are setting up an environment other than production, we cannot reference them in terragrunt dependency blocks. Instead, we must copy the necessary values manually from the module outputs. 2

    3. To retrieve the outputs, change your directory to the desired modules and run terragrunt output. You will see an output that looks like this:

      Output from the aws_delegated_zones module
    4. The record_manager_role_arn defines the IAM role that can create records in every zone created by the module. The zone_id is the unique internal AWS identifier for the Route53 zone hosting records for the domain.

  5. Add a module.yaml that enables the aws, vault, and kubernetes providers.

  6. Run terragrunt apply.

Deploy the First Certificate

Let's ensure that the certificate infrastructure is working as expected.

As context, many projects come with a mechanism to generate their own certificates for ease of use. However, in a production setting these self-generated certificates are not ideal:

  • They may contain insecure ciphers
  • You cannot control their rotation frequency
  • They are signed by private keys that can be easily extracted from the cluster
  • They are not visible in our centralized observability tooling

In the Panfactum stack, we ensure that all internal utility components use the centrally managed Vault certificates.

Let's replace the self-generated cert-manager webhook certificates 3 with ones generated by Vault:

  1. Return to the kube_cert_manager module.

  2. Set self_generated_certs_enabled to false. This instructs the module to use the internal certificate issuer created by kube_cert_issuers instead.

  3. Run terragrunt apply.

  4. After a few moments, the module should successfully update. Using k9s, you should now be able to find your first Certificate resource (:certificate):

    Your first certificate
  5. A certificate is an abstract resource that ultimately results in certificate data stored in a corresponding secret:

    The instantiated secret from the certificate

    Notice that there are three pieces of data in the secret:

    • tls.key: The private key for the certificate
    • tls.crt: The actual public X.509 certificate
    • ca.crt: The public certificate of the certificate authority that signed the tls.crt
  6. Note that you can decode secrets in k9s by pressing x:

    The raw certificate data

    You can copy the secret data by pressing c.

  7. Copy the tls.crt certificate to a decoder to view the metadata: 4

    The certificate metadata
  8. cert-manager has a companion CLI called cmctl to aid in managing certificates. The Panfactum devenv already bundles it. Let's manually rotate the certificate a few times to ensure the certificate rotation infrastructure is working as intended: cmctl -n cert-manager renew cert-manager-webhook-certs. 5
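The relationship between the three keys of the secret can also be explored offline with openssl. The sketch below does not touch your cluster: it creates a throwaway CA and certificate as stand-ins (the names demo-ca and demo-webhook are hypothetical), base64-encodes the certificate the way Kubernetes stores secret data, and then decodes and inspects it. It assumes openssl and a base64 that accepts -d are available on your machine.

```shell
workdir=$(mktemp -d)

# Stand-in for the private CA (Vault plays this role in the cluster): ca.crt
openssl req -x509 -newkey rsa:2048 -nodes -days 1 -subj "/CN=demo-ca" \
  -keyout "$workdir/ca.key" -out "$workdir/ca.crt" 2>/dev/null

# Stand-in for the workload certificate: tls.key plus a signing request
openssl req -newkey rsa:2048 -nodes -subj "/CN=demo-webhook" \
  -keyout "$workdir/tls.key" -out "$workdir/tls.csr" 2>/dev/null

# The CA signs the request, producing tls.crt
openssl x509 -req -in "$workdir/tls.csr" -days 1 \
  -CA "$workdir/ca.crt" -CAkey "$workdir/ca.key" -CAcreateserial \
  -out "$workdir/tls.crt" 2>/dev/null

# Secret values are stored base64-encoded, so decode before inspecting.
# Against a real cluster the encoded value could come from something like:
#   kubectl get secret <secret-name> -n cert-manager -o jsonpath='{.data.tls\.crt}'
encoded=$(base64 < "$workdir/tls.crt" | tr -d '\n')
printf '%s' "$encoded" | base64 -d > "$workdir/decoded.crt"

# View the certificate metadata (subject, issuer, expiration).
metadata=$(openssl x509 -in "$workdir/decoded.crt" -noout -subject -issuer -enddate)
echo "$metadata"

# Confirm that ca.crt is the authority that signed tls.crt.
verify=$(openssl verify -CAfile "$workdir/ca.crt" "$workdir/decoded.crt")
echo "$verify"
```

The same openssl x509 and openssl verify commands should work on the data copied from k9s once it is saved to local files, which avoids pasting the certificate into an external decoder.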

Congratulations! We have just verified that the internal certificate provisioning process works as intended. We will test public certificates when we deploy our ingress system in a future guide section.
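If you would like command-line confirmation that a renewal actually produced a new certificate, one option is to watch the revision counter that cert-manager records in the Certificate's status. The helper below is a sketch under a few assumptions: the function name rotate_and_check is hypothetical, kubectl and cmctl are configured for your cluster, and the certificate name and namespace match the cmctl command used above.

```shell
# Hypothetical helper: renew the webhook certificate and wait for a new revision.
rotate_and_check() {
  ns="cert-manager"
  cert="cert-manager-webhook-certs"

  # status.revision increments each time cert-manager issues the certificate.
  before=$(kubectl get certificate "$cert" -n "$ns" -o jsonpath='{.status.revision}')

  # Trigger a manual renewal (the same command as in the step above).
  cmctl -n "$ns" renew "$cert"

  # Poll for up to roughly 60 seconds until the revision changes.
  attempts=0
  while [ "$attempts" -lt 30 ]; do
    after=$(kubectl get certificate "$cert" -n "$ns" -o jsonpath='{.status.revision}')
    if [ "$after" != "$before" ]; then
      echo "revision: $before -> $after"
      return 0
    fi
    sleep 2
    attempts=$((attempts + 1))
  done

  echo "revision is still $before after 60 seconds" >&2
  return 1
}
```

Calling rotate_and_check a few times in a row should print an increasing revision each time. The function polls the revision rather than the Ready condition because the previous certificate may still be reported as ready while the renewal is in flight.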

Deploy trust-manager

There is one final component to the certificate infrastructure: trust-manager.

trust-manager is an ancillary cert-manager utility that copies CA trust bundles into every cluster namespace so that they can be consumed by cluster workloads. Because the certificates of our private CA (Vault) are not installed in operating systems by default, this provides a convenient way to make them available.

We provide a module for deploying trust-manager: kube_trust_manager.

Let's deploy it now:

  1. Create a new directory adjacent to your kube_cert_manager module called kube_trust_manager.

  2. Add a terragrunt.hcl to that directory that looks like this.

  3. Set vpa_enabled to false. We will enable it when we deploy the autoscalers.

  4. Add a module.yaml that enables the aws, kubernetes, and helm providers.

  5. Run terragrunt apply.

Note that unlike most modules, which receive their own namespace, trust-manager will be deployed to the cert-manager namespace. 6

Next Steps

Now that internal certificate management is working, we will build upon that foundation to deploy a service mesh which will ensure that all network traffic in the cluster is secured with mTLS.

Panfactum Bootstrapping Guide: Step 12 / 20

Footnotes

  1. This module also configures Vault to act as a private CA.

  2. This is because cross-environment dependencies will break the CI permission model where each CI runner only gets access to a single environment at a time.

  3. Many projects that extend Kubernetes include webhook servers that register themselves with the Kubernetes API server to validate or mutate incoming Kubernetes manifests. Read more about how that works here. These webhooks require mTLS to work.

  4. Recall that the certificate is public information, so it is safe to share.

  5. Certificates will automatically rotate as they near expiration so this is just for testing purposes.

  6. This is a requirement of trust-manager.