Automating TLS certificates in hybrid environments

As the computing landscape evolves, securing an ever-growing and ever-changing set of endpoints matters more and more. In the past, the focus was on securing the network perimeter, but with the advent of cloud computing, mobile devices, and the Internet of Things (IoT), that perimeter has become difficult to define. It is no longer a single point of entry, but a constantly changing collection of entry points, which makes both the perimeter and the endpoints behind it harder to secure.

With the growth of Kubernetes and more recently Service Mesh, the "edge" can in fact be anywhere, and your clients can similarly be anywhere. This means that the traditional approach of using a single VPN or proxy ingress point is no longer sufficient, and we instead need to be able to secure the communication between all of the components of our application, regardless of where they are running. This is a core principle of Zero Trust Networking, and it is also a core principle of the Istio Service Mesh. Whereas before anyone on your network was trusted - including the summer intern who just started last week - now no one is trusted, and all communication must be secured and authenticated.

cert-manager provides a cloud-agnostic approach to automating the management and issuance of TLS certificates, and has many integrations with public cloud providers to validate the identity of the certificate requestor. However, in a hybrid environment, where you have a mix of on-premises and cloud-based workloads, this becomes more difficult. In this post, we will look at how to use cert-manager to automate the issuance of TLS certificates in a hybrid environment using the localdns ACME DNS01 Challenge Provider.

If you are familiar with cert-manager, you may already know of the "native" ACMEDNS solver, and assume the problem has been solved. However, if you are really familiar with cert-manager, you will know that the ACMEDNS solver is quite complex to configure and manage, and comes with a few known limitations that make it less than desirable in a production environment.

I'd like to think of the acme-localdns solver as "ACMEDNS v2" - it's certainly not intended to be a replacement for the ACMEDNS solver, but it does solve (no pun intended) many of the problems that the ACMEDNS solver has, and it is much easier to set up and "just run", at least in my experience.

Why? If you are using cloud-hosted DNS, you can use one of the many existing DNS-01 solvers to automatically solve DNS challenges. If you are hosting your own PKI / ACME instance, you can use the EAB (External Account Binding) feature to solve DNS challenges. However, if you are hosting your own DNS but not your own PKI / ACME instance, you need a way to solve dynamic DNS challenges for your certificates without updating your existing DNS tooling. This is where the acme-localdns solver comes in.

How it works

As a very quick recap, challenge solvers are necessary in a PKI footprint to prove that you own the domain you are requesting a certificate for. The DNS-01 challenge type is one of the most common challenge types, and it works by creating a TXT record in your DNS zone with a specific value defined by the ACME server. The ACME server then checks to see if the TXT record exists, and if it does, the challenge is considered solved and the certificate is issued.

This prevents me from requesting a certificate for google.com without actually owning the domain, because I would not be able to create the TXT record in the google.com DNS zone. However, if I do own the domain, I can create the TXT record and the challenge will be solved.
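To make the "specific value" concrete, here is a minimal Python sketch of how a DNS-01 TXT value is derived under RFC 8555: the challenge token is joined to the ACME account key thumbprint, hashed with SHA-256, and base64url-encoded without padding. The token and thumbprint below are made-up placeholders; in practice cert-manager computes this for you.

```python
import base64
import hashlib


def b64url(data: bytes) -> str:
    """Base64url-encode without trailing '=' padding, as ACME expects."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def dns01_txt_value(token: str, account_thumbprint: str) -> str:
    """Return the TXT record value for a DNS-01 challenge.

    key authorization = token + "." + thumbprint of the ACME account key
    TXT value         = base64url(SHA-256(key authorization))
    """
    key_authorization = f"{token}.{account_thumbprint}"
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return b64url(digest)


# Hypothetical token and thumbprint, for illustration only.
value = dns01_txt_value("example-token", "example-thumbprint")
print(value)
```

Because the value depends on the requester's account key, publishing it in the zone shows that whoever holds that ACME account also controls the domain's DNS.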

With cloud DNS providers such as AWS Route53, Azure DNS, and Google Cloud DNS, this is easy to do because the DNS provider has an API that allows you to create and delete DNS records. However, if you are hosting your own DNS, you need some other way to create and delete records in your DNS zone programmatically.

Depending on your local DNS server, automating this can range from somewhat difficult to entirely impossible.

How it works (with acme-localdns)

acme-localdns operates its own dynamic DNS server to solve DNS-01 challenges, so all you need to do is CNAME a static _acme-challenge.example.com record to the acme-localdns server, and it will automatically create and delete dynamic TXT records as required by the ACME server.

Whereas ACMEDNS requires an HTTP server (and the corresponding security/RBAC), JSON auth keys, and manual pre-configuration of the issuer every time you want to issue a new cert, acme-localdns requires none of these things. It requires a one-time configuration when deployed, and then it "just works" entirely through the native cert-manager Certificate interface, just like any other DNS-01 challenge provider. All you need to do is manually create your _acme-challenge records in your DNS zone pointing to the acme-localdns server, and it will take care of the rest.
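The delegation can be pictured with a toy model. The Python sketch below is not how acme-localdns or any real resolver is implemented; it only illustrates a validator following a static CNAME to a dynamic TXT record. The record names and the TXT value are illustrative assumptions.

```python
# Toy in-memory DNS data: (name, record type) -> value.
RECORDS = {
    # Static record, created once by hand in your own zone.
    ("_acme-challenge.example.com.", "CNAME"):
        "_acme-challenge.example.com.acme.example.com.",
    # Dynamic record, served by the solver only while a challenge is pending.
    ("_acme-challenge.example.com.acme.example.com.", "TXT"):
        "hypothetical-challenge-value",
}


def lookup_txt(name, records, max_hops=8):
    """Return the TXT value for name, following CNAMEs, or None if absent."""
    for _ in range(max_hops):
        if (name, "TXT") in records:
            return records[(name, "TXT")]
        if (name, "CNAME") not in records:
            return None
        name = records[(name, "CNAME")]
    return None  # gave up: the CNAME chain is too long or loops


print(lookup_txt("_acme-challenge.example.com.", RECORDS))
```

Only the first entry ever lives in your own DNS server, which is why your existing DNS tooling never has to change after the initial setup.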

How to use it

The acme-localdns solver is available as a Helm chart. If you are planning on sticking with the defaults, you really only need to focus on the first few lines of the values.yaml file:

# The GroupName here is used to identify your company or business unit that
# created this webhook.
# For example, this may be "acme.mycompany.com".
# This name will need to be referenced in each Issuer's `webhook` stanza to
# inform cert-manager of where to send ChallengePayload resources in order to
# solve the DNS01 challenge.
# This group name should be **unique**, hence using your own company's domain
# here is recommended.
groupName: acme.example.com
# The nameserver is the authoritative nameserver that will be returned
# in queries for the domain name. This is usually the same as the domain name.
nameserver: "acme.example.com."
# The domain name is the domain name / zone that will be managed by this server.
domainName: "acme.example.com."
# The rname is the email address that will be used in the SOA record, where the
# @ symbol is replaced with a dot.
rname: "admin.acme.example.com."
# The publicIP is the IP address (or CNAME) that will point to this server.
# If you don't yet have this, such as in instances where your load balancer IP
# will be dynamically assigned, you can leave this blank and update it later.
publicIP: "1.2.3.4"
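If the rname format is unfamiliar, this small Python sketch shows the conversion from an email address to SOA RNAME form. The helper name is hypothetical, and the backslash escaping of dots in the local part follows the RFC 1035 zone-file convention; whether the chart accepts escaped values is an assumption worth checking before relying on it.

```python
def email_to_rname(email: str) -> str:
    """Convert an email address to SOA RNAME form.

    The "@" becomes "."; dots in the local part are backslash-escaped so
    they are not mistaken for label separators.
    """
    local, _, domain = email.partition("@")
    if not local or not domain:
        raise ValueError(f"not an email address: {email!r}")
    return local.replace(".", "\\.") + "." + domain.rstrip(".") + "."


print(email_to_rname("admin@acme.example.com"))  # admin.acme.example.com.
```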

Once you have updated the values.yaml file, you can deploy the chart:

helm upgrade -i -n cert-manager \
	cert-manager-webhook-acme-localdns \
	./deploy/localdns \
	-f /path/to/values.yaml

Now that we have the solver deployed, we need to create a CNAME record in our DNS zone pointing to the acme-localdns server. This is the only manual step required to use the acme-localdns solver.

Imagine we have our acme-localdns server running at "acme.example.com", and we want to issue a certificate for "mycoolnewapp.com", where the DNS for "mycoolnewapp.com" is hosted on our local DNS server. We would create a CNAME record in our local DNS zone for "_acme-challenge.mycoolnewapp.com" pointing to "_acme-challenge.mycoolnewapp.com.acme.example.com".
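If you have many domains to delegate, the record can be derived mechanically. The following Python sketch assumes the naming pattern from the example above (challenge name followed by the solver's zone); the function name is hypothetical, and the wildcard handling reflects the fact that a wildcard certificate is validated at the base domain's challenge name.

```python
def challenge_cname(domain, solver_zone):
    """Return (record name, CNAME target) for delegating DNS-01 to the solver."""
    # "*.example.com" is validated at "_acme-challenge.example.com".
    if domain.startswith("*."):
        domain = domain[2:]
    name = "_acme-challenge." + domain.rstrip(".")
    target = name + "." + solver_zone.strip(".")
    # Return fully qualified names (trailing dot) to avoid zone-relative surprises.
    return name + ".", target + "."


for d in ["mycoolnewapp.com", "*.mycoolnewapp.com"]:
    name, target = challenge_cname(d, "acme.example.com.")
    print(f"{name} IN CNAME {target}")
```

Both inputs produce the same record, so a single CNAME covers the bare domain and its wildcard.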

Now, when we create a Certificate resource in cert-manager, it will automatically create a DNS-01 challenge for "_acme-challenge.mycoolnewapp.com". The ACME server follows the CNAME to the acme-localdns server, which automatically serves the required TXT record at the CNAME target. Once the challenge is solved, the TXT record will be deleted, and the certificate will be issued.

That's pretty much it - now you can issue certificates for any domain you own, without having to manually create and delete TXT records in your DNS zone, and without having to rely on a cloud provider to do it for you.

last updated 2023-08-12