Appendix: DNS and TLS Automation

Incident Hook

A service works through a raw load balancer IP, but the real hostname fails during rollout. DNS still points at the wrong target, the TLS certificate is missing or expired, and the incident looks like an application bug. Time is wasted debugging pods while the actual failure sits at the edge. Production ingress needs automated DNS and automated certificate issuance together.

Why This Appendix Exists

The main course keeps early chapters focused on platform safety and GitOps. This appendix explains the edge automation layer used by the SafeOps platform:

  • external-dns manages DNS records from cluster state
  • cert-manager issues certificates through ACME DNS-01 challenges solved via Cloudflare
  • Traefik ingresses reference the production issuer and TLS hosts

This is not a separate core chapter because ingress is not the center of the course. It is a supporting production capability you will rely on once the platform is running.

SafeOps Baseline

In the current SafeOps implementation:

  • Traefik is the ingress controller.
  • external-dns runs in the cert-manager namespace and syncs records for the target domain.
  • cert-manager manages ClusterIssuer objects for Let’s Encrypt staging and production.
  • Ingresses request certificates by referencing the issuer and TLS hostnames.
  • The Cloudflare API token secret is the shared dependency for both DNS sync and certificate issuance.
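
To make the shared dependency concrete, here is a minimal sketch of what such a secret could look like. The secret name `cloudflare-api-token` and the key `api-token` are assumptions for illustration, not names taken from the SafeOps repository, and the token value is a placeholder. In a GitOps setup the real value would normally arrive through an encrypted or externally managed secret rather than a plain manifest.

```yaml
# Hypothetical Secret holding the Cloudflare API token.
# Name and key are illustrative assumptions.
apiVersion: v1
kind: Secret
metadata:
  name: cloudflare-api-token
  namespace: cert-manager        # namespace named in this appendix
type: Opaque
stringData:
  # Placeholder only: never commit a real token in plain text.
  api-token: "<cloudflare-api-token>"
```

Because both external-dns and the cert-manager DNS-01 solver read the same token, a wrong or missing value here breaks DNS sync and certificate issuance at the same time.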

Investigation Snapshots

Here is the DNS/TLS GitOps bundle used in the SafeOps system.

DNS and TLS GitOps bundle

Snippet unavailable during this build.

Here is the external-dns release used to synchronize DNS records.

external-dns release

Snippet unavailable during this build.
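
As a rough illustration of the settings an external-dns release for Cloudflare usually carries, the sketch below shows them as a plain Deployment. It is not the SafeOps release itself: the domain `example.com`, the owner ID, the service account name, the image tag, and the secret name and key are all assumptions. The flags and the `CF_API_TOKEN` environment variable are the ones external-dns documents for its Cloudflare provider.

```yaml
# Illustrative external-dns Deployment (assumed values, not the SafeOps release).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: external-dns
  namespace: cert-manager
spec:
  replicas: 1
  strategy:
    type: Recreate                 # avoid two instances editing records at once
  selector:
    matchLabels:
      app: external-dns
  template:
    metadata:
      labels:
        app: external-dns
    spec:
      serviceAccountName: external-dns   # assumed; needs RBAC to list and watch ingresses
      containers:
        - name: external-dns
          # Assumed tag; pin to the version you have actually tested.
          image: registry.k8s.io/external-dns/external-dns:v0.14.0
          args:
            - --source=ingress               # derive records from Ingress hosts
            - --provider=cloudflare
            - --domain-filter=example.com    # assumed target domain
            - --policy=upsert-only           # create and update, never delete
            - --registry=txt                 # track ownership in TXT records
            - --txt-owner-id=safeops-cluster # assumed owner ID
          env:
            - name: CF_API_TOKEN
              valueFrom:
                secretKeyRef:
                  name: cloudflare-api-token # assumed secret name
                  key: api-token             # assumed key
```

The `upsert-only` policy is a conservative choice for a sketch like this: it lets external-dns create and correct records but leaves deletion as a deliberate manual step.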

Here are the ClusterIssuer objects used for Let’s Encrypt staging and production.

ClusterIssuer configuration

Snippet unavailable during this build.
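
For orientation, a pair of ClusterIssuer objects using the Cloudflare DNS-01 solver might look roughly like the following. The issuer names, contact email, account key secret names, and the token secret name and key are assumptions for illustration. The two ACME server URLs are the public Let’s Encrypt staging and production endpoints.

```yaml
# Illustrative ClusterIssuers (assumed names, email, and secret references).
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: ops@example.com                    # assumed contact address
    privateKeySecretRef:
      name: letsencrypt-staging-account-key   # ACME account key, created by cert-manager
    solvers:
      - dns01:
          cloudflare:
            apiTokenSecretRef:
              name: cloudflare-api-token      # assumed secret in the cert-manager namespace
              key: api-token                  # assumed key
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-production
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: ops@example.com
    privateKeySecretRef:
      name: letsencrypt-production-account-key
    solvers:
      - dns01:
          cloudflare:
            apiTokenSecretRef:
              name: cloudflare-api-token
              key: api-token
```

Keeping a staging issuer next to the production one lets you rehearse issuance against relaxed rate limits before pointing an ingress at production.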

Here is the frontend ingress pattern that requests TLS from cert-manager.

Frontend ingress with TLS

Snippet unavailable during this build.
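
A minimal sketch of this pattern is shown below. The ingress name `frontend` and namespace `develop` match the verification commands in this appendix. The hostname, issuer name, ingress class name, TLS secret name, and backend service port are assumptions for illustration.

```yaml
# Illustrative frontend Ingress requesting TLS from cert-manager.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: frontend
  namespace: develop
  annotations:
    # Tells cert-manager which ClusterIssuer to use (assumed issuer name).
    cert-manager.io/cluster-issuer: letsencrypt-production
spec:
  ingressClassName: traefik          # assumed class name for the Traefik controller
  tls:
    - hosts:
        - frontend.example.com       # assumed hostname
      secretName: frontend-tls       # cert-manager stores the issued certificate here
  rules:
    - host: frontend.example.com     # must match a host in the tls block
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: frontend       # assumed Service name
                port:
                  number: 80         # assumed Service port
```

The annotation and the `tls` block work as a pair: without the annotation no certificate is requested, and without the `tls` block there is no hostname or secret for cert-manager to act on.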

Safe Workflow (Step-by-Step)

  1. Confirm the Cloudflare token secret exists in the cert-manager namespace before enabling either DNS sync or certificate issuance.
  2. Reconcile the dns-and-certificates bundle so external-dns and the issuers exist before you depend on them.
  3. Verify cert-manager and external-dns pods are healthy.
  4. Confirm ClusterIssuer readiness for both staging and production issuers.
  5. Add or verify ingress hostnames, TLS blocks, and issuer annotations in the application ingress.
  6. Wait for DNS record creation and certificate issuance before declaring the route healthy.
  7. Validate with the real hostname over HTTPS, not only with raw Service or load balancer IP checks.

Verification Commands

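# cert-manager and external-dns pods should be Running and Ready
kubectl -n cert-manager get pods
# READY should be True for both the staging and production issuers
kubectl get clusterissuer
# Recent external-dns activity: look for record changes or Cloudflare API errors
kubectl -n cert-manager logs deploy/external-dns --since=10m
# Check hosts, the TLS block, and annotations on the frontend ingress
kubectl -n develop describe ingress frontend
# The Certificate should be Ready and its TLS secret should exist
kubectl -n develop get certificate,secret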

Common Failure Patterns

  • The Cloudflare token secret is missing or wrong, so both DNS record updates and ACME challenges fail.
  • The ingress host exists, but the TLS block or issuer annotation is missing.
  • DNS resolves correctly, but the certificate is still pending because the ACME challenge never completed.
  • Teams test only with raw IPs and miss that the real hostname path is still broken.

Guardrail Principle

Automate DNS and TLS together. Manual DNS records plus manual certificate handling create hidden outage debt.

Done When

  • external-dns is reconciling without errors
  • staging and production ClusterIssuer objects are ready
  • ingress resources request TLS explicitly
  • hostname resolution and HTTPS both succeed for the intended route
  • you can explain whether a failure belongs to app routing, DNS sync, or certificate issuance