Appendix: DNS and TLS Automation
Incident Hook
A service works through a raw load balancer IP, but the real hostname fails during rollout. DNS still points at the wrong target, HTTPS is missing or expired, and the incident looks like an application bug. Time is wasted debugging pods while the actual failure sits at the edge. Production ingress needs automated DNS and automated certificate issuance together.
Why This Appendix Exists
The main course keeps early chapters focused on platform safety and GitOps. This appendix explains the edge automation layer used by the SafeOps platform:
- external-dns manages DNS records from cluster state
- cert-manager issues certificates through Cloudflare DNS-01
- Traefik ingresses reference the production issuer and TLS hosts
This is not a separate core chapter because ingress is not the center of the course. It is a supporting production capability you will rely on once the platform is running.
SafeOps Baseline
In the current SafeOps implementation:
- Traefik is the ingress controller.
- external-dns runs in the cert-manager namespace and syncs records for the target domain.
- cert-manager manages ClusterIssuer objects for Let’s Encrypt staging and production.
- Ingresses request certificates by referencing the issuer and TLS hostnames.
- Cloudflare API token secret is the shared dependency for both DNS and certificate issuance.
Investigation Snapshots
Here is the DNS/TLS GitOps bundle used in the SafeOps system.
DNS and TLS GitOps bundle
Snippet unavailable during this build.
Here is the external-dns release used to synchronize DNS records.
external-dns release
Snippet unavailable during this build.
Here are the ClusterIssuer objects used for Let’s Encrypt staging and production.
ClusterIssuer configuration
Snippet unavailable during this build.
Here is the frontend ingress pattern that requests TLS from cert-manager.
Frontend ingress with TLS
Snippet unavailable during this build.
Safe Workflow (Step-by-Step)
- Confirm the Cloudflare token secret exists in the cert-manager namespace before enabling either DNS sync or certificate issuance.
- Reconcile the dns-and-certificates bundle so external-dns and the issuers exist before you depend on them.
- Verify cert-manager and external-dns pods are healthy.
- Confirm ClusterIssuer readiness for both staging and production issuers.
- Add or verify ingress hostnames, TLS blocks, and issuer annotations in the application ingress.
- Wait for DNS record creation and certificate issuance before declaring the route healthy.
- Validate with real hostname and HTTPS, not only with raw service or load balancer IP checks.
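The secret, pod health, and issuer readiness checks from the workflow above can be chained into one preflight function. This is a minimal sketch, not the SafeOps implementation: the secret name cloudflare-api-token is hypothetical, and the deployment name cert-manager assumes default chart naming. Only deploy/external-dns and the cert-manager namespace come from this appendix.

```shell
# Hypothetical secret name; substitute the one your issuers and external-dns reference.
TOKEN_SECRET="${TOKEN_SECRET:-cloudflare-api-token}"

# preflight: stop at the first missing edge dependency and say which one it is.
preflight() {
  # The shared Cloudflare token secret must exist before anything else can work.
  kubectl -n cert-manager get secret "$TOKEN_SECRET" >/dev/null || {
    echo "missing secret: $TOKEN_SECRET"; return 1; }

  # Both controllers must be rolled out (deployment name cert-manager is an assumption).
  for deploy in cert-manager external-dns; do
    kubectl -n cert-manager rollout status "deploy/$deploy" --timeout=60s >/dev/null || {
      echo "not healthy: $deploy"; return 1; }
  done

  # Every ClusterIssuer (staging and production) must report the Ready condition.
  kubectl wait --for=condition=Ready clusterissuer --all --timeout=60s >/dev/null || {
    echo "ClusterIssuer not ready"; return 1; }

  echo "preflight ok"
}
```

Run preflight before adding or changing application ingresses. A non-zero exit with a named dependency points at the edge layer rather than at the application.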
Verification Commands
kubectl -n cert-manager get pods
kubectl get clusterissuer
kubectl -n cert-manager logs deploy/external-dns --since=10m
kubectl -n develop describe ingress frontend
kubectl -n develop get certificate,secret
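The commands above inspect cluster state. The final workflow step also needs a check from the client side, against the real hostname. The function below is a sketch of that check, assuming dig and curl are installed. The hostname in the usage note is a placeholder, not a SafeOps value.

```shell
# check_route HOST: resolve the hostname, then request it over HTTPS.
check_route() {
  host="$1"

  # DNS: an empty answer means no record has been published for this name yet.
  addrs="$(dig +short "$host")"
  [ -n "$addrs" ] || { echo "dns: no record for $host"; return 1; }

  # HTTPS: curl verifies the certificate chain by default, so a missing, expired,
  # or untrusted (for example Let's Encrypt staging) certificate fails here.
  # --fail also turns HTTP responses of 400 and above into a failure.
  curl --silent --show-error --fail --output /dev/null --max-time 10 "https://$host/" || {
    echo "https: request to $host failed"; return 1; }

  echo "ok: $host"
}
```

For example, check_route frontend.example.com prints ok: frontend.example.com only when both resolution and a verified HTTPS request succeed, which a raw load balancer IP test cannot show.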
Common Failure Patterns
- Cloudflare token secret missing or wrong, so DNS records and ACME challenges fail.
- Ingress host exists, but TLS block or issuer annotation is missing.
- DNS points correctly, but certificate is still pending because the ACME challenge never completed.
- Teams test only with raw IPs and miss that the real hostname path is still broken.
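When a certificate stays pending, it helps to look at the resources cert-manager creates while working through an ACME order. The sketch below lists them for one namespace. The develop default mirrors the verification commands in this appendix, and the function name is illustrative.

```shell
# trace_issuance NAMESPACE: list the cert-manager resources behind a pending certificate.
trace_issuance() {
  ns="${1:-develop}"
  # Certificate -> CertificateRequest -> Order -> Challenge is the ACME issuance chain.
  # A Challenge that never completes usually carries the reason in its status,
  # which kubectl describe on that challenge will show.
  kubectl -n "$ns" get certificate,certificaterequest,order,challenge
}
```

A Challenge that is still listed long after the ingress was applied suggests the failure is in DNS-01 validation (for example the Cloudflare token) rather than in the ingress definition.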
Guardrail Principle
Automate DNS and TLS together. Manual DNS records plus manual certificate handling create hidden outage debt.
Done When
- external-dns is reconciling without errors
- staging and production ClusterIssuer objects are ready
- ingress resources request TLS explicitly
- hostname resolution and HTTPS both succeed for the intended route
- you can explain whether a failure belongs to app routing, DNS sync, or certificate issuance
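The last criterion can be practiced with a small decision helper that maps three observations to the layer that owns the failure. This is one possible ordering, offered as a sketch: it checks the edge before the application. The yes/no inputs and the function name are illustrative.

```shell
# classify_failure DNS_OK CERT_READY APP_OK   (each argument is "yes" or "no")
classify_failure() {
  dns_ok="$1"; cert_ready="$2"; app_ok="$3"
  if [ "$dns_ok" != "yes" ]; then
    echo "dns-sync"              # hostname does not resolve correctly: check external-dns
  elif [ "$cert_ready" != "yes" ]; then
    echo "certificate-issuance"  # Certificate not Ready: check the issuer and ACME challenge
  elif [ "$app_ok" != "yes" ]; then
    echo "app-routing"           # edge is fine: check ingress rules, service, and pods
  else
    echo "healthy"
  fi
}
```

For example, classify_failure yes no yes prints certificate-issuance, which means pod debugging can wait until the certificate is Ready.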