Guardrails-First Course Materials
This course teaches production-grade Kubernetes and SRE practice through incidents, guardrails, and repeatable workflows.
The goal is not to memorize tools. The goal is to learn how to keep systems safe when pressure, ambiguity, and AI-assisted speed all show up at the same time.
Who This Is For
- platform engineers moving from “it works” to “it survives mistakes”
- DevOps engineers who want stronger operating discipline, not more tooling hype
- SREs who want concrete labs, guardrails, and incident-shaped lessons
How the Course Works
Each chapter is built around one production failure pattern:
- what broke
- why the shortcut looked reasonable
- how the investigation became confusing
- which guardrail restores a safe operating path
Every core lesson includes:
- a written incident walkthrough
- a hands-on lab
- a quiz to confirm the operating rule
- runbooks or scorecards where the topic needs them
This course teaches more than how to operate Kubernetes around applications. It also shows what a production-ready Kubernetes application should look like, so that rollout safety, observability, GitOps reconciliation, and incident response work correctly in the first place.
The course uses the SafeOps reference applications as concrete examples:
- ldbl/backend, a small production-shaped Go API with health probes, metrics, tracing hooks, chaos endpoints, and OpenAPI/Swagger support
- ldbl/frontend, a Vue-based frontend with container hardening, runtime config injection, and Kubernetes deployment packaging
Many of the application patterns in those reference apps are inspired by Stefan Prodan's Podinfo, including:
- readiness and liveness probes
- graceful shutdown on interrupt signals
- config and secret reload patterns
- Prometheus and OpenTelemetry instrumentation
- structured logging
- 12-factor configuration
- fault injection for safe drills
- packaging and install paths with Timoni, Helm, and Kustomize
- end-to-end validation with Kind and Helm
- multi-arch images, signing, SBOMs, provenance, and CVE scanning
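Two of the patterns listed above, readiness and liveness probes and graceful shutdown on interrupt signals, can be sketched in a few lines of Go. This is a minimal illustration using only the standard library, not the actual code of ldbl/backend or Podinfo. The endpoint paths (/healthz, /readyz), the port, the 10-second drain timeout, and the helper names probeStatus and newMux are all assumptions chosen for the example.

```go
package main

import (
	"context"
	"errors"
	"log"
	"net/http"
	"os"
	"os/signal"
	"sync/atomic"
	"syscall"
	"time"
)

// probeStatus maps the readiness flag to the HTTP status a probe should see.
// (Hypothetical helper, named for this sketch only.)
func probeStatus(ready bool) int {
	if ready {
		return http.StatusOK
	}
	return http.StatusServiceUnavailable
}

// newMux wires the assumed /healthz (liveness) and /readyz (readiness) paths.
func newMux(ready *atomic.Bool) *http.ServeMux {
	mux := http.NewServeMux()
	// Liveness: the process is up and able to answer HTTP at all.
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	// Readiness: answer 503 while starting up or draining.
	mux.HandleFunc("/readyz", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(probeStatus(ready.Load()))
	})
	return mux
}

func main() {
	var ready atomic.Bool
	srv := &http.Server{Addr: ":8080", Handler: newMux(&ready)} // port is an assumption

	// ctx is cancelled when the process receives SIGINT or SIGTERM.
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
	defer stop()

	go func() {
		if err := srv.ListenAndServe(); err != nil && !errors.Is(err, http.ErrServerClosed) {
			log.Fatalf("listen: %v", err)
		}
	}()
	ready.Store(true)

	<-ctx.Done()
	ready.Store(false) // fail readiness before draining connections

	// Give in-flight requests a bounded window to finish (assumed 10s).
	shutdownCtx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	if err := srv.Shutdown(shutdownCtx); err != nil {
		log.Printf("shutdown: %v", err)
	}
}
```

In a real deployment the drain window would normally be tuned against the pod's termination grace period, and a short delay between failing readiness and calling Shutdown may be needed so that traffic stops arriving first. Both choices depend on the cluster setup and are left out of this sketch.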
Video assets are optional. The written lesson remains the primary source of truth. A video should make the same lesson easier to absorb, not replace it.
Recommended Learning Path
- Start with Intro: AI as a Very Well-Read Junior Engineer.
- Go through Chapters 01-14 in order.
- Run the lab before moving to the next chapter.
- Use the quiz to confirm the main guardrail rule before continuing.
- Move to the advanced modules only after the core path feels operationally natural.
Tracks
Core track:
- Chapters 01-14 covering platform foundations, GitOps, CI/CD, security, observability, reliability, and on-call discipline
Advanced track:
- Chapter 15: Supply Chain Security
- Chapter 16: Admission Policy Guardrails
- Chapter 17: Rollback and Data Migrations
- Module: Progressive Delivery (Canary with Traefik + Flagger)
Reference appendices:
- Appendix: Local Development Environment
- Appendix: DNS and TLS Automation
References
- Full structure and outcomes: Curriculum
- Intro mental model: Intro: AI as a Very Well-Read Junior Engineer