Intro: AI as a Very Well-Read Junior Engineer
This course is not about “how to use AI” or “how to write prompts”.
It is about using AI in DevOps / SysOps / SRE work without increasing risk or blast radius.
The Mental Model
AI is the most well-read junior engineer you will ever work with:
- Knows tooling, flags, YAML, Terraform, Helm.
- Works fast and in parallel.
- Sounds confident.
And that is exactly why it is dangerous:
- It has low context.
- It has no production scar tissue.
- It does not feel risk.
- It does not care whether the target is prod or dev.
The Rule
AI does not decide. AI proposes. Humans own decisions and accountability.
The system must be designed so that even a brave junior (or a tired on-call engineer) cannot break production by default.
SafeOps Pledge
- AI is a read-only advisor, not an autonomous operator.
- No direct production writes without human approval and guardrail checks.
- One change per PR, one blast radius per rollout.
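The last pledge item can be enforced mechanically before a human ever looks at the PR. The following is a minimal sketch, not the course's actual hook: it assumes a hypothetical repository layout of `environments/<env>/<component>/...`, and the function names `blast_radii` and `check_single_blast_radius` are invented for illustration.

```python
from pathlib import PurePosixPath

# Assumption: manifests live under environments/<env>/<component>/...
# This layout is hypothetical; adapt the parsing to your own repository.


def blast_radii(changed_paths):
    """Return the set of (environment, component) pairs a change set touches."""
    radii = set()
    for path in changed_paths:
        parts = PurePosixPath(path).parts
        if len(parts) >= 3 and parts[0] == "environments":
            radii.add((parts[1], parts[2]))
        else:
            # A path outside the known layout counts as its own radius,
            # so unexpected files can never slip through unnoticed.
            radii.add(("unknown", path))
    return radii


def check_single_blast_radius(changed_paths):
    """Fail closed: anything other than exactly one radius blocks the PR."""
    radii = sorted(blast_radii(changed_paths))
    if len(radii) != 1:
        return False, f"expected exactly 1 blast radius, found {len(radii)}: {radii}"
    return True, f"ok: {radii[0]}"
```

A pre-merge hook could feed this the list of changed files from the PR and refuse to continue when the first element of the result is `False`. Treating unknown paths as separate radii is a deliberate choice here: the check fails closed rather than guessing.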
Decision Flow
AI suggestion
  |
  v
Guardrails check (policy/hooks/context)
  |
  +--> FAIL -> stop and fix
  |
  v
Human review + approval
  |
  v
Controlled apply (GitOps/runbook)
  |
  v
Evidence + verification
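The flow above can be read as a small pipeline in which every stage can stop the change, and every outcome leaves evidence. Here is a minimal sketch under stated assumptions: `Proposal`, `run_decision_flow`, and the callback parameters are hypothetical names, and the real guardrails, approval step, and apply mechanism (GitOps or a runbook) would be supplied by the platform.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Proposal:
    """An AI suggestion: a description, a target, and an evidence trail."""
    description: str
    target_env: str
    evidence: List[str] = field(default_factory=list)


def run_decision_flow(
    proposal: Proposal,
    guardrails: List[Callable[[Proposal], bool]],
    human_approves: Callable[[Proposal], bool],
    apply: Callable[[Proposal], None],
    verify: Callable[[Proposal], bool],
) -> str:
    # 1. Guardrails check: the first failure stops the flow before any human time is spent.
    for check in guardrails:
        if not check(proposal):
            name = getattr(check, "__name__", repr(check))
            proposal.evidence.append(f"guardrail failed: {name}")
            return "stopped"
    proposal.evidence.append("guardrails passed")

    # 2. Human review + approval: the AI proposes, a human decides.
    if not human_approves(proposal):
        proposal.evidence.append("human rejected")
        return "rejected"
    proposal.evidence.append("human approved")

    # 3. Controlled apply: reached only after both gates have passed.
    apply(proposal)
    proposal.evidence.append("applied")

    # 4. Evidence + verification: record the outcome either way.
    ok = verify(proposal)
    proposal.evidence.append(f"verified: {ok}")
    return "verified" if ok else "needs-rollback"
```

The point of the shape is that `apply` is unreachable unless every guardrail and the human reviewer have said yes, and that the `evidence` list is filled in on every path, including the failing ones.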
What You Will Build
By the end of this course you will have a production-grade platform running end to end:
Provision          →  Store & Version     →  Build & Gate          →  Deploy (GitOps)
(Terraform, Ch 2)     (Git + SOPS, Ch 3)     (CI/CD + hooks, Ch 5)    (Flux, Ch 4)

                 ↓ deploys to ↓
┌──────────────────────────────────┐
│ Kubernetes Cluster               │
│                                  │
│ Network Policies (Ch 6)          │
│ Security Contexts (Ch 7)         │
│ Resource Limits / QoS (Ch 8)     │
│ HPA + PDB (Ch 9)                 │
│ Admission Policies (Ch 16)       │
└────────────────┬─────────────────┘
                 │
Observability (Ch 10) ← Backup (Ch 11)
                 │
Chaos Engineering (Ch 12)
                 │
AI SRE Guardian (Ch 13) → 24/7 Ops (Ch 14)
- Provision — cloud infrastructure as code (Terraform on Hetzner)
- Store & Version — Git workflows, secrets management with SOPS
- Build & Gate — CI/CD pipelines with pre-merge guardrails
- Deploy — GitOps via Flux, continuous reconciliation
- Harden — network policies, security contexts, resource limits, admission policies
- Observe — metrics, logs, traces; backup and restore
- Break & Heal — chaos engineering to prove resilience
- Automate — AI SRE guardian for 24/7 operations
How Every Lesson Works
Each lesson answers the same four questions:
- What would AI propose (the brave junior)?
- What should we not allow?
- What guardrail stops it?
- What is the safe workflow?
Platform guardrails reference: see Chapter 01 — AI Changes Two Things at Once.