Intro: AI as a Very Well-Read Junior Engineer

This course is not about “how to use AI”, and it is not about “how to write prompts”.

It is about using AI in DevOps, SysOps, and SRE work without increasing risk or blast radius.

The Mental Model

AI is the most well-read junior engineer you will ever work with:

It knows tooling, flags, YAML, Terraform, and Helm.
It works fast and in parallel.
It sounds confident.

And that is exactly why it is dangerous:

It has little context about your systems and their history.
It has no production scar tissue.
It does not feel risk.
It does not care whether it is touching prod or dev.

The Rule

AI does not decide. AI proposes. Humans own decisions and accountability.

The system must be designed so that even a brave junior (or a tired on-call engineer) cannot break production by default.

SafeOps Pledge

AI is a read-only advisor, not an autonomous operator.
No direct production writes without human approval and guardrail checks.
One change per PR, one blast radius per rollout.

Decision Flow

AI suggestion
   |
   v
Guardrails check (policy/hooks/context)
   |
   +--> FAIL -> stop and fix
   |
   v
Human review + approval
   |
   v
Controlled apply (GitOps/runbook)
   |
   v
Evidence + verification
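To make the first three boxes concrete, here is a minimal Python sketch of a gate that sits between an AI suggestion and an apply. Everything in it is hypothetical: the `Proposal` shape, the `guardrail_failures` and `decide` helpers, the assumed environment names, and the rule that a "component" is the top-level directory of a changed path are illustrative assumptions, not a tool used later in the course. The gate never applies anything itself; it only reports whether a proposal is blocked, waiting on a human, or ready for a controlled apply. The apply and evidence steps are left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field

ALLOWED_ENVS = {"dev", "staging", "prod"}  # assumed environment names


@dataclass
class Proposal:
    """A change suggested by the AI assistant (hypothetical shape)."""
    summary: str
    target_env: str                 # e.g. "dev" or "prod"
    changed_paths: list[str] = field(default_factory=list)
    approved_by: str | None = None  # only a human reviewer sets this


def guardrail_failures(p: Proposal) -> list[str]:
    """Return policy violations; an empty list means the guardrails pass."""
    failures: list[str] = []
    # "One change per PR": assume the top-level directory names the component.
    components = {path.split("/", 1)[0] for path in p.changed_paths}
    if len(components) != 1:
        failures.append(f"touches {len(components)} components; expected exactly 1")
    if p.target_env not in ALLOWED_ENVS:
        failures.append(f"unknown environment {p.target_env!r}")
    return failures


def decide(p: Proposal) -> str:
    """Walk the decision flow; this gate reports status and never applies anything."""
    failures = guardrail_failures(p)
    if failures:
        return "blocked: " + "; ".join(failures)  # FAIL -> stop and fix
    if p.approved_by is None:
        return "waiting for human approval"
    return f"ready for controlled apply (approved by {p.approved_by})"


if __name__ == "__main__":
    change = Proposal("bump HPA max replicas", "prod", ["payments/hpa.yaml"])
    print(decide(change))   # waiting for human approval
    change.approved_by = "alice"
    print(decide(change))   # ready for controlled apply (approved by alice)
```

Guardrails run before approval is even considered, so a reviewer never spends attention on a proposal that already violates policy. Approval lives in a separate field that the AI side never sets.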

What You Will Build

By the end of this course, you will have a production-grade platform running end to end:

Provision          →  Store & Version  →  Build & Gate    →  Deploy (GitOps)
(Terraform, Ch 2)     (Git + SOPS,        (CI/CD + hooks,    (Flux, Ch 4)
                       Ch 3)               Ch 5)

                              ↓ deploys to ↓

                    ┌──────────────────────────────────┐
                    │      Kubernetes Cluster          │
                    │                                  │
                    │  Network Policies (Ch 6)         │
                    │  Security Contexts (Ch 7)        │
                    │  Resource Limits / QoS (Ch 8)    │
                    │  HPA + PDB (Ch 9)                │
                    │  Admission Policies (Ch 16)      │
                    └──────────────┬───────────────────┘
                                   │
                    Observability (Ch 10) ← Backup (Ch 11)
                                   │
                    Chaos Engineering (Ch 12)
                                   │
                    AI SRE Guardian (Ch 13) → 24/7 Ops (Ch 14)

Provision — cloud infrastructure as code (Terraform on Hetzner)
Store & Version — Git workflows, secrets management with SOPS
Build & Gate — CI/CD pipelines with pre-merge guardrails
Deploy — GitOps via Flux, continuous reconciliation
Harden — network policies, security contexts, resource limits, admission policies
Observe — metrics, logs, traces; backup and restore
Break & Heal — chaos engineering to prove resilience
Automate — AI SRE guardian for 24/7 operations

How Every Lesson Works

What would AI propose (the brave junior)?
What should we not allow?
What guardrail stops it?
What is the safe workflow?

Platform guardrails reference: see Chapter 01 — AI Changes Two Things at Once.