Checkpoint B: Your Runtime Safety Net

You have completed Chapters 01-09. Before moving into observability and operations, pause and look at the runtime constraints you have built.

This page is a consolidation — not new material. Its purpose is to show how the four hardening chapters (06-09) work together to contain failure.

What You Have Built

Your workloads now run inside a layered safety net. Each layer limits blast radius in a different direction:

                ┌─────────────────────────────────────┐
                │   Chapter 06: Network Policies      │
                │   Who can talk to whom              │
                ├─────────────────────────────────────┤
                │   Chapter 07: Security Context      │
                │   How pods run (non-root, read-only)│
                ├─────────────────────────────────────┤
                │   Chapter 08: Resource Management   │
                │   How much CPU and memory           │
                ├─────────────────────────────────────┤
                │   Chapter 09: Availability          │
                │   How many replicas, when           │
                └─────────────────────────────────────┘
                  ▲
                  │ Workloads deployed via Chapters 01-05

How The Layers Interact

The layers are not independent — they compose:

A pod hardened by security context (Ch 07) runs within the network boundaries set by network policies (Ch 06)
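Resource requests and limits (Ch 08) determine QoS class, which influences which pods the kubelet evicts first under node pressure; HPA and PDB (Ch 09) then govern how many replicas remain through scaling and voluntary disruptions such as node drains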
A NetworkPolicy (Ch 06) that blocks egress to the metrics endpoint breaks observability — which is why Chapter 10 comes next
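
The third interaction can be made concrete. As a minimal sketch, assume a pull-based scraper running in a namespace named `monitoring`, a workload labeled `app: web` in a namespace named `demo`, and metrics served on TCP port 9090. All of these names and the port are hypothetical, chosen for illustration rather than taken from earlier chapters. Under those assumptions, an allow rule layered on top of default-deny might look like this:

```yaml
# Hypothetical example: names, labels, and port are illustrative assumptions.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-metrics-scrape
  namespace: demo              # assumed workload namespace
spec:
  podSelector:
    matchLabels:
      app: web                 # assumed workload label
  policyTypes:
    - Ingress
  ingress:
    - from:
        # Matches the namespace by its automatically applied name label.
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
      ports:
        - protocol: TCP
          port: 9090           # assumed metrics port
```

NetworkPolicies are additive, so this rule opens only the scrape path and leaves the default-deny baseline in place. If the scraper's own namespace also denies egress by default, it would need a matching egress rule.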

The Guardrails You Have in Place

Guardrail	Source	What It Contains
Default-deny NetworkPolicy	Chapter 06	Lateral movement between compromised pods
`runAsNonRoot: true`	Chapter 07	Root-level exploits from a compromised container
`readOnlyRootFilesystem: true`	Chapter 07	Persistent filesystem tampering
Resource requests and limits	Chapter 08	Noisy-neighbor starvation and OOM cascades
`minReplicas: 2` for critical services	Chapter 09	Single-pod failure becoming an outage (single-node failure too, if the replicas are spread across nodes)
PDB `maxUnavailable`	Chapter 09	Node drains causing service downtime
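
For reference, the guardrails in the table map onto manifests roughly as follows. This is a sketch rather than the exact manifests from Chapters 06-09: the `demo` namespace, the `web` names and labels, the image reference, and every numeric value are placeholder assumptions.

```yaml
# Sketch only: namespace, names, image, and numbers are placeholder assumptions.
# Ch 06: default-deny for every pod in the namespace, in both directions.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: demo
spec:
  podSelector: {}              # empty selector = all pods in the namespace
  policyTypes:
    - Ingress
    - Egress
---
# Ch 07 and Ch 08: security context and resources on the container.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
  namespace: demo
spec:
  # replicas is omitted so that the HPA below owns the replica count.
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: registry.example.com/web:1.0.0   # placeholder image
          securityContext:
            runAsNonRoot: true
            readOnlyRootFilesystem: true
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              memory: 256Mi
---
# Ch 09: never scale below two replicas.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
  namespace: demo
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization    # percentage of the CPU request above
          averageUtilization: 70
---
# Ch 09: allow at most one pod to be down during voluntary disruptions.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web
  namespace: demo
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: web
```

Note that `runAsNonRoot: true` only has an effect the workload can survive if the image is built to run as a non-root user; otherwise the container is refused at start.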

Self-Check

You are ready for Chapter 10 when you can answer without looking:

Why is default-deny safer than allow-all, even though it creates more up-front work?
What is the standard pattern for writable /tmp when readOnlyRootFilesystem: true is enforced?
What evidence do you need before raising a pod’s memory limit?
Why is minReplicas: 1 a reliability regression, even for a deployment that “never fails”?
If a PDB blocks a node drain, what does that tell you about the deployment’s replica count?

If any of these are unclear, revisit the relevant chapter before moving forward.

What Comes Next

You can now deliver code safely (Ch 01-05) and your workloads have runtime constraints (Ch 06-09). The next block (Chapters 10-14) shifts to what happens when something goes wrong:

Chapter 10 — Observability: how you see what is happening
Chapter 11 — Backup and Restore: how you recover from data loss
Chapter 12 — Controlled Chaos: how you rehearse failure
Chapter 13 — AI-Assisted SRE Guardian: how you route incidents safely
Chapter 14 — 24/7 Production SRE: how humans coordinate response

These five chapters turn your running system into an operable, observable, recoverable production platform.