Core Track: Guardrails-first chapter in the core learning path.

Estimated Time

  • Reading: 20-25 min
  • Lab: 45-60 min
  • Quiz: 10-15 min

Prerequisites

Source Code References

Primary chapter content only.

What You Will Produce

A reproducible lab result plus quiz verification and incident-safe operating evidence.

Core Exercises (Required)

  1. Incident Simulation: Use the Chaos Monkey (Chapter 12) to trigger a Sev1 failure. Assign roles within your team (or simulate them) and follow the Safe Workflow.
  2. Timeline Generation: From your simulation, create a 10-point timeline that includes at least three metric signals, two log entries, and four operator actions. These minimums account for nine entries, so the tenth can be of any type.
  3. Draft a Postmortem: Use the template in sre/docs/postmortem-template.md to document your simulation. Focus on the “5 Whys” analysis.
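The timeline requirements in exercise 2 can be checked mechanically before you start writing the postmortem. The following is a minimal sketch, not part of the course tooling: the `TimelineEntry` type, the `timeline_gaps` function, and the kind labels (`"metric"`, `"log"`, `"action"`) are hypothetical names chosen for illustration, and it assumes "10-point" means exactly ten entries.

```python
from collections import Counter
from dataclasses import dataclass

# Minimum counts come from the exercise text; the kind labels are hypothetical.
REQUIRED_MINIMUMS = {"metric": 3, "log": 2, "action": 4}
# Assumption: a "10-point" timeline has exactly ten entries.
REQUIRED_TOTAL = 10


@dataclass(frozen=True)
class TimelineEntry:
    timestamp: str  # when the event happened, e.g. an ISO 8601 string
    kind: str       # "metric", "log", "action", or any other label
    summary: str    # one-line description of the event


def timeline_gaps(entries):
    """Return a list of problems; an empty list means the timeline passes."""
    problems = []
    if len(entries) != REQUIRED_TOTAL:
        problems.append(f"need {REQUIRED_TOTAL} entries, found {len(entries)}")
    counts = Counter(entry.kind for entry in entries)
    for kind, minimum in REQUIRED_MINIMUMS.items():
        if counts[kind] < minimum:
            problems.append(
                f"need at least {minimum} '{kind}' entries, found {counts[kind]}"
            )
    return problems
```

Calling `timeline_gaps(entries)` on your draft returns one message per unmet requirement, which makes it easy to see whether you still need another metric signal, log entry, or operator action.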

Postmortem Quality Bar

A postmortem is successful only if it includes:

  • Evidence-backed Timeline: All major events are tied to a metric, log, or trace.
  • Causal Analysis: Identifies why the system allowed the failure, not just who did it.
  • Hardening Actions: Specific tasks with an owner, a due date, and a validation method.
  • Blameless Tone: Focuses on technical and process improvements rather than individual fault.
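The Hardening Actions criterion lends itself to a simple completeness check: an action counts only if it names an owner, a due date, and a validation method. The sketch below illustrates one way to represent that. The `HardeningAction` class and its field names are assumptions made for this example, not the structure of the course's postmortem template.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class HardeningAction:
    """One follow-up task from a postmortem (hypothetical structure)."""
    task: str
    owner: Optional[str] = None
    due: Optional[date] = None
    validation: Optional[str] = None  # how completion will be verified

    def missing_fields(self):
        """Return the names of required fields that are unset or empty."""
        required = {
            "owner": self.owner,
            "due": self.due,
            "validation": self.validation,
        }
        return [name for name, value in required.items() if not value]
```

An action whose `missing_fields()` result is non-empty does not yet meet the quality bar; the returned names show what still has to be filled in.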

Challenge Exercise (Optional)

Full Tabletop Incident Simulation: Run a tabletop simulation with all four roles assigned. Take the incident through severity declaration, investigation, containment, and resolution, then produce a complete blameless postmortem document.
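To keep a tabletop run honest about which stage it is in, the four stages named above can be modeled as a small state machine. This is a sketch under a simplifying assumption: the stages advance strictly in the listed order, whereas a real incident may loop back (for example, from containment to further investigation). The `Phase` and `TabletopIncident` names are hypothetical.

```python
from enum import Enum


class Phase(Enum):
    """Stages from the exercise text, numbered in the order listed."""
    SEVERITY_DECLARATION = 1
    INVESTIGATION = 2
    CONTAINMENT = 3
    RESOLUTION = 4


class TabletopIncident:
    def __init__(self, severity):
        self.severity = severity  # e.g. "Sev1"
        self.phase = Phase.SEVERITY_DECLARATION
        self.history = [self.phase]  # record of phases for the timeline

    def advance(self):
        """Move to the next phase; assumes strictly linear progression."""
        if self.phase is Phase.RESOLUTION:
            raise ValueError("incident is already resolved")
        self.phase = Phase(self.phase.value + 1)
        self.history.append(self.phase)
        return self.phase
```

The `history` list doubles as a skeleton for the postmortem timeline, since each phase change is an operator action worth recording.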

Done When

You have completed this chapter and the Core Track when:

  • You can run a full incident lifecycle with assigned roles and severity levels.
  • You have produced a complete, high-quality blameless postmortem.
  • You understand the difference between technical mitigation and organizational coordination.
  • You can define and verify technical hardening actions from incident evidence.

Knowledge Check

Before finishing this chapter, complete the Quiz to verify your understanding of the guardrail principles.