Workflow & Operating Model | SafeOps Academy

Guardrails That Stop It

Owner-Per-Role: Every incident must have an assigned owner for Command, Comms, and Execution.
Evidence-First: Metrics, traces, and logs must be captured before any high-risk production change.
Mandatory Postmortems: All Sev0 and Sev1 incidents require a blameless postmortem within 48 hours.
AI Boundary Policy: AI tools can analyze and recommend, but humans must own the final decision and execution.
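The change-level guardrails can be expressed as automated pre-change checks. The following is a minimal Python sketch under stated assumptions. `ChangeRequest`, `change_violations`, and `postmortem_due` are hypothetical names. The required evidence is the three signal types named above. The 48-hour postmortem window is assumed to start when the incident is resolved, because the guardrail does not say which event starts the clock. Owner-Per-Role is left out because it applies to the incident team, not to an individual change.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional

# Signal types the Evidence-First guardrail requires before a high-risk change.
REQUIRED_EVIDENCE = {"metrics", "traces", "logs"}

# Mandatory Postmortems: severities that require one, and the time window.
# Assumption: the 48-hour window is measured from resolution time.
POSTMORTEM_SEVERITIES = {"Sev0", "Sev1"}
POSTMORTEM_WINDOW = timedelta(hours=48)


@dataclass
class ChangeRequest:
    """A proposed production change (hypothetical structure)."""
    description: str
    high_risk: bool
    evidence: set[str] = field(default_factory=set)  # evidence captured so far
    approved_by: Optional[str] = None  # who owns the final decision
    approver_is_ai: bool = False  # True if the "approval" came from an AI tool


def change_violations(change: ChangeRequest) -> list[str]:
    """Return guardrail violations; an empty list means the change may proceed."""
    problems = []
    # Evidence-First: metrics, traces, and logs must exist before a high-risk change.
    if change.high_risk:
        missing = REQUIRED_EVIDENCE - change.evidence
        if missing:
            problems.append(f"evidence-first: missing {sorted(missing)}")
    # AI Boundary Policy: AI may recommend, but a human must own the decision.
    if change.approved_by is None or change.approver_is_ai:
        problems.append("ai-boundary: final decision not owned by a human")
    return problems


def postmortem_due(severity: str, resolved_at: datetime) -> Optional[datetime]:
    """Deadline for the blameless postmortem, or None if one is not mandatory."""
    if severity in POSTMORTEM_SEVERITIES:
        return resolved_at + POSTMORTEM_WINDOW
    return None
```

A change pipeline or chat-ops bot could refuse to run a change while `change_violations` returns a non-empty list. Whether a violation should block the change or only warn is a policy decision that this sketch leaves open.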

Core SRE Principles

Evidence Over Urgency: Act on confirmed signals (Chapter 10), not on panic.
Blameless Response: Focus on system gaps and guardrail failures, not individual mistakes.
Controlled Escalation: Follow the severity-based communication and ownership model.

Operating Model (The Incident Team)

Incident Commander (IC): Strategist. Owns the decision-making and resource allocation.
Primary Responder: Surgeon. Owns the technical execution and verification.
Communications Lead: Voice. Owns stakeholder updates and status pages.
Scribe: Memory. Owns the timeline and evidence logging.
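One way to make the team model concrete is to pair each role with the duty it owns and then check the roster against the Owner-Per-Role guardrail. This Python sketch is illustrative only. `Role`, `REQUIRED_ROLES`, `unfilled_roles`, and `describe` are hypothetical names. The mapping of Command, Comms, and Execution to the Incident Commander, Communications Lead, and Primary Responder is an assumption drawn from the role descriptions above. The Scribe is modeled but treated as optional, since the guardrail names only three owners.

```python
from __future__ import annotations

from enum import Enum


class Role(Enum):
    """Incident roles; each value is the duty the role owns."""
    INCIDENT_COMMANDER = "decision-making and resource allocation"
    PRIMARY_RESPONDER = "technical execution and verification"
    COMMUNICATIONS_LEAD = "stakeholder updates and status pages"
    SCRIBE = "timeline and evidence logging"


# Owner-Per-Role guardrail. Assumed mapping: Command -> Incident Commander,
# Comms -> Communications Lead, Execution -> Primary Responder.
REQUIRED_ROLES = {
    Role.INCIDENT_COMMANDER,
    Role.COMMUNICATIONS_LEAD,
    Role.PRIMARY_RESPONDER,
}


def unfilled_roles(roster: dict[Role, str]) -> list[Role]:
    """Return required roles with no named owner, sorted by role name."""
    return sorted(
        (role for role in REQUIRED_ROLES if not roster.get(role)),
        key=lambda role: role.name,
    )


def describe(roster: dict[Role, str]) -> list[str]:
    """One line per assigned role: who holds it and what they own."""
    return [f"{role.name}: {person} owns {role.value}" for role, person in roster.items()]
```

For example, a roster that names only an Incident Commander and a Primary Responder reports `[Role.COMMUNICATIONS_LEAD]` as unfilled, which is the gap the Owner-Per-Role guardrail is meant to catch before mitigation starts.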

Safe Workflow (Step-by-Step)

Detect & Declare: Use Chapter 10 signals or Chapter 13 Guardian alerts to detect a failure. Declare severity.
Assign Roles: Name the Incident Commander, Primary Responder, and Communications Lead; add a Scribe to own the timeline.
Build Timeline: Record every key metric change and operator command in a shared log.
Mitigate: Execute the lowest-risk fix first. Communicate status on a fixed cadence.
Resolve: Confirm recovery via metrics. Record the time of resolution.
Postmortem: Conduct a blameless review and assign hardening actions.
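The workflow can be sketched as a small state machine that only moves forward one step at a time and writes every transition into the shared timeline. This is a minimal Python sketch, not a reference implementation. `Stage`, `Incident`, `log`, and `advance` are hypothetical names. Strict step ordering is a simplification, because in practice the timeline keeps growing during mitigation. The `metrics_recovered` flag stands in for whatever recovery check a team actually uses.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import IntEnum
from typing import Optional


class Stage(IntEnum):
    """The six workflow steps, in order."""
    DETECT_AND_DECLARE = 1
    ASSIGN_ROLES = 2
    BUILD_TIMELINE = 3
    MITIGATE = 4
    RESOLVE = 5
    POSTMORTEM = 6


@dataclass
class Incident:
    severity: str  # declared during Detect & Declare, e.g. "Sev1"
    stage: Stage = Stage.DETECT_AND_DECLARE
    timeline: list[tuple[datetime, str]] = field(default_factory=list)
    resolved_at: Optional[datetime] = None

    def log(self, entry: str, at: Optional[datetime] = None) -> None:
        """Append a metric change or operator command to the shared timeline."""
        when = at if at is not None else datetime.now(timezone.utc)
        self.timeline.append((when, entry))

    def advance(self, to: Stage, at: Optional[datetime] = None,
                metrics_recovered: bool = False) -> None:
        """Move exactly one step forward; skipping or going back raises ValueError."""
        if to != self.stage + 1:
            raise ValueError(f"cannot move from {self.stage.name} to {to.name}")
        # Resolve step: recovery must be confirmed by metrics first.
        if to == Stage.RESOLVE and not metrics_recovered:
            raise ValueError("resolve requires recovery confirmed by metrics")
        self.stage = to
        self.log(f"stage -> {to.name}", at)
        if to == Stage.RESOLVE:
            # Record the time of resolution from the timeline entry just written.
            self.resolved_at = self.timeline[-1][0]
```

Calling `advance(Stage.RESOLVE, metrics_recovered=True)` both moves the incident to Resolve and stores the resolution time in `resolved_at`, which matches what the Resolve step asks for. Refusing to skip steps is the design choice that keeps a responder from jumping straight from detection to a fix without roles or a timeline in place.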

This builds on: the AI-assisted Guardian (Chapter 13), which the on-call responder uses for triage and enrichment. This enables: the Capstone, with all core guardrails now operational.
