The Incident: Coordination Chaos

Incident Hook

A Sev1 alert fires outside business hours. Responders join quickly, but roles are unclear, updates are inconsistent, and technical actions race each other. Time is lost coordinating people instead of restoring service.

Result: The outage lasts longer and stress levels run higher because the team lacks a practiced organizational response model.

Observed Symptoms

What the team sees first:

Multiple responders join the call, but no one clearly owns incident command.
Communication cadence with stakeholders is inconsistent.
Parallel actions start before a single, shared evidence picture exists.

The incident is already harder than it needs to be before any technical fix even lands.

Severity Matrix (Sev0 - Sev3)

We use a defined severity matrix to set expectations for response urgency and escalation at each level:

Severity	Typical Impact	Response Target	Escalation
Sev0	Critical, business-wide outage	Immediate Command	Page all responders + leadership
Sev1	Major user-facing degradation	Rapid Coordinated	Page core service owners
Sev2	Partial failure with workaround	Planned Urgent	Notify owning team + on-call
Sev3	Low-impact defect or noise	Normal Backlog	Track for recurrence
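
One way to keep responders from debating severity rules mid-incident is to encode the matrix as data that tooling can look up. The following is a minimal sketch, not an existing tool: the names `Severity`, `SeverityPolicy`, `SEVERITY_MATRIX`, `policy_for`, and `requires_paging` are hypothetical, and the paging rule is an assumption inferred from the wording of the Escalation column.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    SEV0 = 0
    SEV1 = 1
    SEV2 = 2
    SEV3 = 3


@dataclass(frozen=True)
class SeverityPolicy:
    typical_impact: str
    response_target: str
    escalation: str


# Rows copied verbatim from the severity matrix above.
SEVERITY_MATRIX = {
    Severity.SEV0: SeverityPolicy(
        "Critical, business-wide outage",
        "Immediate Command",
        "Page all responders + leadership",
    ),
    Severity.SEV1: SeverityPolicy(
        "Major user-facing degradation",
        "Rapid Coordinated",
        "Page core service owners",
    ),
    Severity.SEV2: SeverityPolicy(
        "Partial failure with workaround",
        "Planned Urgent",
        "Notify owning team + on-call",
    ),
    Severity.SEV3: SeverityPolicy(
        "Low-impact defect or noise",
        "Normal Backlog",
        "Track for recurrence",
    ),
}


def policy_for(severity: Severity) -> SeverityPolicy:
    """Return the matrix row for a given severity level."""
    return SEVERITY_MATRIX[severity]


def requires_paging(severity: Severity) -> bool:
    # Assumption: a level pages people when its Escalation text starts
    # with "Page" (Sev0 and Sev1 in the matrix above).
    return policy_for(severity).escalation.startswith("Page")
```

For example, `policy_for(Severity.SEV1).escalation` yields the Sev1 escalation text from the table, so a paging script and the written policy cannot drift apart without someone editing the same data.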

What AI Would Propose (Brave Junior):

“Skip incident command and jump straight to fixes.”
“Postmortem can wait; just close the ticket after recovery.”
“Let AI choose remediation automatically if confidence is high.”

Pause and Predict: Before reading the investigation, write down your top 3 hypotheses. What would you check first?

