Incident Hook
A high-severity (Sev1) alert fires outside business hours. Responders join quickly, but roles are unclear, status updates are inconsistent, and technical actions collide because several people change things at once. Time goes into coordinating people instead of restoring service.
Result: the outage lasts longer and stress runs higher because the team lacks a practiced organizational response model.
Observed Symptoms
What the team sees first:
- Multiple responders join the call, but no one clearly owns incident command.
- Communication cadence with stakeholders is inconsistent.
- Parallel actions start before a single, shared evidence picture exists.
The incident is already harder than it needs to be before any technical fix even lands.
Severity Matrix (Sev0 - Sev3)
We use a defined severity matrix to set expectations for how urgently we respond and whom we escalate to:
| Severity | Typical Impact | Response Target | Escalation |
|---|---|---|---|
| Sev0 | Critical, business-wide outage | Immediate; establish incident command | Page all responders + leadership |
| Sev1 | Major user-facing degradation | Rapid, coordinated response | Page core service owners |
| Sev2 | Partial failure with workaround | Urgent but planned response | Notify owning team + on-call |
| Sev3 | Low-impact defect or noise | Normal backlog priority | Track for recurrence |
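One way to keep the matrix from living only in people's heads during a 3 a.m. page is to encode it as data that alerting or chat tooling can read. The Python sketch that follows is illustrative only: the names `Severity`, `SeverityPolicy`, `SEVERITY_MATRIX`, `escalation_for`, and `should_page` are hypothetical, and the `pages_humans` flag is an assumption inferred from whether the Escalation column says "Page" rather than "Notify" or "Track".

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    # Lower value = more severe, mirroring the Sev0-Sev3 naming.
    SEV0 = 0
    SEV1 = 1
    SEV2 = 2
    SEV3 = 3


@dataclass(frozen=True)
class SeverityPolicy:
    typical_impact: str
    escalation: str
    pages_humans: bool  # hypothetical flag: True when escalation means paging people


# Impact and escalation text copied from the severity matrix.
# `pages_humans` is an illustrative assumption read off the Escalation column.
SEVERITY_MATRIX: dict[Severity, SeverityPolicy] = {
    Severity.SEV0: SeverityPolicy(
        "Critical, business-wide outage", "Page all responders + leadership", True
    ),
    Severity.SEV1: SeverityPolicy(
        "Major user-facing degradation", "Page core service owners", True
    ),
    Severity.SEV2: SeverityPolicy(
        "Partial failure with workaround", "Notify owning team + on-call", False
    ),
    Severity.SEV3: SeverityPolicy(
        "Low-impact defect or noise", "Track for recurrence", False
    ),
}


def escalation_for(sev: Severity) -> str:
    """Return the escalation action the matrix prescribes for a severity."""
    return SEVERITY_MATRIX[sev].escalation


def should_page(sev: Severity) -> bool:
    """True if this severity calls for paging people rather than notifying or tracking."""
    return SEVERITY_MATRIX[sev].pages_humans


if __name__ == "__main__":
    print(escalation_for(Severity.SEV1))  # Page core service owners
    print(should_page(Severity.SEV2))     # False
```

Using `IntEnum` keeps severities ordered, so a check such as `Severity.SEV0 < Severity.SEV1` reads as "Sev0 is more severe than Sev1" without a separate ranking table.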
What AI Would Propose (Brave Junior):
- “Skip incident command and jump straight to fixes.”
- “Postmortem can wait; just close the ticket after recovery.”
- “Let AI choose remediation automatically if confidence is high.”
Pause and Predict: Before reading the investigation, write down your top 3 hypotheses. What would you check first?