Investigation
Treat canary metrics, measured against the stable baseline, as the primary evidence of release safety.
Safe investigation sequence:
- Verify Traffic Split: Check the current traffic distribution between the stable and canary versions.
- Monitor Canary Metrics: Analyze the latency and error rates specifically for the canary pods.
- Compare with Stable: Compare canary performance against the baseline of the stable version.
- Inspect Flagger Events: Review the Flagger logs or Kubernetes events to see why an analysis is progressing or stalled.
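The "Compare with Stable" step can be sketched as a small comparison function. This is a minimal illustration, not Flagger's own analysis logic: the `VersionMetrics` shape, the `compare_canary` helper, and the default thresholds (1 percentage point of extra error rate, 20% extra p99 latency, 100 requests minimum) are all hypothetical and should be replaced with values that fit your service.

```python
from dataclasses import dataclass


@dataclass
class VersionMetrics:
    """Aggregated metrics for one version over the analysis window (hypothetical shape)."""
    requests: int
    errors: int
    p99_latency_ms: float

    @property
    def error_rate(self) -> float:
        # Avoid division by zero when a version has received no traffic yet.
        return self.errors / self.requests if self.requests else 0.0


def compare_canary(stable, canary,
                   max_error_rate_delta=0.01,  # assumed tolerance, not a Flagger default
                   max_latency_ratio=1.2,      # assumed tolerance, not a Flagger default
                   min_requests=100):          # assumed minimum sample size
    """Return a list of findings; an empty list means no regression was detected."""
    if canary.requests < min_requests:
        # Too little traffic to judge: verify the traffic split before trusting the numbers.
        return [f"insufficient canary traffic: {canary.requests} < {min_requests} requests"]

    findings = []
    delta = canary.error_rate - stable.error_rate
    if delta > max_error_rate_delta:
        findings.append(
            f"error rate regression: canary {canary.error_rate:.3f} "
            f"vs stable {stable.error_rate:.3f}"
        )
    if stable.p99_latency_ms > 0:
        ratio = canary.p99_latency_ms / stable.p99_latency_ms
        if ratio > max_latency_ratio:
            findings.append(f"latency regression: canary p99 is {ratio:.2f}x stable")
    return findings


if __name__ == "__main__":
    stable = VersionMetrics(requests=10_000, errors=20, p99_latency_ms=180.0)
    canary = VersionMetrics(requests=1_000, errors=30, p99_latency_ms=190.0)
    for finding in compare_canary(stable, canary):
        print(finding)
```

Comparing against the stable baseline rather than a fixed number keeps the check meaningful when both versions degrade together, for example during a dependency outage.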
Containment
Containment is built directly into the progressive delivery engine.
Containment steps:
- Automatic Halt: Flagger automatically stops advancing the traffic shift when a metric check violates its analysis threshold.
- Immediate Revert: Once the number of failed checks reaches the configured limit, Flagger routes 100% of traffic back to the stable version.
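- Isolate Canary Pods: Keep the failed canary pods available (but without traffic) for debugging. Flagger scales the canary workload to zero when it rolls back, so collect logs and events promptly.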
- Fix Upstream: Resolve the bug in Git and push a new version to restart the canary cycle.
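The halt-and-revert behavior described in these steps can be modeled with a short simulation. This is a simplified sketch, not Flagger's implementation: the function `run_canary_analysis` is hypothetical, its parameters are named after the `stepWeight`, `maxWeight`, and `threshold` fields of a Flagger Canary analysis, and it assumes failed checks accumulate over the whole run and that promotion happens as soon as the maximum weight is reached.

```python
def run_canary_analysis(check_results, step_weight=10, max_weight=50, threshold=3):
    """Simulate a simplified canary loop.

    check_results: iterable of booleans, one per analysis interval
                   (True means the metric checks passed in that interval).
    Returns (outcome, history), where history holds the canary traffic
    weight in percent after each interval.
    """
    weight = 0
    failed_checks = 0
    history = []

    for passed in check_results:
        if passed:
            # Healthy interval: shift one more step of traffic to the canary.
            weight = min(weight + step_weight, max_weight)
            history.append(weight)
            if weight >= max_weight:
                return "promoted", history
        else:
            failed_checks += 1
            if failed_checks >= threshold:
                # Too many failed checks: send all traffic back to stable.
                history.append(0)
                return "rolled_back", history
            # Halt: hold the current weight and wait for the next interval.
            history.append(weight)

    return "in_progress", history


if __name__ == "__main__":
    outcome, history = run_canary_analysis([True, True, False, False, False])
    print(outcome, history)
```

In this model a single failed check only pauses the rollout. The revert happens once the failure count reaches `threshold`, which is why a lower threshold contains a bad release faster at the cost of more rollbacks caused by noisy metrics.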
The goal is “automated safety,” where the system protects the user from bad code.
Pause and Predict: What automated guardrail would have prevented this incident entirely?