Guardrails That Stop It
- Metric-Driven Gates: Every traffic shift is gated by real-time latency and error rate checks.
- Maximum Weight Limit: Canaries start at low weights (e.g., 5%) to minimize initial impact.
- Automated Revert: Any violation of health metrics triggers an immediate, autonomous rollback.
- Iterative Analysis: The system checks health at every step of the traffic increase.
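The guardrails above can be sketched as a small gate loop. This is a minimal illustration, not Flagger's implementation: the `Health` type, the `probe` callback, and the default thresholds (99% success, 500 ms P99) are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Health:
    success_rate: float    # percent of requests that did not return 5xx
    p99_latency_ms: float  # 99th percentile latency in milliseconds


def is_healthy(h: Health, min_success_rate: float = 99.0,
               max_p99_ms: float = 500.0) -> bool:
    """Metric-driven gate: both checks must pass (thresholds are assumptions)."""
    return h.success_rate >= min_success_rate and h.p99_latency_ms <= max_p99_ms


def run_rollout(weights: List[int],
                probe: Callable[[int], Health]) -> Tuple[str, List[int]]:
    """Shift traffic step by step; revert to 0% on the first failed check."""
    history: List[int] = []
    for weight in weights:
        history.append(weight)          # apply the next canary weight
        if not is_healthy(probe(weight)):
            history.append(0)           # automated revert: canary back to 0%
            return "rolled_back", history
    return "promoted", history


if __name__ == "__main__":
    # Hypothetical probe: the canary degrades once it receives 10% of traffic.
    def degrading(weight: int) -> Health:
        return Health(99.9, 120.0) if weight < 10 else Health(97.0, 120.0)

    print(run_rollout([5, 10, 25], degrading))
```

Starting the list of weights at a low value keeps the number of affected requests small if the very first check fails.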
Flagger Components
We use Flagger to automate the progressive delivery lifecycle:
- Canary Object: The manifest that defines the traffic shifting rules and health metrics.
- Analysis: Defines the PromQL queries used to validate the canary pods.
- Primary Service: The stable version of your application.
- Canary Service: The new, candidate version of your application.
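To show how these components fit together in one manifest, here is a sketch that builds a Canary object as a Python dictionary and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. The application name `my-app`, the `default` namespace, port 8080, and the interval and threshold values are assumptions for illustration. The field names follow the `flagger.app/v1beta1` Canary schema, but check them against the Flagger version you run.

```python
import json
from typing import Any, Dict


def build_canary(name: str, namespace: str, port: int,
                 step_weight: int = 5, max_weight: int = 50) -> Dict[str, Any]:
    """Return a Canary manifest as a plain dict (all values are illustrative)."""
    return {
        "apiVersion": "flagger.app/v1beta1",
        "kind": "Canary",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # The Deployment whose rollout Flagger manages.
            "targetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": name},
            "service": {"port": port},
            "analysis": {
                "interval": "1m",           # how often the checks run
                "threshold": 5,             # failed checks tolerated before rollback
                "stepWeight": step_weight,  # traffic increase per step, in percent
                "maxWeight": max_weight,    # highest canary weight before promotion
                "metrics": [
                    {"name": "request-success-rate",
                     "thresholdRange": {"min": 99}, "interval": "1m"},
                    {"name": "request-duration",
                     "thresholdRange": {"max": 500}, "interval": "1m"},
                ],
            },
        },
    }


manifest = build_canary("my-app", "default", 8080)
print(json.dumps(manifest, indent=2))
```

Generating the manifest from a function is only a convenience for the example; in a Git-driven workflow the same structure would normally live in a YAML file.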
Metric-Driven Analysis
Flagger queries Prometheus to determine if the release should proceed:
- Request Success Rate: Percentage of requests that return a non-5xx response.
- Request Latency: P99 latency must stay below a defined threshold (e.g., 500ms).
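To make the two checks concrete, the following sketch computes them from raw samples. In practice Prometheus evaluates the equivalent PromQL queries; the function names, the nearest-rank percentile method, and the 99% and 500 ms thresholds here are assumptions for illustration.

```python
from typing import List


def success_rate(status_codes: List[int]) -> float:
    """Percentage of requests whose HTTP status is not 5xx."""
    if not status_codes:
        raise ValueError("no requests observed")
    ok = sum(1 for code in status_codes if code < 500)
    return 100.0 * ok / len(status_codes)


def p99(latencies_ms: List[float]) -> float:
    """99th percentile using the nearest-rank method."""
    if not latencies_ms:
        raise ValueError("no latencies observed")
    ordered = sorted(latencies_ms)
    rank = (99 * len(ordered) + 99) // 100  # ceil(0.99 * n) in integer arithmetic
    return ordered[rank - 1]


def passes_analysis(status_codes: List[int], latencies_ms: List[float],
                    min_success_rate: float = 99.0,
                    max_p99_ms: float = 500.0) -> bool:
    """The release proceeds only if both metrics are within their thresholds."""
    return (success_rate(status_codes) >= min_success_rate
            and p99(latencies_ms) <= max_p99_ms)


if __name__ == "__main__":
    codes = [200] * 99 + [503]
    latencies = [float(ms) for ms in range(1, 101)]
    print(success_rate(codes), p99(latencies), passes_analysis(codes, latencies))
```

Note that 4xx responses count as successes under this definition, since only 5xx responses are treated as failures.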
Safe Workflow (Step-by-Step)
- Deploy Canary Manifest: Define your thresholds and traffic steps in a Canary object.
- Update Image: Trigger the deployment by updating the image tag in Git.
- Wait for Initialization: Flagger creates the “Primary” deployment and the traffic-shifting infrastructure.
- Monitor Traffic: Observe traffic shifting from 0% to 5% to 10%…
- Finalize: Once 100% traffic is shifted safely, Flagger promotes the canary to the stable version.
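The traffic progression in the Monitor Traffic step can be written out as a plan of canary and primary weights. This is a small sketch: the function name `traffic_plan` and the example step and maximum values are hypothetical, and the real schedule depends on the values configured in your Canary object.

```python
from typing import List, Tuple


def traffic_plan(step_weight: int, max_weight: int) -> List[Tuple[int, int]]:
    """Return (canary %, primary %) pairs from 0% up to max_weight."""
    if step_weight <= 0 or not 0 < max_weight <= 100:
        raise ValueError("step_weight must be positive and max_weight in 1..100")
    # range() stops at the last multiple of step_weight that fits under max_weight.
    return [(w, 100 - w) for w in range(0, max_weight + 1, step_weight)]


if __name__ == "__main__":
    for canary, primary in traffic_plan(step_weight=5, max_weight=15):
        print(f"canary {canary:3d}%  primary {primary:3d}%")
```

Each pair sums to 100, so the stable version always serves whatever share of traffic the canary does not.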
This builds on: Core track complete. Progressive delivery adds traffic-level blast radius control.
This enables: Advanced practice. Canary analysis extends observability from Chapter 10.