Investigation & Containment

Core Track: a guardrails-first chapter in the core learning path.

Estimated Time

A reproducible lab result plus quiz verification and incident-safe operating evidence.

Treat the SRE Guardian itself as a guarded incident pipeline.

Safe investigation sequence:

Inspect Raw Signals: Review the raw Kubernetes events and metrics entering the Guardian.
Verify Sanitization: Confirm that secrets and tokens are redacted and that the payload fits within the context budget before it reaches the LLM for analysis.
Confirm Deduplication: Confirm that the Guardian collapsed related alerts into a single incident record instead of opening one record per alert.
Review Proposed Actions: Check whether the AI-suggested actions are useful and stay within the “no-mutation” boundary.
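The sanitization and deduplication checks above can be sketched in Python. This is a minimal illustration under stated assumptions, not the Guardian's actual implementation: the regex patterns, the hypothetical `sanitize`, `fingerprint`, and `deduplicate` helpers, the character count standing in for a token budget, and the namespace/workload/reason fingerprint are all illustrative choices.

```python
import hashlib
import re
from collections import defaultdict

# Hypothetical redaction rules; a real deployment needs a broader set
# (cloud credentials, JWTs, kubeconfig tokens) that is audited regularly.
SECRET_PATTERNS = [
    re.compile(r"(?i)(authorization:\s*bearer\s+)\S+"),
    re.compile(r"(?i)((?:password|token|api[_-]?key)\s*[=:]\s*)\S+"),
]
TRUNCATION_MARKER = "...[truncated]"


def sanitize(text: str, max_chars: int = 2000) -> str:
    """Redact secret-looking values, then enforce a character budget (assumed proxy for tokens)."""
    for pattern in SECRET_PATTERNS:
        # Keep the key (group 1) so the LLM still sees which field was present.
        text = pattern.sub(r"\1[REDACTED]", text)
    # Redact first, then truncate, so the budget applies to what the LLM receives.
    if len(text) > max_chars:
        text = text[:max_chars] + TRUNCATION_MARKER
    return text


def fingerprint(alert: dict) -> str:
    """Stable key for alerts that describe the same underlying problem (hypothetical fields)."""
    key = "|".join([alert["namespace"], alert["workload"], alert["reason"]])
    return hashlib.sha256(key.encode()).hexdigest()[:12]


def deduplicate(alerts: list[dict]) -> dict[str, list[dict]]:
    """Group related alerts into one incident record per fingerprint."""
    incidents: dict[str, list[dict]] = defaultdict(list)
    for alert in alerts:
        incidents[fingerprint(alert)].append(alert)
    return dict(incidents)


alerts = [
    {"namespace": "shop", "workload": "cart", "reason": "CrashLoopBackOff",
     "message": "token=abc123 restart 1"},
    {"namespace": "shop", "workload": "cart", "reason": "CrashLoopBackOff",
     "message": "restart 2"},
    {"namespace": "shop", "workload": "checkout", "reason": "OOMKilled",
     "message": "password=hunter2"},
]
incidents = deduplicate(alerts)
print(len(incidents))                  # 2
print(sanitize(alerts[0]["message"]))  # token=[REDACTED] restart 1
```

During investigation, comparing `len(alerts)` with `len(deduplicate(alerts))` is a quick check on whether related alerts really collapsed, and running `sanitize` on captured payloads shows exactly what the model would have seen.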

Containment keeps the Guardian helpful but safely bounded.

Containment steps:

Preserve Human Approval: Do not allow any remediation step to execute without explicit human sign-off.
Reduce Noise: Tune deduplication and escalation rules to prevent alert fatigue.
Block Unsafe Context: Regularly audit the sanitization logic to prevent secret leakage.
Treat Low Confidence as Review: Route incidents with low AI confidence to high-priority human review instead of treating them as automation failures.
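The containment steps can be expressed as a small routing gate. The sketch below assumes a hypothetical `Proposal` record, an allowlist of read-only `kubectl` subcommands standing in for the “no-mutation” boundary, and an assumed 0.6 confidence threshold; none of these names or values come from the Guardian itself.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

# Hypothetical "no-mutation" boundary: read-only kubectl subcommands only.
READ_ONLY_VERBS = {"get", "describe", "logs", "top", "events"}
LOW_CONFIDENCE = 0.6  # assumed threshold; tune it from review outcomes


class Route(Enum):
    OUT_OF_BOUNDS = "out_of_bounds"          # never run by the Guardian
    PRIORITY_REVIEW = "priority_review"      # low confidence: a human looks first
    AWAITING_APPROVAL = "awaiting_approval"  # in bounds, still needs sign-off


@dataclass
class Proposal:
    incident_id: str
    actions: list[str]   # e.g. ["kubectl describe pod cart-7f9"]
    confidence: float    # model-reported score in [0, 1]
    approved_by: Optional[str] = None


def verb(action: str) -> str:
    """Return the kubectl subcommand, or "" if the action is not a plain kubectl call."""
    parts = action.split()
    return parts[1] if len(parts) > 1 and parts[0] == "kubectl" else ""


def route(p: Proposal) -> Route:
    # Fail closed: anything not recognized as read-only is out of bounds.
    if any(verb(a) not in READ_ONLY_VERBS for a in p.actions):
        return Route.OUT_OF_BOUNDS
    # Low confidence is a review signal, not an automation failure.
    if p.confidence < LOW_CONFIDENCE:
        return Route.PRIORITY_REVIEW
    return Route.AWAITING_APPROVAL


def may_execute(p: Proposal) -> bool:
    """Only in-bounds proposals with explicit human sign-off may run."""
    return route(p) is not Route.OUT_OF_BOUNDS and p.approved_by is not None


p = Proposal("inc-1", ["kubectl describe pod cart-7f9"], confidence=0.9)
print(route(p), may_execute(p))  # Route.AWAITING_APPROVAL False
p.approved_by = "oncall"
print(may_execute(p))            # True
```

The gate fails closed. Any action it cannot parse as a read-only `kubectl` call is out of bounds, even with a flag such as `-n` placed before the subcommand. Even an in-bounds, high-confidence proposal waits for `approved_by` before `may_execute` returns true.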

Pause and Predict: What automated guardrail would have prevented this incident entirely?