Lab Scenarios
- Chaos Triage: Trigger a controlled chaos drill (from Chapter 12). Observe how the Guardian captures and normalizes the resulting alerts into a single incident record.
- Confidence Analysis: Analyze an incident for which the Guardian reports a low confidence score (< 0.7). Explain why the provided context was insufficient for a high-confidence hypothesis.
- Escalation Drill: Trigger the same failure three times within an hour. Verify that the Guardian correctly transitions the incident from Fresh to Recurring to Persistent.
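To make the Escalation Drill concrete, the sketch below shows one way the Fresh, Recurring, and Persistent states could be derived from repeated occurrences of the same failure. It is an illustration only: the one-hour window comes from the drill description, but the occurrence counts (1, 2, 3 or more) and the `classify` function are assumptions, not the Guardian's actual implementation.

```python
from datetime import datetime, timedelta

# Assumed window, taken from the drill ("three times within an hour").
WINDOW = timedelta(hours=1)

def classify(occurrences, now):
    """Return an escalation state from the timestamps of one failure.

    The thresholds here are hypothetical; the real Guardian may
    use different counts or a different window.
    """
    recent = [t for t in occurrences if now - t <= WINDOW]
    if len(recent) >= 3:
        return "Persistent"
    if len(recent) == 2:
        return "Recurring"
    return "Fresh"

# Simulate the drill: the same failure at 0, 20, and 40 minutes.
start = datetime(2024, 1, 1, 12, 0)
seen = []
states = []
for offset in (0, 20, 40):
    ts = start + timedelta(minutes=offset)
    seen.append(ts)
    states.append(classify(seen, ts))

print(states)  # ['Fresh', 'Recurring', 'Persistent']
```

When you run the real drill, compare the state recorded on the incident after each trigger against the sequence you expect, and note any difference in thresholds.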
Core Exercises (Required)
- View Incidents: Port-forward to the k8s-ai-monitor service and use curl to view the current list of active incidents.
- Audit Sanitization: Check the Guardian’s logs for “redacted” entries during a triage event. Identify which specific fields were removed.
- Acknowledge & Resolve: Use the Guardian’s API or CLI to acknowledge an incident, perform a manual fix, and then resolve the incident record.
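For the Audit Sanitization exercise, it can help to have a mental model of what a redaction step does before context reaches the LLM. The following is a minimal sketch, not the Guardian's real sanitizer: the key patterns, the `[REDACTED]` marker, the `redact` function, and the sample event are all hypothetical.

```python
import re

# Hypothetical key names treated as sensitive; the Guardian's real list may differ.
SENSITIVE_KEYS = re.compile(r"(password|token|secret|api[_-]?key)", re.IGNORECASE)

def redact(record):
    """Return a masked copy of record and the dotted names of masked fields."""
    clean, removed = {}, []
    for key, value in record.items():
        if SENSITIVE_KEYS.search(key):
            # Mask the value but keep the key so the audit trail shows what was removed.
            clean[key] = "[REDACTED]"
            removed.append(key)
        elif isinstance(value, dict):
            nested, nested_removed = redact(value)
            clean[key] = nested
            removed.extend(f"{key}.{name}" for name in nested_removed)
        else:
            clean[key] = value
    return clean, removed

# Sample event with made-up values.
event = {"pod": "checkout-7d9f", "env": {"DB_PASSWORD": "hunter2", "LOG_LEVEL": "info"}}
clean, removed = redact(event)

print(clean)
print(removed)  # ['env.DB_PASSWORD']
```

The list of removed field names is the kind of information the exercise asks you to identify in the Guardian's logs; the original record is left unmodified.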
Challenge Exercise (Optional)
Custom Alert Routing Rule: Configure a new alert routing rule in k8s-ai-monitor for a custom metric threshold. Verify that the Guardian correctly normalizes, enriches, and routes the alert through the expected channel.
Done When
You have completed this chapter when:
- The Guardian has captured and analyzed at least one controlled chaos scenario.
- You can explain why the Guardian has a read-only RBAC boundary.
- You have successfully acknowledged and resolved an incident via the API or CLI.
- You can demonstrate an escalation scenario (Recurring or Persistent).
- You understand the importance of redacting secrets before LLM analysis.
Knowledge Check
Before finishing this chapter, complete the Quiz to verify your understanding of the guardrail principles.