3 Signals, 1 Incident Exercise
For one controlled incident (e.g., triggering a /panic or /status/500 route), capture all three artifacts:
- Metrics Symptom: Latency or error-rate spike in Grafana.
- Trace Path: The failing route and its span chain in Uptrace.
- Log Evidence: Matching backend log with the correct trace_id.
Success Condition: All three artifacts must point to the same causal path.
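One way to make the success condition concrete is to write it down as a check over the three artifacts. The sketch below is illustrative only: the record types, field names, and the rule "same route, inside the spike window, same trace_id" are assumptions made for this example and are not taken from Grafana, Uptrace, or the lab's backend.

```python
from dataclasses import dataclass


@dataclass
class MetricSpike:
    """Hypothetical summary of the spike seen in Grafana."""
    route: str
    start: float  # Unix seconds, start of the spike window
    end: float    # Unix seconds, end of the spike window


@dataclass
class TraceSpan:
    """Hypothetical summary of the failing span seen in Uptrace."""
    trace_id: str
    route: str
    timestamp: float
    is_error: bool


@dataclass
class LogEntry:
    """Hypothetical summary of the backend log line."""
    trace_id: str
    timestamp: float


def same_causal_path(metric: MetricSpike, span: TraceSpan, log: LogEntry) -> bool:
    """Return True when all three artifacts describe the same failing request path."""
    route_matches = span.route == metric.route
    in_spike_window = metric.start <= span.timestamp <= metric.end
    ids_match = log.trace_id == span.trace_id
    return span.is_error and route_matches and in_spike_window and ids_match


# Sample values invented for illustration.
spike = MetricSpike(route="/status/500", start=1000.0, end=1060.0)
span = TraceSpan(trace_id="abc123", route="/status/500", timestamp=1012.5, is_error=True)
good_log = LogEntry(trace_id="abc123", timestamp=1012.6)
other_log = LogEntry(trace_id="zzz999", timestamp=1012.6)

print(same_causal_path(spike, span, good_log))
print(same_causal_path(spike, span, other_log))
```

If any of the three comparisons fails, the artifacts most likely belong to different requests, and the incident should be triggered and captured again.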
Core Exercises (Required)
- Find a Trace: Trigger an action in the frontend. Find its end-to-end trace in Uptrace.
- Correlate with Logs: Copy the trace_id from the Uptrace span. Use kubectl logs to find the matching backend log entry.
- Verify Alert: Trigger a high error rate and observe the alert rule in Prometheus. Verify if the alert is captured by k8s-ai-monitor.
- SLO Check: Identify the backend availability SLI/SLO in your Prometheus rules and explain the burn-rate alert.
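For the SLO Check, it can help to work through the burn-rate arithmetic by hand before reading the rule. The sketch below uses the common definition of burn rate (observed error ratio divided by the error budget, where the budget is 1 minus the SLO target) and a two-window check. The 99.9% target and the 14.4 threshold are illustrative values chosen for this example, not values read from this lab's Prometheus rules, so substitute whatever your rules actually define.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """Return how many times faster than 'exactly on budget' errors are being spent.

    error_ratio: failed requests / total requests over some window (0.0 to 1.0)
    slo_target:  availability objective, e.g. 0.999 for 99.9%
    """
    error_budget = 1.0 - slo_target
    if error_budget <= 0:
        raise ValueError("slo_target must be below 1.0 to leave an error budget")
    return error_ratio / error_budget


def should_alert(long_window_ratio: float, short_window_ratio: float,
                 slo_target: float, threshold: float) -> bool:
    """Fire only when both the long and the short window exceed the threshold.

    The short window lets the alert stop firing soon after the errors stop.
    """
    return (burn_rate(long_window_ratio, slo_target) >= threshold
            and burn_rate(short_window_ratio, slo_target) >= threshold)


# Illustrative numbers only (assumed, not taken from the lab's rules).
SLO_TARGET = 0.999
THRESHOLD = 14.4

# 2% errors over the long window and 3% over the short window.
print(should_alert(0.02, 0.03, SLO_TARGET, THRESHOLD))
# Errors have already stopped in the short window, so no alert.
print(should_alert(0.02, 0.0, SLO_TARGET, THRESHOLD))
```

A burn rate of 1 means the error budget would be used up exactly at the end of the SLO period; higher values mean it runs out proportionally sooner.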
Challenge Exercise (Optional)
End-to-End Signal Correlation: Trigger a controlled backend error, then trace it end-to-end through all three signals: find the metric spike in Grafana, locate the trace in Uptrace, and correlate the log entry using only the trace_id.
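The last step, matching a log entry using only the trace_id, can be sketched as a small filter over log lines. This example assumes the backend writes one JSON object per line with a field named trace_id; both the log format and the field name are assumptions, so adjust them to match what your backend actually emits. The sample lines are invented for illustration.

```python
import json


def find_logs_by_trace_id(lines, trace_id):
    """Return the parsed JSON log records whose trace_id field equals trace_id."""
    matches = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # Skip non-JSON lines such as startup banners or stack traces.
            continue
        if isinstance(record, dict) and record.get("trace_id") == trace_id:
            matches.append(record)
    return matches


# Invented sample lines standing in for backend log output.
sample_logs = [
    "server starting on :8080",
    '{"level": "info", "msg": "request ok", "trace_id": "aaaa1111"}',
    '{"level": "error", "msg": "handler failed", "trace_id": "bbbb2222"}',
    '{"level": "info", "msg": "request ok", "trace_id": "cccc3333"}',
]

for record in find_logs_by_trace_id(sample_logs, "bbbb2222"):
    print(record["level"], record["msg"])
```

In the lab itself, the lines would come from kubectl logs for the backend workload instead of the sample list, and the trace_id would be the one copied from the Uptrace span.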
Done When
You have completed this chapter when:
- You can find an end-to-end trace from frontend to backend.
- You can match a backend log entry by trace_id.
- You can explain why the current alert path goes through k8s-ai-monitor.
- You have successfully run the incident workflow: metrics -> traces -> logs.
- You understand why metrics alone are not enough for root cause analysis.
Knowledge Check
Before finishing this chapter, complete the Quiz to verify your understanding of the guardrail principles.