Lab & Completion | SafeOps Academy

3 Signals, 1 Incident Exercise

For one controlled incident (e.g., triggering a /panic or /status/500 route), capture all three artifacts:

Metrics Symptom: A latency or error-rate spike in Grafana.
Trace Path: The failing route and its span chain in Uptrace.
Log Evidence: The matching backend log entry, carrying the same trace_id as the trace.

Success Condition: All three artifacts must point to the same causal path.

Core Exercises (Required)

Find a Trace: Trigger an action in the frontend. Find its end-to-end trace in Uptrace.
Correlate with Logs: Copy the trace_id from the Uptrace span. Use kubectl logs to find the matching backend log entry.
Verify Alert: Trigger a high error rate and observe the alert rule in Prometheus. Confirm that the alert is captured by k8s-ai-monitor.
SLO Check: Identify the backend availability SLI/SLO in your Prometheus rules and explain the burn-rate alert.
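
For the SLO Check, it can help to work through the burn-rate arithmetic by hand before reading the rule. The sketch below is illustrative only: the function names, the 99% target in the example, and the 14.4 threshold (a commonly cited fast-burn value that corresponds to spending 2% of a 30-day error budget in one hour) are assumptions, not values taken from this lab's Prometheus rules. Substitute whatever your rules actually define.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """Return how many times faster than 'allowed' the error budget is burning.

    error_ratio: observed fraction of failed requests in a window (0.0 to 1.0).
    slo_target:  availability objective, e.g. 0.99 (assumed value, not from the lab).
    """
    error_budget = 1.0 - slo_target
    if error_budget <= 0:
        raise ValueError("slo_target must be below 1.0")
    return error_ratio / error_budget


def should_alert(short_window_ratio: float,
                 long_window_ratio: float,
                 slo_target: float,
                 threshold: float = 14.4) -> bool:
    """Multi-window check: fire only if both windows exceed the threshold.

    The long window shows the burn is sustained; the short window shows it is
    still happening now. The default threshold is an assumption.
    """
    return (burn_rate(short_window_ratio, slo_target) > threshold
            and burn_rate(long_window_ratio, slo_target) > threshold)


if __name__ == "__main__":
    # Hypothetical numbers: 20% errors in both windows against a 99% SLO.
    print(burn_rate(0.20, 0.99))
    print(should_alert(0.20, 0.20, 0.99))
```

A burn rate of 1 means the budget would be used up exactly at the end of the SLO period; values well above 1 mean the budget runs out early, which is what a burn-rate alert is meant to catch.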

Challenge Exercise (Optional)

End-to-End Signal Correlation: Trigger a controlled backend error, then trace it end-to-end through all three signals: find the metric spike in Grafana, locate the trace in Uptrace, and correlate the log entry using only the trace_id.
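
The log-correlation step can be sketched as a small filter that reads log lines and keeps only those matching a given trace_id. This is a minimal sketch under assumptions: it guesses that the backend may emit JSON logs with a `trace_id` field and falls back to a plain substring match otherwise. The function name `matching_lines` is hypothetical; check your backend's actual log format before relying on it.

```python
import json
import sys
from typing import Iterable, Iterator


def matching_lines(lines: Iterable[str], trace_id: str) -> Iterator[str]:
    """Yield log lines that refer to the given trace_id.

    JSON object lines are matched on an assumed "trace_id" field;
    any other line is matched by case-insensitive substring search.
    """
    wanted = trace_id.strip().lower()
    for raw in lines:
        line = raw.rstrip("\n")
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            record = None
        if isinstance(record, dict):
            if str(record.get("trace_id", "")).lower() == wanted:
                yield line
        elif wanted in line.lower():
            yield line


# Runs as a filter only when exactly one argument (the trace_id) is given.
if __name__ == "__main__" and len(sys.argv) == 2:
    for hit in matching_lines(sys.stdin, sys.argv[1]):
        print(hit)
```

One way to use it, assuming the script is saved as filter_trace.py (a hypothetical name), is to pipe the output of kubectl logs for your backend pod into it and pass the trace_id copied from Uptrace as the single argument.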

Done When

You have completed this chapter when:

You can find an end-to-end trace from frontend to backend.
You can match a backend log entry by trace_id.
You can explain why the current alert path goes through k8s-ai-monitor.
You have successfully run the incident workflow: metrics -> traces -> logs.
You understand why metrics alone are not enough for root cause analysis.

Knowledge Check

Before finishing this chapter, complete the Quiz to verify your understanding of the metrics, traces, and logs workflow covered here.
