Core Track: guardrails-first chapter in the core learning path.

Estimated Time

  • Reading: 20-25 min
  • Lab: 45-60 min
  • Quiz: 10-15 min

What You Will Produce

A reproducible lab result, quiz verification, and incident-safe operating evidence: a correlated trace, a matching log snippet, an alert snapshot, and a short conclusion.

Lab: Baseline Observability with Uptrace

Goal

Validate that telemetry is operational and correlated:

  • frontend creates spans for user actions
  • backend receives trace context and emits correlated logs
  • Uptrace shows trace chain and related service signals
  • Prometheus alert path is connected to the same incident workflow

Prerequisites

  • frontend and backend are deployed in one environment (recommended: develop)
  • Uptrace DSN is configured in secrets and injected into workloads
  • Flux reconciliation is healthy

Quick checks:

kubectl -n flux-system get kustomizations
kubectl -n develop get deploy frontend backend
kubectl -n develop get secret backend-secrets
kubectl -n observability get prometheusrule backend-alerts backend-slo-rules

Step 1: Verify Runtime Telemetry Config

kubectl -n develop get deploy frontend -o jsonpath='{.spec.template.spec.containers[0].env[?(@.name=="VITE_UPTRACE_DSN")].name}{"\n"}'
kubectl -n develop get deploy backend -o jsonpath='{.spec.template.spec.containers[0].env[?(@.name=="UPTRACE_DSN")].name}{"\n"}'

Expected:

  • frontend has VITE_UPTRACE_DSN
  • backend has UPTRACE_DSN
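To also confirm the DSN is injected from a secret rather than hardcoded, a hedged follow-up check (it assumes the env var uses a secretKeyRef, as the backend-secrets check above suggests):

kubectl -n develop get deploy backend -o jsonpath='{.spec.template.spec.containers[0].env[?(@.name=="UPTRACE_DSN")].valueFrom.secretKeyRef.name}{"\n"}'

Expected: a secret name (for example backend-secrets), not empty output.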

Step 2: Generate Trace via Frontend

  1. Open frontend UI.
  2. Go to Chaos page.
  3. Trigger one action:
  • delay or status action for non-destructive check
  • panic action for crash/correlation drill

Expected:

  • frontend creates manual span (for example ui.chaos.trigger_panic)
  • backend receives request with propagated trace context
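If the UI is unavailable, the propagation path can be exercised from the CLI (a sketch; the container port and the /status/{code} route are assumptions based on the Chaos actions above, and the traceparent value is the W3C example ID):

kubectl -n develop port-forward deploy/backend 8080:8080 &
curl -s -o /dev/null -w '%{http_code}\n' -H 'traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01' http://localhost:8080/status/200

This does not produce a frontend span, but the backend span should join the supplied trace instead of starting a new one.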

Step 3: Verify in Uptrace

In Uptrace, find the trace from the recent action and confirm:

  • frontend span exists
  • backend HTTP span is a child in the same trace
  • status/error details are visible on backend span

Step 4: Verify Correlated Backend Logs

Get recent backend logs:

kubectl -n develop logs deploy/backend --tail=200

Expected:

  • request/error logs contain trace_id
  • for panic flow, log contains panic termination message with same trace_id
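To pull the trace_id values out of the log stream for this comparison (a sketch assuming the JSON log shape verified in Step 7):

kubectl -n develop logs deploy/backend --tail=200 | grep -o '"trace_id":"[^"]*"' | sort | uniq -c

Each distinct trace_id should resolve to exactly one trace in Uptrace, and the panic flow should reuse the same ID across the request log and the termination log.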

Step 5: Capture Evidence

For lab completion, attach:

  • one Uptrace trace screenshot/id
  • one backend log snippet with matching trace_id
  • one alert snapshot (BackendHighLatency or one SLO burn-rate alert state)
  • one short conclusion (root cause + next action)

Step 6: Verify Alert Path

  1. Trigger /status/500 repeatedly from Chaos page for 5-10 minutes.
  2. In Prometheus Alerts UI, verify one error-rate alert enters pending or firing (a CLI alternative follows below).
  3. Pivot to Uptrace trace + backend log evidence before deciding action.
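
The alert state can also be read from the Prometheus HTTP API (a sketch; the Service name and port are assumptions, adjust them to your observability stack):

kubectl -n observability port-forward svc/prometheus 9090:9090 &
curl -s http://localhost:9090/api/v1/alerts | grep -o '"state":"[a-z]*"'

The /api/v1/alerts endpoint lists every active alert with its state (pending or firing).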

Hard Stop Conditions

  • telemetry secrets missing or stored in plaintext in Git (spot check below)
  • no trace context propagation (orphan backend spans only)
  • on-call action chosen without evidence from at least two signals
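
A quick repository spot check for the first condition (a hedged pattern; a real Uptrace DSN embeds a token before an uptrace host, so any match outside encrypted secret material deserves review):

git grep -nE 'https://[^@]+@[^ ]*uptrace' -- '*.yaml'

No output is the desired result.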

Failure Scenarios

  1. No traces in Uptrace
  • verify DSN wiring in frontend/backend env
  • verify the app can reach the Uptrace endpoint
  2. Backend spans exist but are not linked to frontend spans
  • verify propagation headers are allowed by CORS (traceparent, tracestate, baggage); a preflight check follows below
  • verify frontend instrumentation is enabled
  3. Logs exist but no trace_id
  • verify request logging path and panic handler logging
  • verify request executed via instrumented routes
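
The preflight check for scenario 2 (a sketch; the origin, port, and route are assumptions, with 5173 as a typical Vite dev origin):

kubectl -n develop port-forward deploy/backend 8080:8080 &
curl -s -i -X OPTIONS http://localhost:8080/status/200 -H 'Origin: http://localhost:5173' -H 'Access-Control-Request-Method: GET' -H 'Access-Control-Request-Headers: traceparent,tracestate,baggage'

The Access-Control-Allow-Headers response header must include traceparent (plus tracestate and baggage); otherwise trace context never reaches the backend, which matches the orphan-span symptom.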

Step 7: Verify Structured Logging

Check backend pod logs for JSON-formatted structured output:

kubectl -n develop logs deploy/backend --tail=20 | head -5

Expected:

  • log lines are JSON objects (not plaintext)
  • each line contains time, level, msg fields
  • request-related logs contain trace_id field

Identify the trace_id field in a log line and verify it matches a trace in Uptrace:

kubectl -n develop logs deploy/backend --tail=100 | grep trace_id | head -3
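
To isolate a single ID for the cross-check (assumes jq is installed and trace_id is a top-level field, as in the expected log shape above):

kubectl -n develop logs deploy/backend --tail=100 | grep trace_id | tail -1 | jq -r .trace_id

Search for the printed ID in Uptrace; it should resolve to a trace from Step 3.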

If logs are plaintext instead of JSON:

  • check backend logging configuration
  • structured logging may need to be enabled via environment variable or config (a hypothetical toggle is sketched below)
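
If such a toggle exists, it can be probed without a new image (hypothetical example; the LOG_FORMAT variable name is an assumption, check the backend's configuration reference):

kubectl -n develop set env deploy/backend LOG_FORMAT=json

Note that under the Flux-managed setup this imperative change is reverted on the next reconciliation, so the permanent fix belongs in Git.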

Step 8: Verify OTEL Collector Log Pipeline

Check that the OTEL Collector DaemonSet is running:

kubectl -n observability get daemonset otel-collector

Expected: one pod per node, all in Ready state.

Check Collector logs for export errors:

kubectl -n observability logs daemonset/otel-collector --tail=20

Expected: no persistent export errors. Occasional retries are normal.
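
To scan a larger window for persistent failures (a simple sketch; transient retry lines are expected, a steady stream pointing at the same exporter is not):

kubectl -n observability logs daemonset/otel-collector --tail=500 | grep -i error | tail -20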

Verify logs appear in Uptrace:

  1. Open Uptrace UI → Logs section
  2. Filter by k8s.namespace.name = develop
  3. Confirm log entries are arriving from backend and frontend pods
  4. Find a log entry with a trace_id and click through to its associated trace
  5. Verify the trace contains the expected frontend → backend span chain

If no logs appear:

  • verify the DaemonSet has access to /var/log/pods on each node (hostPath volume)
  • verify the UPTRACE_DSN secret is wired into the Collector pods (both checks are sketched below)
  • check Collector logs for authentication or connectivity errors
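
Hedged checks for the first two items (they assume a hostPath volume and a secretKeyRef-based env var, per the conventions above):

kubectl -n observability get daemonset otel-collector -o jsonpath='{.spec.template.spec.volumes[*].hostPath.path}{"\n"}'
kubectl -n observability get daemonset otel-collector -o jsonpath='{.spec.template.spec.containers[0].env[?(@.name=="UPTRACE_DSN")].valueFrom.secretKeyRef.name}{"\n"}'

Expected: the first line includes /var/log/pods; the second prints the secret name.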

Done When

  • learner can produce one correlated incident sample (trace + log by trace_id)
  • learner can explain the chosen action based on evidence
  • learner can identify whether issue is config, propagation, or runtime behavior
  • learner can identify at least one matching alert for the same symptom
  • learner can verify structured JSON logging with trace_id correlation