Workflow & Baseline | SafeOps Academy

Guardrails That Stop It

Mandatory Definitions: Every container must have explicit CPU, memory, and ephemeral-storage requests and limits.
Quota Enforcement: All namespaces must use LimitRange and ResourceQuota to prevent resource starvation.
Evidence-First Scaling: OOM and throttling analysis must happen before any scaling decisions.
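
The Quota Enforcement guardrail could look roughly like the pair of objects below for a single namespace. This is a minimal sketch, not the course's actual policy: the namespace `develop`, both object names, and every number are assumptions chosen for illustration.

```yaml
# Illustrative only: names and values are assumptions, not the course baseline.
apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults        # hypothetical name
  namespace: develop              # assumed namespace
spec:
  limits:
  - type: Container
    defaultRequest:               # applied when a container omits requests
      cpu: 10m
      memory: 32Mi
      ephemeral-storage: 64Mi
    default:                      # applied when a container omits limits
      cpu: 100m
      memory: 128Mi
      ephemeral-storage: 128Mi
    max:                          # upper bound for any single container's limits
      cpu: "1"
      memory: 1Gi
      ephemeral-storage: 1Gi
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: namespace-budget          # hypothetical name
  namespace: develop              # assumed namespace
spec:
  hard:                           # totals across all pods in the namespace
    requests.cpu: "2"
    requests.memory: 4Gi
    requests.ephemeral-storage: 4Gi
    limits.cpu: "4"
    limits.memory: 8Gi
    limits.ephemeral-storage: 8Gi
    pods: "20"
```

The two objects work together: the ResourceQuota caps the namespace total, while the LimitRange supplies defaults so that a container missing a value is not rejected outright once the quota is active.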

Expected Baseline

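Develop, Staging, Production: Each environment has its own resource bounds and quotas.
Guaranteed QoS: Used for critical, stateful, or highly sensitive services. Every container sets CPU and memory requests equal to its limits.
Burstable QoS: Used for standard, non-critical web and API workloads. Requests are set below limits, as in the backend manifest below.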
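
For contrast with a Burstable workload, a Guaranteed container might be sized as in the fragment below. The numbers are assumptions for illustration only. Kubernetes derives the QoS class from CPU and memory, and the equality has to hold for every container in the pod.

```yaml
# Fragment of a container spec; values are illustrative assumptions.
resources:
  requests:
    cpu: 250m
    memory: 256Mi
    ephemeral-storage: 128Mi
  limits:
    cpu: 250m                 # equal to the CPU request
    memory: 256Mi             # equal to the memory request
    ephemeral-storage: 128Mi
```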

Backend resource block

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend
  labels:
    app: backend
    app.kubernetes.io/name: backend
    app.kubernetes.io/component: api
spec:
  replicas: 1
  revisionHistoryLimit: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
  selector:
    matchLabels:
      app: backend
  template:
    metadata:
      labels:
        app: backend
        app.kubernetes.io/name: backend
        app.kubernetes.io/component: api
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
        prometheus.io/path: "/metrics"
    spec:
      imagePullSecrets:
      - name: ghcr-credentials-docker
      securityContext:
        runAsNonRoot: true
        runAsUser: 10001
        runAsGroup: 10001
        fsGroup: 10001
        seccompProfile:
          type: RuntimeDefault
      containers:
      - name: backend
        image: backend:latest
        imagePullPolicy: IfNotPresent
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 10001
          runAsGroup: 10001
          capabilities:
            drop:
              - ALL
        ports:
        - containerPort: 8080
          name: http
          protocol: TCP
        env:
        - name: PORT
          value: "8080"
        - name: NAMESPACE
          value: "${NAMESPACE}"
        - name: ENVIRONMENT
          value: "${ENVIRONMENT}"
        - name: LOG_LEVEL
          value: "${LOG_LEVEL}"
        - name: SERVICE_NAME
          value: "backend"
        - name: SERVICE_VERSION
          value: "v1.0.0"
        - name: DEPLOYMENT_ENVIRONMENT
          value: "${ENVIRONMENT}"
        - name: OTEL_RESOURCE_ATTRIBUTES
          value: "k8s.cluster.name=${cluster_name}"
        - name: UPTRACE_DSN
          valueFrom:
            secretKeyRef:
              name: backend-secrets
              key: uptrace-dsn
        - name: OTEL_EXPORTER_OTLP_HEADERS
          valueFrom:
            secretKeyRef:
              name: backend-secrets
              key: uptrace-headers
        - name: JWT_SECRET
          valueFrom:
            secretKeyRef:
              name: backend-secrets
              key: jwt-secret
        - name: POSTGRES_USER
          valueFrom:
            secretKeyRef:
              name: app-postgres-app
              key: username
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: app-postgres-app
              key: password
        - name: POSTGRES_HOST
          value: app-postgres-rw
        - name: POSTGRES_DB
          value: app
        livenessProbe:
          httpGet:
            path: /livez
            port: http
          initialDelaySeconds: 15
          periodSeconds: 20
          timeoutSeconds: 5
          failureThreshold: 3
        readinessProbe:
          httpGet:
            path: /readyz
            port: http
          initialDelaySeconds: 5
          periodSeconds: 10
          timeoutSeconds: 3
          failureThreshold: 3
        startupProbe:
          httpGet:
            path: /healthz
            port: http
          initialDelaySeconds: 0
          periodSeconds: 5
          timeoutSeconds: 3
          failureThreshold: 30
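        resources:
          # Requests are lower than limits, so this pod runs in the Burstable QoS class.
          requests:
            cpu: 10m
            memory: 32Mi
            ephemeral-storage: 64Mi
          # Exceeding the memory limit gets the container OOM-killed;
          # exceeding the CPU limit causes throttling, not a kill.
          limits:
            cpu: 100m
            memory: 128Mi
            ephemeral-storage: 128Mi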
        volumeMounts:
        - name: tmp
          mountPath: /tmp
        - name: cache
          mountPath: /home/app/.cache
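      volumes:
      # Writable scratch space, needed because readOnlyRootFilesystem is true.
      # No sizeLimit is set here; data written to it counts toward the pod's ephemeral-storage usage.
      - name: tmp
        emptyDir: {}
      - name: cache
        emptyDir:
          sizeLimit: 10Mi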

Safe Workflow (Step-by-Step)

Verify Definitions: Confirm every container in your manifest has explicit CPU, memory, and ephemeral-storage requests and limits.
Validate Quotas: Check the target namespace's ResourceQuota to confirm the new requests and limits fit within the remaining budget.
Rollout & Monitor: Watch for pod events (OOMKilled, Evicted) during the rollout.
Load Test: Simulate peak traffic in develop and observe how the workload behaves under its QoS class.
Promote: Promote sizing adjustments one environment at a time: develop, then staging, then production.
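
During the Rollout & Monitor step, the two failure modes named above typically surface in the pod's status. The fragments below are a hedged illustration of the fields to look for when a pod is printed as YAML. The container name comes from the backend manifest, the restart count is made up, and real output contains many more fields.

```yaml
# Container killed for exceeding its memory limit (illustrative fragment).
status:
  containerStatuses:
  - name: backend
    restartCount: 1            # made-up value
    lastState:
      terminated:
        reason: OOMKilled
        exitCode: 137          # 128 + 9 (SIGKILL)
---
# Pod evicted by the kubelet, for example after exceeding an ephemeral-storage limit.
status:
  phase: Failed
  reason: Evicted
```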

Develop quota and limit baseline

flux/infrastructure/resource-management/develop/kustomization.yaml
flux/infrastructure/resource-management/develop/limitrange.yaml
flux/infrastructure/resource-management/develop/resourcequota.yaml
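
The contents of these files are not shown here. As a rough sketch, a `kustomization.yaml` that ties the other two files together might look like the following. This is an assumption about the layout, not a copy of the real file.

```yaml
# Hypothetical contents of develop/kustomization.yaml; the real file may differ.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- limitrange.yaml
- resourcequota.yaml
```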

This builds on: Pod hardening (Chapter 07). Resource limits complement security constraints.
This enables: Availability engineering (Chapter 09). Scaling requires resource evidence.
