Guardrails That Stop It
- Mandatory Definitions: Every container must have explicit CPU, memory, and ephemeral-storage requests and limits.
- Quota Enforcement: All namespaces must use LimitRange and ResourceQuota to prevent resource starvation.
- Evidence-First Scaling: OOM and throttling analysis must happen before any scaling decisions.
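As a rough illustration of the quota-enforcement guardrail, a LimitRange can supply default requests and limits to containers that omit them and cap how large any single container may be. The sketch below is an assumption, not this repository's actual manifest: the object name, the `develop` namespace name, and the `max` values are hypothetical, and the defaults simply mirror the backend container's values.

```yaml
# Hypothetical LimitRange sketch; name, namespace, and max values are assumptions.
apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults
  namespace: develop
spec:
  limits:
    - type: Container
      # Applied as the request when a container declares none.
      defaultRequest:
        cpu: 10m
        memory: 32Mi
        ephemeral-storage: 64Mi
      # Applied as the limit when a container declares none.
      default:
        cpu: 100m
        memory: 128Mi
        ephemeral-storage: 128Mi
      # Upper bound a single container may ask for (illustrative values).
      max:
        cpu: "1"
        memory: 512Mi
        ephemeral-storage: 1Gi
```

Defaults act as a safety net only. The guardrail above still expects every container to declare its own values explicitly.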
Expected Baseline
- Develop, Staging, Production: Each environment has unique resource bounds and quotas.
- Guaranteed QoS: Used for critical, stateful, or highly sensitive services.
- Burstable QoS: Used for standard, non-critical web and API workloads.
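Kubernetes assigns the Guaranteed QoS class only when every container in the pod sets CPU and memory requests equal to its limits. If requests are set lower than limits, the pod is Burstable. A minimal sketch of a Guaranteed-class resource block follows; the specific values are illustrative assumptions, not sizing recommendations.

```yaml
# Hypothetical container resources fragment for Guaranteed QoS.
# The values are assumptions; what matters is requests == limits for cpu and memory.
resources:
  requests:
    cpu: 250m
    memory: 256Mi
    ephemeral-storage: 128Mi
  limits:
    cpu: 250m
    memory: 256Mi
    ephemeral-storage: 128Mi
```

By comparison, the backend resource block in this chapter requests 10m CPU and 32Mi memory against limits of 100m and 128Mi, which places it in the Burstable class.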
Backend resource block
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend
  labels:
    app: backend
    app.kubernetes.io/name: backend
    app.kubernetes.io/component: api
spec:
  replicas: 1
  revisionHistoryLimit: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
  selector:
    matchLabels:
      app: backend
  template:
    metadata:
      labels:
        app: backend
        app.kubernetes.io/name: backend
        app.kubernetes.io/component: api
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
        prometheus.io/path: "/metrics"
    spec:
      imagePullSecrets:
        - name: ghcr-credentials-docker
      securityContext:
        runAsNonRoot: true
        runAsUser: 10001
        runAsGroup: 10001
        fsGroup: 10001
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: backend
          image: backend:latest
          imagePullPolicy: IfNotPresent
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            runAsNonRoot: true
            runAsUser: 10001
            runAsGroup: 10001
            capabilities:
              drop:
                - ALL
          ports:
            - containerPort: 8080
              name: http
              protocol: TCP
          env:
            - name: PORT
              value: "8080"
            - name: NAMESPACE
              value: "${NAMESPACE}"
            - name: ENVIRONMENT
              value: "${ENVIRONMENT}"
            - name: LOG_LEVEL
              value: "${LOG_LEVEL}"
            - name: SERVICE_NAME
              value: "backend"
            - name: SERVICE_VERSION
              value: "v1.0.0"
            - name: DEPLOYMENT_ENVIRONMENT
              value: "${ENVIRONMENT}"
            - name: OTEL_RESOURCE_ATTRIBUTES
              value: "k8s.cluster.name=${cluster_name}"
            - name: UPTRACE_DSN
              valueFrom:
                secretKeyRef:
                  name: backend-secrets
                  key: uptrace-dsn
            - name: OTEL_EXPORTER_OTLP_HEADERS
              valueFrom:
                secretKeyRef:
                  name: backend-secrets
                  key: uptrace-headers
            - name: JWT_SECRET
              valueFrom:
                secretKeyRef:
                  name: backend-secrets
                  key: jwt-secret
            - name: POSTGRES_USER
              valueFrom:
                secretKeyRef:
                  name: app-postgres-app
                  key: username
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: app-postgres-app
                  key: password
            - name: POSTGRES_HOST
              value: app-postgres-rw
            - name: POSTGRES_DB
              value: app
          livenessProbe:
            httpGet:
              path: /livez
              port: http
            initialDelaySeconds: 15
            periodSeconds: 20
            timeoutSeconds: 5
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /readyz
              port: http
            initialDelaySeconds: 5
            periodSeconds: 10
            timeoutSeconds: 3
            failureThreshold: 3
          startupProbe:
            httpGet:
              path: /healthz
              port: http
            initialDelaySeconds: 0
            periodSeconds: 5
            timeoutSeconds: 3
            failureThreshold: 30
          resources:
            requests:
              cpu: 10m
              memory: 32Mi
              ephemeral-storage: 64Mi
            limits:
              cpu: 100m
              memory: 128Mi
              ephemeral-storage: 128Mi
          volumeMounts:
            - name: tmp
              mountPath: /tmp
            - name: cache
              mountPath: /home/app/.cache
      volumes:
        - name: tmp
          emptyDir: {}
        - name: cache
          emptyDir:
            sizeLimit: 10Mi
Safe Workflow (Step-by-Step)
- Verify Definitions: Confirm every container in your manifest has explicit requests and limits.
- Validate Quotas: Check the target namespace ResourceQuota to ensure you have enough budget.
- Rollout & Monitor: Watch for pod events (OOMKilled, Evicted) during the rollout.
- Load Test: Simulate peak traffic in develop and observe the QoS behavior.
- Promote: Promote sizing adjustments incrementally across environments.
Develop quota and limit baseline
- flux/infrastructure/resource-management/develop/kustomization.yaml
- flux/infrastructure/resource-management/develop/limitrange.yaml
- flux/infrastructure/resource-management/develop/resourcequota.yaml
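The contents of these files are not reproduced here. To give a sense of what a namespace quota for the develop environment might contain, the following is a hypothetical sketch only: the object name, the `develop` namespace name, and every number are assumptions rather than the values in this repository.

```yaml
# Hypothetical ResourceQuota sketch; not the contents of resourcequota.yaml.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: develop
spec:
  hard:
    # Sum of requests across all pods in the namespace.
    requests.cpu: "1"
    requests.memory: 1Gi
    requests.ephemeral-storage: 2Gi
    # Sum of limits across all pods in the namespace.
    limits.cpu: "2"
    limits.memory: 2Gi
    limits.ephemeral-storage: 4Gi
    # Cap on the number of pods (illustrative).
    pods: "20"
```

Once a quota constrains requests or limits for a resource, pods that do not specify that resource are rejected at admission unless a LimitRange supplies defaults, which is why the two objects are typically deployed together.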
This builds on: Pod hardening (Chapter 07). Resource limits complement security constraints.
This enables: Availability engineering (Chapter 09). Scaling requires resource evidence.