Advanced Track: do this after finishing Chapters 01-14.

Estimated Time

  • Reading: 30-40 min
  • Lab: 60-90 min
  • Quiz: 15-20 min

Prerequisites

  • Core track (Chapters 01-14) completed.
  • GitOps promotion and observability workflows available.

Source Code References

  • release.yaml
  • require-requests-limits.example.yaml

What You Will Produce

A go/no-go evidence package: rollout results, remediation notes, and explicit rollback conditions.

Chapter 16: Admission Policy Guardrails (Advanced)

Incident Hook

A workload is deployed under incident pressure with missing resource limits, mutable image tags, and a weak security context. Workstation hooks were skipped, and review prioritized speed over scrutiny. The pod starts in a risky configuration and causes noisy-neighbor impact on other workloads. Recovery is slowed because the team lacks a clear discipline for handling denies and exceptions.

Observed Symptoms

What the team sees first:

  • risky workload settings reach the cluster boundary
  • upstream checks were skipped or insufficient
  • operators feel pressure to disable policy instead of fixing the manifest

The cluster is now the last clean place to stop the mistake.

Confusion Phase

Deny messages feel like friction when a release is already late. That is why broad exceptions appear attractive.

The real question is:

  • is the policy wrong?
  • or is it correctly blocking a workload that upstream controls failed to stop?

Why This Chapter Exists

Upstream checks (pre-commit hooks, CI, code review) reduce risk, but each can be bypassed. Admission control is the last enforcement point before a workload reaches runtime.

This chapter focuses on policy-as-code guardrails that block risky workloads even when upstream checks fail.

Learning Objectives

By the end of this chapter, learners can:

  • explain why cluster-side policy is mandatory in production systems
  • roll out Kyverno rules with Audit -> Enforce safely
  • troubleshoot deny events and remediate manifests correctly
  • run controlled break-glass exceptions with expiry and audit trail

What AI Would Propose (Brave Junior)

  • “Disable the policy engine temporarily.”
  • “Allow privileged mode now, fix later.”
  • “Create a broad exception for the whole namespace.”

Why this sounds reasonable:

  • immediate progress under pressure
  • lower friction in the moment

Why This Is Dangerous

  • Security and stability regressions reach runtime.
  • “Temporary” exceptions become long-term drift.
  • The platform trust model is weakened for all teams.

Investigation

Treat deny evidence as diagnostic input, not as an obstacle.

Safe investigation sequence:

  1. inspect the exact rule and deny message
  2. compare the violating manifest against the intended baseline
  3. fix the workload first, not the policy engine
  4. grant an exception only when a real break-glass case exists, and keep it narrowly scoped and time-bound
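
Step 3 ("fix the workload first") can be illustrated with a before/after pair. This is a minimal sketch: the Pod names, image, and resource values are hypothetical placeholders, and the rule name in the comments is taken from the requests/limits template used in this chapter.

```yaml
# Hypothetical manifests for illustration only.
# Before: no resources block, so the rule
# require-cpu-memory-requests-limits reports (Audit) or blocks (Enforce) it.
apiVersion: v1
kind: Pod
metadata:
  name: pod-missing-limits
spec:
  containers:
    - name: app
      image: registry.example.com/team/app:1.4.2
---
# After: every container declares CPU and memory requests and limits.
# The values are placeholders; size them from observed usage.
apiVersion: v1
kind: Pod
metadata:
  name: pod-with-limits
spec:
  containers:
    - name: app
      image: registry.example.com/team/app:1.4.2
      resources:
        requests:
          cpu: 100m
          memory: 128Mi
        limits:
          cpu: 500m
          memory: 256Mi
```

The fix lives in the manifest, so the policy and its enforcement mode stay untouched.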

Containment

Containment keeps enforcement credible:

  1. leave the policy engine on
  2. remediate manifests or use bounded exceptions
  3. confirm the policy now behaves as expected: violations are reported in Audit or blocked in Enforce
  4. promote policy changes gradually with evidence, not impulse

Guardrails That Stop It

  • Policy engine always-on (Kyverno).
  • Default rollout path: Audit then selective Enforce.
  • Exceptions must be scoped, time-bound, and approved.
  • Deny evidence is mandatory before policy changes.
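
A scoped, time-bound exception could look like the sketch below. It assumes that Kyverno's policy exceptions feature is enabled for the chosen namespace, that the installed version serves the kyverno.io/v2 PolicyException API, and that the cleanup controller has RBAC permission to delete PolicyException objects; all three should be verified on the target cluster. The namespace, workload name pattern, annotation keys, and ticket value are hypothetical.

```yaml
apiVersion: kyverno.io/v2
kind: PolicyException
metadata:
  name: break-glass-payments-hotfix          # hypothetical name
  namespace: payments                        # hypothetical namespace
  labels:
    # Kyverno's TTL cleanup label; expiry only works if the cleanup
    # controller is allowed to delete this resource kind (assumption).
    cleanup.kyverno.io/ttl: 24h
  annotations:
    # Hypothetical audit-trail annotations; keys and values are placeholders.
    safeops.example.com/approved-by: "platform-oncall"
    safeops.example.com/ticket: "INC-0000"
spec:
  exceptions:
    # Exempt exactly one rule of one policy, not the whole policy set.
    - policyName: require-requests-limits-example
      ruleNames:
        - require-cpu-memory-requests-limits
  match:
    any:
      - resources:
          kinds:
            - Pod
          namespaces:
            - payments
          names:
            - payments-hotfix-*
```

If the workload is created through a Deployment or another controller, the auto-generated rule variants would likely need to be listed as well. The exception stays narrow on three axes: one rule, one namespace, one name pattern.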

Prerequisites

  • Kyverno policy engine installed and reconciling via Flux.
  • Admission policy templates available (see the investigation snapshots below).
  • Rollout approach: start in Audit mode, collect deny evidence, then selectively Enforce.

Investigation Snapshots

Here is the Kyverno release used in the SafeOps system to establish the cluster-side enforcement boundary.

Kyverno release

apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: kyverno
  namespace: flux-system
spec:
  # How often Flux reconciles this release.
  interval: 30m
  chart:
    spec:
      chart: kyverno
      # Pinned chart version; upgrades are explicit changes in Git.
      version: "3.7.1"
      sourceRef:
        kind: HelmRepository
        name: kyverno
        namespace: flux-system
      # How often Flux checks the chart source for updates.
      interval: 12h
  targetNamespace: kyverno
  install:
    # The kyverno namespace must already exist.
    createNamespace: false
    remediation:
      retries: 3
  upgrade:
    remediation:
      retries: 3
  values:
    admissionController:
      replicas: 2
    backgroundController:
      enabled: true
    cleanupController:
      enabled: true
    reportsController:
      enabled: true

Here is the policy template that requires CPU and memory requests and limits before a workload is admitted.

Requests and limits policy

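apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-requests-limits-example
spec:
  # Audit records violations in policy reports without blocking admission.
  validationFailureAction: Audit
  # Also evaluate resources that already exist in the cluster.
  background: true
  rules:
    - name: require-cpu-memory-requests-limits
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "CPU and memory requests/limits are required for all containers."
        pattern:
          spec:
            # The pattern applies to every entry in spec.containers.
            containers:
              - resources:
                  # "?*" matches any value with at least one character.
                  requests:
                    cpu: "?*"
                    memory: "?*"
                  limits:
                    cpu: "?*"
                    memory: "?*"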
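
One way to check this template before it reaches a cluster is a Kyverno CLI test file. The sketch below assumes the kyverno CLI is installed, that the policy is saved as require-requests-limits.example.yaml next to the test file, and that a hypothetical resources.yaml holds two Pods named pod-missing-limits and pod-with-limits. The expected results reflect how the CLI is understood to report Audit-mode policies and should be confirmed against the installed CLI version.

```yaml
# kyverno-test.yaml: picked up by `kyverno test <directory>`.
apiVersion: cli.kyverno.io/v1alpha1
kind: Test
metadata:
  name: require-requests-limits-test
policies:
  - require-requests-limits.example.yaml
resources:
  - resources.yaml              # hypothetical file with the two Pods below
results:
  # A Pod without a resources block is expected to fail the rule.
  - policy: require-requests-limits-example
    rule: require-cpu-memory-requests-limits
    kind: Pod
    resources:
      - pod-missing-limits
    result: fail
  # A Pod with CPU and memory requests and limits is expected to pass.
  - policy: require-requests-limits-example
    rule: require-cpu-memory-requests-limits
    kind: Pod
    resources:
      - pod-with-limits
    result: pass
```

Keeping one known-bad and one known-good resource per rule makes a later change to the pattern visible as a test failure rather than as a production deny.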

System Context

This chapter turns earlier best practices into enforceable runtime boundaries.

It reinforces:

  • Chapter 07 hardening baselines
  • Chapter 08 requests and limits discipline
  • Chapter 15 supply-chain verification at admission time

Policy Pack Cookbook (Baseline 5)

Recommended baseline policy set for most teams:

  1. require non-root execution
  2. require requests/limits
  3. allow only trusted registries
  4. disallow mutable tags (for example, latest)
  5. disallow privileged containers
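
Items 4 and 5 can be sketched in the same style as the requests/limits template. These are illustrative policies modeled on commonly published Kyverno samples, not the SafeOps policy files; the policy names are hypothetical, and the tag patterns do not cover every case (for example, digest-pinned images or registries addressed with a port).

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-latest-tag-example        # hypothetical name
spec:
  validationFailureAction: Audit           # start in Audit, as for every policy
  background: true
  rules:
    - name: require-image-tag
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "An explicit image tag is required."
        pattern:
          spec:
            containers:
              - image: "*:*"               # some tag must be present
    - name: disallow-latest
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "The mutable tag 'latest' is not allowed."
        pattern:
          spec:
            containers:
              - image: "!*:latest"         # negation: must not end in :latest
---
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-privileged-example        # hypothetical name
spec:
  validationFailureAction: Audit
  background: true
  rules:
    - name: disallow-privileged-containers
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Privileged containers are not allowed."
        pattern:
          spec:
            # =(...) is a conditional anchor: checked only if the field exists.
            =(initContainers):
              - =(securityContext):
                  =(privileged): "false"
            containers:
              - =(securityContext):
                  =(privileged): "false"
```
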

Rollout model for each policy:

  • start in Audit
  • remediate violations
  • move to Enforce

Safe Workflow (Step-by-Step)

  1. Enable selected policies in Audit.
  2. Trigger known violations intentionally in the develop environment.
  3. Review policy reports and event messages.
  4. Fix manifests, not engine settings.
  5. Move stable rules to Enforce in non-production.
  6. Promote enforcement gradually across environments.
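
Steps 5 and 6 can be expressed as a per-namespace override, so a rule blocks in one place while it still only reports elsewhere. This sketch reuses the requests/limits template and assumes a namespace named develop; the namespace name is an assumption. The spec-level fields shown match the style of the template, but newer Kyverno releases document a per-rule equivalent under validate, so check the documentation for the installed version.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-requests-limits-example
spec:
  # Cluster-wide default: report only.
  validationFailureAction: Audit
  # Selective enforcement: block violations in the listed namespaces.
  validationFailureActionOverrides:
    - action: Enforce
      namespaces:
        - develop                # assumed non-production namespace
  background: true
  rules:
    - name: require-cpu-memory-requests-limits
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "CPU and memory requests/limits are required for all containers."
        pattern:
          spec:
            containers:
              - resources:
                  requests:
                    cpu: "?*"
                    memory: "?*"
                  limits:
                    cpu: "?*"
                    memory: "?*"
```

Promotion then becomes a reviewable Git change: add a namespace to the override list once its policy reports are clean, and switch the default to Enforce only at the end.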

Lab Files

  • lab.md
  • runbook-admission-policy.md
  • quiz.md

Done When

  • learner demonstrates Audit -> Enforce with clear evidence
  • learner can perform deny triage and manifest remediation
  • learner can apply a safe exception process without global bypass

Hands-On Materials

Labs, quizzes, and runbooks are available to course members.

  • Admission Policy Guardrails Scorecard (Template)
  • Lab: Admission Guardrails in Audit and Enforce Modes (Advanced)
  • Quiz: Chapter 16 (Admission Policy Guardrails)
  • Runbook: Admission Policy Operations (Advanced)