Chapter 16: Admission Policy Guardrails (Advanced)
Incident Hook
A workload is deployed under incident pressure with missing resource limits, mutable image tags, and a weak security context. Workstation hooks were skipped and review focused on speed. The pod starts in a risky configuration and causes noisy-neighbor impact. Recovery is slowed because the team lacks clear deny/exception discipline.
Observed Symptoms
What the team sees first:
- risky workload settings reach the cluster boundary
- upstream checks were skipped or insufficient
- operators feel pressure to disable policy instead of fixing the manifest
The cluster is now the last clean place to stop the mistake.
Confusion Phase
Deny messages feel like friction when a release is already late. That is why broad exceptions appear attractive.
The real question is:
- is the policy wrong
- or is it correctly blocking a workload that upstream controls failed to stop
Why This Chapter Exists
Local checks (pre-commit, CI, review) reduce risk but can be bypassed. Admission control is the last enforcement point before runtime.
This chapter focuses on policy-as-code guardrails that block risky workloads even when upstream checks fail.
Learning Objectives
By the end of this chapter, learners can:
- explain why cluster-side policy is mandatory in production systems
- roll out Kyverno rules with Audit -> Enforce safely
- troubleshoot deny events and remediate manifests correctly
- run controlled break-glass exceptions with expiry and audit trail
What AI Would Propose (Brave Junior)
- “Disable the policy engine temporarily.”
- “Allow privileged mode now, fix later.”
- “Create a broad exception for the whole namespace.”
Why this sounds reasonable:
- immediate progress under pressure
- lower friction in the moment
Why This Is Dangerous
- Security and stability regressions reach runtime.
- “Temporary” exceptions become long-term drift.
- Platform trust model is weakened for all teams.
Investigation
Treat deny evidence as diagnostic input, not as an obstacle.
Safe investigation sequence:
- inspect the exact rule and deny message
- compare the violating manifest against the intended baseline
- fix the workload first, not the policy engine
- grant an exception only when a real break-glass case exists, and keep it narrowly scoped and time-bound
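As an illustration of "fix the workload first", here is a minimal before/after sketch of a Pod manifest. The workload name, namespace, image, and resource values are hypothetical and chosen only to mirror the incident described above; they are not taken from the SafeOps repository.

```yaml
# BEFORE (hypothetical): the kind of manifest the incident describes.
apiVersion: v1
kind: Pod
metadata:
  name: checkout-api            # assumed name, for illustration only
  namespace: develop
spec:
  containers:
    - name: app
      image: registry.example.com/checkout-api:latest   # mutable tag
      securityContext:
        privileged: true                                # weak security context
      # no resources block: fails the requests/limits rule
---
# AFTER (hypothetical): the same workload remediated against the baseline.
apiVersion: v1
kind: Pod
metadata:
  name: checkout-api
  namespace: develop
spec:
  securityContext:
    runAsNonRoot: true          # assumes the image defines a non-root user
  containers:
    - name: app
      image: registry.example.com/checkout-api:1.4.2    # pinned tag (assumed version)
      securityContext:
        privileged: false
        allowPrivilegeEscalation: false
      resources:
        requests:
          cpu: 100m
          memory: 128Mi
        limits:
          cpu: 500m
          memory: 256Mi
```

The two documents are alternatives, not meant to be applied together. The point of the comparison is that every deny message maps to a concrete manifest change, so nothing in the policy engine needs to be touched.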
Containment
Containment keeps enforcement credible:
- leave the policy engine on
- remediate manifests or use bounded exceptions
- confirm the deny path now behaves as expected in Audit or Enforce
- promote policy changes gradually with evidence, not impulse
Guardrails That Stop It
- Policy engine always-on (Kyverno).
- Default rollout path: Audit then selective Enforce.
- Exceptions must be scoped, time-bound, and approved.
- Deny evidence is mandatory before policy changes.
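A scoped, time-bound exception could look roughly like the sketch below. It assumes that policy exceptions are enabled in the Kyverno installation, that the installed release serves PolicyException under kyverno.io/v2 (older releases use a beta version), and that the cleanup controller is allowed to delete PolicyException objects so the TTL label takes effect. The workload name, the annotation keys, and the ticket value are hypothetical.

```yaml
apiVersion: kyverno.io/v2        # assumption: older Kyverno releases use kyverno.io/v2beta1
kind: PolicyException
metadata:
  name: break-glass-checkout-api # hypothetical name
  namespace: develop             # must be a namespace where exceptions are permitted
  labels:
    # Expiry: asks the Kyverno cleanup controller to delete this object after 24h.
    cleanup.kyverno.io/ttl: 24h
  annotations:
    # Audit trail: hypothetical annotation keys, adapt to your own convention.
    example.com/approved-by: "oncall-lead"
    example.com/ticket: "INC-0000"
spec:
  exceptions:
    - policyName: require-requests-limits-example
      ruleNames:
        - require-cpu-memory-requests-limits
  match:
    any:
      - resources:
          kinds:
            - Pod
          namespaces:
            - develop            # one namespace, not the whole cluster
          names:
            - checkout-api*      # one workload, not the whole namespace
```

The exception names a single policy, a single rule, and a single workload, which keeps it the opposite of the "broad exception for the whole namespace" proposal. If the Pods are created by a Deployment, the match and rule names may need to cover the generated rules as well; verify this against the deny message before relying on the exception.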
Prerequisites
- Kyverno policy engine installed and reconciling via Flux.
- Admission policy templates available (see the investigation snapshots below).
- Rollout approach: start in Audit mode, collect deny evidence, then selectively Enforce.
Investigation Snapshots
Here is the Kyverno release used in the SafeOps system to establish the cluster-side enforcement boundary.
Kyverno release
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: kyverno
  namespace: flux-system
spec:
  interval: 30m
  chart:
    spec:
      chart: kyverno
      version: "3.7.1"
      sourceRef:
        kind: HelmRepository
        name: kyverno
        namespace: flux-system
      interval: 12h
  targetNamespace: kyverno
  install:
    createNamespace: false
    remediation:
      retries: 3
  upgrade:
    remediation:
      retries: 3
  values:
    admissionController:
      replicas: 2
    backgroundController:
      enabled: true
    cleanupController:
      enabled: true
    reportsController:
      enabled: true
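The release above references a HelmRepository named kyverno and sets createNamespace: false, so both the chart source and the target namespace have to exist before Flux can reconcile it. A minimal sketch of those two objects follows, assuming the public Kyverno chart repository URL and the source.toolkit.fluxcd.io/v1 API; the SafeOps repository may define them differently.

```yaml
# Target namespace, created outside the HelmRelease because createNamespace is false.
apiVersion: v1
kind: Namespace
metadata:
  name: kyverno
---
# Chart source referenced by spec.chart.spec.sourceRef in the HelmRelease.
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: kyverno
  namespace: flux-system
spec:
  interval: 12h
  url: https://kyverno.github.io/kyverno/   # assumption: upstream public chart repository
```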
Here is the policy template that requires CPU and memory requests and limits before a workload is admitted.
Requests and limits policy
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-requests-limits-example
spec:
  validationFailureAction: Audit
  background: true
  rules:
    - name: require-cpu-memory-requests-limits
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "CPU and memory requests/limits are required for all containers."
        pattern:
          spec:
            containers:
              - resources:
                  requests:
                    cpu: "?*"
                    memory: "?*"
                  limits:
                    cpu: "?*"
                    memory: "?*"
System Context
This chapter turns earlier best practices into enforceable runtime boundaries.
It reinforces:
- Chapter 07 hardening baselines
- Chapter 08 requests and limits discipline
- Chapter 15 supply-chain verification at admission time
Policy Pack Cookbook (Baseline 5)
Recommended baseline policy set for most teams:
- require non-root execution
- require requests/limits
- allow only trusted registries
- disallow mutable tags (latest)
- disallow privileged containers
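Two of these baseline items could be expressed along the following lines. This is a sketch modeled on common Kyverno validate patterns, not the SafeOps policy pack itself; the policy names and messages are assumptions. Both policies start in Audit and use the same spec-level validationFailureAction field as the requests and limits template.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-latest-tag-example        # hypothetical name
spec:
  validationFailureAction: Audit
  background: true
  rules:
    - name: require-image-tag
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "An explicit image tag is required."
        pattern:
          spec:
            containers:
              - image: "*:*"               # image must contain a tag separator
    - name: disallow-latest-tag
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "The mutable tag 'latest' is not allowed."
        pattern:
          spec:
            containers:
              - image: "!*:latest"         # negated wildcard match
---
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-privileged-example        # hypothetical name
spec:
  validationFailureAction: Audit
  background: true
  rules:
    - name: disallow-privileged
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Privileged containers are not allowed."
        pattern:
          spec:
            # =( ) anchors: only checked when the field is present.
            =(initContainers):
              - =(securityContext):
                  =(privileged): "false"
            containers:
              - =(securityContext):
                  =(privileged): "false"
```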
Rollout model for each policy:
- start in Audit
- remediate violations
- move to Enforce
Safe Workflow (Step-by-Step)
- Enable selected policies in Audit.
- Trigger known violations intentionally in develop.
- Review policy reports and event messages.
- Fix manifests, not engine settings.
- Move stable rules to Enforce in non-production.
- Promote enforcement gradually across environments.
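One way to promote a single rule from Audit to Enforce per environment is a Kustomize overlay that patches only the failure action. The sketch below assumes the policies live in a shared base directory and that each environment has its own overlay reconciled by Flux; the directory layout is hypothetical.

```yaml
# Hypothetical overlay for a non-production environment.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base/policies          # assumed location of the Audit-mode policy templates
patches:
  - target:
      kind: ClusterPolicy
      name: require-requests-limits-example
    patch: |-
      # Flip only this policy; every other policy in the base stays in Audit.
      - op: replace
        path: /spec/validationFailureAction
        value: Enforce
```

Keeping the base in Audit and flipping policies one at a time in an overlay leaves a reviewable commit for each promotion. The patch targets the spec-level validationFailureAction field used in this chapter's template; newer Kyverno releases also accept a rule-level failureAction under validate, so check which field your installed version expects.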
Lab Files
- lab.md
- runbook-admission-policy.md
- quiz.md
Done When
- learner demonstrates Audit -> Enforce with clear evidence
- learner can perform deny triage and manifest remediation
- learner can apply a safe exception process without global bypass