Chapter 06: Network Policies (Production Isolation)
Why This Chapter Exists
Without network isolation, one compromised pod can move laterally across environments. This chapter introduces a safe baseline:
- default deny
- explicit allow rules
- DNS and ingress paths opened intentionally
Incident Hook
A debug pod in develop reaches internal services it should never touch.
No exploit sophistication is needed, only open east-west traffic.
When an incident starts, responders cannot quickly prove or limit the blast radius.
Network policies turn this into an auditable allowlist model.
What AI Would Propose (Brave Junior)
- “Skip policies for now to avoid breaking traffic.”
- “We can secure networking later after release.”
Why this sounds reasonable:
- avoids immediate traffic risk
- seems faster during release pressure
Why This Is Dangerous
- Flat networking means high lateral-movement risk.
- Production and non-production boundaries become weak.
- Incidents are harder to contain under pressure.
Guardrails That Stop It
- Start from default deny in the target namespace.
- Add the minimum allow rules one by one, verifying each.
- Keep policy changes isolated from application changes.
- Keep a rollback manifest ready before applying restrictive policies.
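
As a starting point, a default-deny policy might look like the sketch below. The namespace name develop is taken from this chapter; the policy name default-deny-all is a hypothetical choice. The manifest only has an effect if the cluster's network plugin enforces NetworkPolicy.

```yaml
# Sketch: block all ingress and egress for every pod in the namespace.
# "default-deny-all" is a hypothetical name; adjust the namespace per environment.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: develop
spec:
  podSelector: {}        # empty selector matches all pods in the namespace
  policyTypes:
    - Ingress            # no ingress rules listed, so all ingress is denied
    - Egress             # no egress rules listed, so all egress is denied
```

Because network policies are additive, deleting this one manifest restores the previous behavior, which is what makes it workable as a rollback point.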
Common AI Trap
AI often suggests broad allow rules to “get traffic working”:
- 0.0.0.0/0 egress
- namespace-wide allow-all policy
- temporary wildcard selectors
Do not apply these shortcuts. Fix exact source/destination/path requirements instead.
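
For contrast, a narrowly scoped egress rule could look like the following sketch. The labels app: backend and app: postgres, the port 5432, and the policy name are all hypothetical and stand in for whatever exact source, destination, and port the application actually needs.

```yaml
# Sketch: allow only backend pods to reach database pods on one TCP port,
# instead of opening egress to an ipBlock of 0.0.0.0/0.
# All labels, the port, and the policy name are assumptions for illustration.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-backend-to-postgres
  namespace: develop
spec:
  podSelector:
    matchLabels:
      app: backend           # source: only pods with this label
  policyTypes:
    - Egress
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: postgres  # destination: pods in the same namespace
      ports:
        - protocol: TCP
          port: 5432
```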
Repo Mapping
Platform repository references:
- Namespace manifests
- Backend ingress
- Frontend ingress
- Network policy baseline
- Develop overlay
- Staging overlay
- Production overlay
- Flux infrastructure wiring
- Flux apps wiring
Safe Workflow (Step-by-Step)
- Start from a namespace default-deny policy in develop.
- Add minimal allow rules in order:
  - DNS first
  - ingress path second
  - required egress last
- Test each allow rule before adding the next one.
- Run blocked-traffic triage for failures:
  - DNS resolution
  - namespace/pod labels
  - egress target and policy selector match
- Reject “allow all” shortcuts even for temporary fixes; patch specific policy instead.
- Promote policy changes environment by environment with evidence.
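
The first two allow rules in the workflow above could be sketched as follows. This assumes cluster DNS runs in kube-system with the label k8s-app: kube-dns (common, but worth checking on your cluster), and that an ingress controller runs in a namespace named ingress-nginx. The app: backend label, port 8080, and both policy names are hypothetical.

```yaml
# Step 1 (sketch): allow DNS lookups from all pods in the namespace.
# Assumes DNS pods carry k8s-app: kube-dns in kube-system; verify before use.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: develop
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
---
# Step 2 (sketch): allow traffic from the ingress controller to backend pods.
# Namespace name, pod label, and port are assumptions for illustration.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ingress-to-backend
  namespace: develop
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx
      ports:
        - protocol: TCP
          port: 8080
```

Applying and testing the two documents separately, rather than together, keeps each verification step tied to a single rule change.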
Blocked Traffic Triage Playbook
When traffic is blocked:
- Check DNS resolution from source pod.
- Confirm source and destination labels match policy selectors.
- Verify namespace labels used by namespaceSelector.
- Validate port/protocol correctness in policy rules.
- Confirm egress destination (service vs IP) matches allowed targets.
- Re-test with one rule change at a time and capture evidence.
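
A frequent cause of selector mismatches is the difference between combining namespaceSelector and podSelector in one list element and listing them as separate elements. The fragment below illustrates both forms; the staging namespace value and the app: frontend label are hypothetical. The kubernetes.io/metadata.name label is set automatically on namespaces by Kubernetes, so it is a reasonable default to select on when no custom namespace labels exist.

```yaml
# Fragment of a NetworkPolicy spec (not a complete manifest).
ingress:
  # One list element with both selectors:
  # the source must be in the matching namespace AND carry the pod label.
  - from:
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: staging
        podSelector:
          matchLabels:
            app: frontend
  # Two list elements:
  # any pod in the matching namespace is allowed, OR any pod with the
  # label in the policy's own namespace. This is much broader.
  - from:
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: staging
      - podSelector:
          matchLabels:
            app: frontend
```

The only textual difference is a single leading dash before podSelector, so it is worth checking the rendered manifest rather than the diff when triaging.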
Lab Files
- lab.md
- quiz.md
Done When
- learner can apply default deny without losing control of the environment
- learner can allow only required DNS + ingress traffic
- learner can debug and explain blocked traffic with evidence