Chapter 06: Network Policies (Production Isolation)
Incident Hook
A debug pod in develop reaches internal services it should never touch.
No exploit sophistication is needed, only open east-west traffic.
When the incident starts, responders cannot quickly prove or limit the blast radius.
Network policies turn this into an auditable allowlist model.
Observed Symptoms
What the team sees first:
- a pod in develop can connect to services outside its intended boundary
- responders cannot answer quickly what is reachable and what is not
- containment feels manual because the network has no default-deny baseline
The issue is not only one bad connection. It is the absence of a trustworthy traffic model.
Confusion Phase
Without policies, every connectivity question becomes investigative work.
The team now has to discover:
- which paths are legitimately required
- which paths are accidental exposure
- how to contain the pod without breaking the namespace blindly
Why This Chapter Exists
Without network isolation, one compromised pod can move laterally across environments. This chapter introduces a safe baseline:
- default deny
- explicit allow rules
- DNS and ingress paths opened intentionally
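As a reference point, a namespace-wide default-deny policy can be sketched as below. This is a minimal sketch: the policy name is illustrative, and the namespace it applies to is whichever one you deploy it into.

```yaml
# Hypothetical baseline: deny all ingress and egress for every pod in the
# namespace. Each required path is then re-opened with an explicit allow rule.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all   # name is illustrative
spec:
  podSelector: {}          # empty selector matches all pods in the namespace
  policyTypes:
    - Ingress
    - Egress
```

Because `podSelector` is empty, every pod in the namespace is selected, and listing both policy types with no allow rules denies all traffic in both directions.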
What AI Would Propose (Brave Junior)
- “Skip policies for now to avoid breaking traffic.”
- “We can secure networking later after release.”
Why this sounds reasonable:
- avoids immediate traffic risk
- seems faster during release pressure
Why This Is Dangerous
- Flat networking means high lateral-movement risk.
- Production and non-production boundaries become weak.
- Incidents are harder to contain under pressure.
Investigation
Start with the boundary, not with ad-hoc firewall guesses.
Safe investigation sequence:
- list the source pod, target service, namespace, and port involved
- prove what traffic is currently open
- define the minimum required paths: DNS, ingress, and exact egress needs
- test one allow rule at a time against the default-deny baseline
Containment
Containment narrows traffic fast:
- apply namespace default deny
- add back DNS first
- add ingress path second
- allow only the exact egress the workload truly needs
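The "DNS first" step can be sketched as an egress allow rule. This is an assumption-laden sketch: the `kube-system` namespace label and the `k8s-app: kube-dns` pod label follow common upstream defaults and may differ in your cluster.

```yaml
# Hypothetical DNS allow rule: permit egress from all pods in the namespace
# to the cluster DNS pods on port 53. Label keys are common defaults
# (kubernetes.io/metadata.name is auto-set on namespaces in recent Kubernetes)
# and should be verified against your cluster.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```

Putting `namespaceSelector` and `podSelector` in the same `to` entry combines them with AND, so only the DNS pods in `kube-system` are reachable, not the whole namespace.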
The goal is not “network works somehow.” The goal is “network is explainable.”
Guardrails That Stop It
- Start from default deny in target namespace.
- Add minimum allow rules one by one with verification.
- Keep policy changes isolated from application changes.
- Keep rollback manifest ready before applying restrictive policies.
Common AI Trap
AI often suggests broad allow rules to “get traffic working”:
- 0.0.0.0/0 egress
- namespace-wide allow-all policy
- temporary wildcard selectors
Do not apply these shortcuts. Fix exact source/destination/path requirements instead.
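Instead of an allow-all shortcut, write the narrowest rule that covers the real requirement. The following is a sketch only: the `app: backend` and `app: postgres` labels and port 5432 are assumptions to be matched to your workload.

```yaml
# Sketch of a narrow egress rule replacing an allow-all shortcut:
# backend pods may reach only postgres pods on TCP 5432 in the same
# namespace. Labels and port are assumptions, not taken from the source.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-backend-egress-postgres
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
    - Egress
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: postgres
      ports:
        - protocol: TCP
          port: 5432
```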
Investigation Snapshots
Here is the backend allow policy used in the SafeOps system to permit only the ingress path the workload actually needs.
Backend allow policy
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-backend-ingress
spec:
  podSelector:
    matchExpressions:
      - key: app
        operator: In
        values: [backend, backend-primary]
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchExpressions:
              - key: app
                operator: In
                values: [frontend, frontend-primary]
      ports:
        - protocol: TCP
          port: 8080
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: traefik
      ports:
        - protocol: TCP
          port: 8080
```
Here is the baseline policy pack that gets promoted across environments.
Network policy baseline
- flux/infrastructure/network-policies/base/allow-backend-egress-https.yaml
- flux/infrastructure/network-policies/base/allow-backend-egress-postgres.yaml
- flux/infrastructure/network-policies/base/allow-backend-ingress.yaml
- flux/infrastructure/network-policies/base/allow-backend-metrics-from-observability.yaml
- flux/infrastructure/network-policies/base/allow-dns-egress.yaml
- flux/infrastructure/network-policies/base/allow-frontend-egress-backend.yaml
- flux/infrastructure/network-policies/base/allow-frontend-ingress.yaml
- flux/infrastructure/network-policies/base/allow-postgres-egress-apiserver.yaml
- flux/infrastructure/network-policies/base/allow-postgres-egress-https.yaml
- flux/infrastructure/network-policies/base/allow-postgres-ingress-backend.yaml
- flux/infrastructure/network-policies/base/allow-postgres-ingress-cnpg-operator.yaml
- flux/infrastructure/network-policies/base/default-deny-all.yaml
- flux/infrastructure/network-policies/base/kustomization.yaml
System Context
This chapter creates the runtime isolation that later lessons rely on.
It connects directly to:
- Chapter 07, where workload hardening limits what an attacker can do after shell access
- Chapter 10, where incident response depends on clean service boundaries
- Chapter 12, where drills should fail inside bounded scope instead of spreading silently
Safe Workflow (Step-by-Step)
- Start from a namespace default-deny policy in develop.
- Add minimal allow rules in order:
  - DNS first
  - ingress path second
  - required egress last
- Test each allow rule before adding the next one.
- Run blocked-traffic triage for failures:
- DNS resolution
- namespace/pod labels
- egress target and policy selector match
- Reject “allow all” shortcuts even for temporary fixes; patch specific policy instead.
- Promote policy changes environment by environment with evidence.
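The baseline pack above is wired together with a kustomization so the whole set can be promoted environment by environment. This is a sketch assuming a subset of the base files; the full resource list mirrors the baseline file listing earlier in the chapter.

```yaml
# Hypothetical kustomization bundling the policy baseline for promotion.
# The resources shown are a subset; extend the list to cover every policy
# file in the base directory.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - default-deny-all.yaml
  - allow-dns-egress.yaml
  - allow-frontend-ingress.yaml
  - allow-backend-ingress.yaml
```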
Blocked Traffic Triage Playbook
When traffic is blocked:
- Check DNS resolution from source pod.
- Confirm source and destination labels match policy selectors.
- Verify namespace labels used by
namespaceSelector. - Validate port/protocol correctness in policy rules.
- Confirm egress destination (service vs IP) matches allowed targets.
- Re-test with one rule change at a time and capture evidence.
Lab Files
- lab.md
- quiz.md
Done When
- learner can apply default deny without losing control of the environment
- learner can allow only required DNS + ingress traffic
- learner can debug and explain blocked traffic with evidence