Core Track: a guardrails-first chapter in the core learning path.

Estimated Time

  • Reading: 20-25 min
  • Lab: 45-60 min
  • Quiz: 10-15 min

Prerequisites

Source Code References

  • allow-backend-ingress.yaml
  • base/

What You Will Produce

A reproducible lab result plus quiz verification and incident-safe operating evidence.

Chapter 06: Network Policies (Production Isolation)

Incident Hook

A debug pod in develop reaches internal services it should never touch. No exploit sophistication is needed, only open east-west traffic. When an incident starts, responders cannot quickly prove or limit the blast radius. Network policies turn this into an auditable allowlist model.

Observed Symptoms

What the team sees first:

  • a pod in develop can connect to services outside its intended boundary
  • responders cannot quickly say what is reachable and what is not
  • containment feels manual because the network has no default-deny baseline

The issue is not only one bad connection. It is the absence of a trustworthy traffic model.

Confusion Phase

Without policies, every connectivity question becomes investigative work.

The team now has to discover:

  • which paths are legitimately required
  • which paths are accidental exposure
  • how to contain the pod without blindly breaking the rest of the namespace

Why This Chapter Exists

Without network isolation, one compromised pod can move laterally across environments. This chapter introduces a safe baseline:

  • default deny
  • explicit allow rules
  • DNS and ingress paths opened intentionally
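A default deny is usually a single policy with an empty podSelector. The sketch below assumes the target namespace is `develop`; it illustrates the pattern and is not necessarily identical to the course's `default-deny-all.yaml`.

```yaml
# Sketch: deny all ingress and egress for every pod in one namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: develop        # assumption: the namespace being isolated
spec:
  podSelector: {}           # empty selector = every pod in this namespace
  policyTypes:
    - Ingress               # no ingress rules listed, so all inbound traffic is denied
    - Egress                # no egress rules listed, so all outbound traffic (including DNS) is denied
```

NetworkPolicies are additive: every allow policy that selects a pod widens what that pod may do, and nothing narrows the default deny except removing allows. Enforcement also depends on the cluster's network plugin; if the CNI does not implement NetworkPolicy, these objects are accepted but have no effect.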

What AI Would Propose (Brave Junior)

  • “Skip policies for now to avoid breaking traffic.”
  • “We can secure networking later after release.”

Why this sounds reasonable:

  • avoids immediate traffic risk
  • seems faster during release pressure

Why This Is Dangerous

  • Flat networking means high lateral-movement risk.
  • Production and non-production boundaries become weak.
  • Incidents are harder to contain under pressure.

Investigation

Start with the boundary, not with ad-hoc firewall guesses.

Safe investigation sequence:

  1. list the source pod, target service, namespace, and port involved
  2. prove what traffic is currently open
  3. define the minimum required paths: DNS, ingress, and exact egress needs
  4. test one allow rule at a time against the default-deny baseline

Containment

Containment narrows traffic fast:

  1. apply namespace default deny
  2. add back DNS first
  3. add ingress path second
  4. allow only the exact egress the workload truly needs
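Adding DNS back first can look like the sketch below. It assumes the cluster DNS pods run in `kube-system` with the label `k8s-app: kube-dns` (common for CoreDNS installs, but verify in your cluster) and that the namespace is `develop`.

```yaml
# Sketch: allow every pod in the namespace to reach cluster DNS, and nothing else.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: develop                    # assumption: target namespace
spec:
  podSelector: {}                       # applies to all pods in the namespace
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:                  # same list item: namespace AND pod labels must both match
            matchLabels:
              k8s-app: kube-dns         # assumption: label carried by the DNS pods
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53                      # DNS falls back to TCP for large responses
```

Keeping `namespaceSelector` and `podSelector` in one `to` item is deliberate. Written as two separate `-` items, they would be ORed and allow egress to every pod in `kube-system` plus any pod labeled `k8s-app: kube-dns`.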

The goal is not “network works somehow.” The goal is “network is explainable.”

Guardrails That Stop It

  • Start from default deny in the target namespace.
  • Add minimum allow rules one by one with verification.
  • Keep policy changes isolated from application changes.
  • Keep rollback manifest ready before applying restrictive policies.

Common AI Trap

AI often suggests broad allow rules to “get traffic working”:

  • 0.0.0.0/0 egress
  • namespace-wide allow-all policy
  • temporary wildcard selectors

Do not apply these shortcuts. Fix exact source/destination/path requirements instead.
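As a contrast, here is the broad shortcut next to an exact rule. This is a sketch only: the database pods are assumed to carry a hypothetical `app: postgres` label, and the baseline's `allow-backend-egress-postgres.yaml` may select them differently.

```yaml
# Anti-pattern (do not apply): egress to any IP address.
#   egress:
#     - to:
#         - ipBlock:
#             cidr: 0.0.0.0/0
#
# Sketch: backend pods may reach only the database pods, only on the PostgreSQL port.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-backend-egress-postgres
spec:
  podSelector:
    matchExpressions:
      - key: app
        operator: In
        values: [backend, backend-primary]
  policyTypes:
    - Egress
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: postgres             # assumption: hypothetical label on the database pods
      ports:
        - protocol: TCP
          port: 5432                    # default PostgreSQL port
```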

Investigation Snapshots

Here is the backend allow policy used in the SafeOps system to permit only the ingress path the workload actually needs.

Backend allow policy

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-backend-ingress
spec:
  podSelector:
    matchExpressions:
      - key: app
        operator: In
        values: [backend, backend-primary]
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchExpressions:
              - key: app
                operator: In
                values: [frontend, frontend-primary]
      ports:
        - protocol: TCP
          port: 8080
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: traefik
      ports:
        - protocol: TCP
          port: 8080

Here is the baseline policy pack that gets promoted across environments.

Network policy baseline

  • flux/infrastructure/network-policies/base/allow-backend-egress-https.yaml
  • flux/infrastructure/network-policies/base/allow-backend-egress-postgres.yaml
  • flux/infrastructure/network-policies/base/allow-backend-ingress.yaml
  • flux/infrastructure/network-policies/base/allow-backend-metrics-from-observability.yaml
  • flux/infrastructure/network-policies/base/allow-dns-egress.yaml
  • flux/infrastructure/network-policies/base/allow-frontend-egress-backend.yaml
  • flux/infrastructure/network-policies/base/allow-frontend-ingress.yaml
  • flux/infrastructure/network-policies/base/allow-postgres-egress-apiserver.yaml
  • flux/infrastructure/network-policies/base/allow-postgres-egress-https.yaml
  • flux/infrastructure/network-policies/base/allow-postgres-ingress-backend.yaml
  • flux/infrastructure/network-policies/base/allow-postgres-ingress-cnpg-operator.yaml
  • flux/infrastructure/network-policies/base/default-deny-all.yaml
  • flux/infrastructure/network-policies/base/kustomization.yaml
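The `kustomization.yaml` in that directory ties the policy files into one deployable unit. Its actual contents are not shown here; a minimal sketch, assuming it simply lists every policy file above:

```yaml
# Sketch (assumed contents) of flux/infrastructure/network-policies/base/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - default-deny-all.yaml
  - allow-dns-egress.yaml
  - allow-frontend-ingress.yaml
  - allow-frontend-egress-backend.yaml
  - allow-backend-ingress.yaml
  - allow-backend-egress-https.yaml
  - allow-backend-egress-postgres.yaml
  - allow-backend-metrics-from-observability.yaml
  - allow-postgres-ingress-backend.yaml
  - allow-postgres-ingress-cnpg-operator.yaml
  - allow-postgres-egress-apiserver.yaml
  - allow-postgres-egress-https.yaml
```

Because NetworkPolicies are additive, the order of entries does not change the effective rules; grouping them by workload just keeps reviews readable.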

System Context

This chapter creates the runtime isolation that later lessons rely on.

It connects directly to:

  • Chapter 07, where workload hardening limits what an attacker can do after shell access
  • Chapter 10, where incident response depends on clean service boundaries
  • Chapter 12, where drills should fail inside bounded scope instead of spreading silently

Safe Workflow (Step-by-Step)

  1. Start from a namespace default-deny policy in develop.
  2. Add minimal allow rules in order:
    • DNS first
    • ingress path second
    • required egress last
  3. Test each allow rule before adding the next one.
  4. Run blocked-traffic triage for failures:
    • DNS resolution
    • namespace/pod labels
    • egress target and policy selector match
  5. Reject “allow all” shortcuts even for temporary fixes; patch the specific policy instead.
  6. Promote policy changes environment by environment with evidence.
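Environment-by-environment promotion can be modeled as thin overlays that reuse the same base. The `develop/` directory below is a hypothetical path, not one taken from the course repository.

```yaml
# Hypothetical: flux/infrastructure/network-policies/develop/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: develop      # sets metadata.namespace on every policy pulled from the base
resources:
  - ../base
```

Promoting then means adding the equivalent overlay for the next environment only after develop has produced passing evidence, so every environment runs the same reviewed policy set.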

Blocked Traffic Triage Playbook

When traffic is blocked:

  1. Check DNS resolution from source pod.
  2. Confirm source and destination labels match policy selectors.
  3. Verify namespace labels used by namespaceSelector.
  4. Validate port/protocol correctness in policy rules.
  5. Confirm egress destination (service vs IP) matches allowed targets.
  6. Re-test with one rule change at a time and capture evidence.
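A short-lived probe pod makes steps 1, 4, and 5 repeatable. This sketch assumes a hypothetical Service named `backend` on port 8080 in `develop`. The `app: frontend` label makes the probe match the frontend selectors, so it exercises the same DNS, egress, and ingress rules a real frontend pod would.

```yaml
# Hypothetical probe pod for blocked-traffic triage in develop.
apiVersion: v1
kind: Pod
metadata:
  name: netpol-probe
  namespace: develop
  labels:
    app: frontend          # impersonates a frontend source; change it to test other selectors
spec:
  restartPolicy: Never
  containers:
    - name: probe
      image: busybox:1.36
      command:
        - sh
        - -c
        - |
          # Step 1: DNS resolution from the source pod
          nslookup backend || echo "DNS FAILED"
          # Steps 4-5: port and destination reachability, 3-second timeout
          wget -q -O- -T 3 http://backend:8080/ || echo "REQUEST FAILED (timeout, refused, or HTTP error)"
```

Read the result with `kubectl logs netpol-probe -n develop`, keep the output as evidence, and delete the pod afterwards. While it exists, its labels can also make it an endpoint of any Service that selects `app: frontend`.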

Lab Files

  • lab.md
  • quiz.md

Done When

  • learner can apply default deny without losing control of the environment
  • learner can allow only required DNS + ingress traffic
  • learner can debug and explain blocked traffic with evidence

Hands-On Materials

Labs, quizzes, and runbooks are available to course members.

  • Lab: Default Deny and Controlled Traffic Allowlist
  • Quiz: Chapter 06 (Network Policies)