Chapter 05: CI/CD & Developer Guardrails

Incident Hook

A developer pushes directly to main, skipping validation. Terraform applies an untested change. An unreviewed application or infrastructure change reaches production. Each failure happens because one guardrail layer is missing or bypassed.

Observed Symptoms

What the team sees first:

there is no normal PR discussion for the change
no approved plan artifact exists for the infrastructure mutation
responders must reconstruct intent after the change already landed

The first operational problem is missing evidence, not missing tooling.

Confusion Phase

The workflow now feels partially intact because some automation still ran. That creates false confidence.

The real question becomes:

which guardrail failed first
and which missing layer left the later layers unable to compensate

Why This Chapter Exists

CI/CD pipelines are where code becomes infrastructure. Without guardrails at every stage, a single unvalidated change can bypass all cluster-side protections. This chapter defines the layered defense model: local hooks, CI validation, approval gates, and AI-assisted review.

What AI Would Propose (Brave Junior)

“Skip pre-commit hooks locally, CI will catch it.”
“Apply Terraform directly, we already know what it does.”
“Merge without review, the change is small.”

Why this sounds reasonable:

faster iteration in the moment
fewer steps to production
small changes feel safe

Why This Is Dangerous

CI only runs after a push, so it cannot stop a secret from leaving the workstation. A secret that was committed, pushed, and then force-pushed away has still leaked.
Direct apply without plan review removes the last safe checkpoint.
Small unreviewed changes accumulate into unauditable drift.

Investigation

Treat the path itself as part of the incident.

Safe investigation sequence:

verify whether local hooks ran or were bypassed
inspect the CI path for plan, approval, and apply evidence
confirm whether review and protected-branch rules were enforced
identify the first missing checkpoint that made the later failure possible
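
The first step of this sequence can be partly scripted. The sketch below assumes hooks live in the default .git/hooks directory (a custom core.hooksPath is not handled). The names hook_installed and report_hooks are illustrative, not scripts from the course repository. It only shows whether hooks are present now: Git keeps no record of a past --no-verify, so a bypass has to be inferred from missing CI and review evidence.

```shell
#!/usr/bin/env bash
# Sketch: report which Git hooks are installed in a repository.
# Assumption: default hooks directory (.git/hooks), no core.hooksPath override.

# Succeeds when the named hook exists as an executable file.
hook_installed() {
  local repo_dir="$1" hook_name="$2"
  local hook_file="$repo_dir/.git/hooks/$hook_name"
  [ -f "$hook_file" ] && [ -x "$hook_file" ]
}

# Prints one status line per hook name given after the repository path.
report_hooks() {
  local repo_dir="$1" hook
  shift
  for hook in "$@"; do
    if hook_installed "$repo_dir" "$hook"; then
      echo "$hook: installed"
    else
      echo "$hook: MISSING"
    fi
  done
}
```

A possible call from the repository root is report_hooks . pre-commit pre-push pre-merge-commit prepare-commit-msg, which covers the four hook types this chapter installs.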

Containment

Containment means restoring the normal path before the next change:

revert or replay the change through the approved workflow
regenerate the reviewed plan if infrastructure was touched
reinstall or re-enable local hooks and merge protections
confirm the next change must pass all four layers again

Guardrails That Stop It

Pre-commit hooks block risky operations before code leaves the workstation.
CI pipelines enforce validation, scanning, and approval gates.
AI-assisted review catches patterns humans miss under pressure.
The advanced track later adds cluster-side admission policies as the final boundary.

System Context

This chapter ties together the execution path built earlier in the course.

It reinforces:

Chapter 02 plan-before-apply discipline
Chapter 03 secret blocking before Git history is polluted
Chapter 04 promotion evidence through Git-visible, reviewable changes

Core Concepts

1. Pre-commit Hooks (First Guardrail Layer)

Pre-commit hooks run before code leaves the developer workstation. They are the cheapest guardrail: fast feedback, zero CI cost, immediate correction.

Branch protection:

The master-branch-check hook (scripts/pre-commit-master-check.sh) blocks direct commits to main/master. All changes must go through feature branches and PRs.
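
The core of such a guard is a name comparison. This is a minimal sketch, not the course script: the function names are illustrative, and it takes protected branch names as plain positional arguments instead of the --protected= flags used in the chapter's configuration.

```shell
#!/usr/bin/env bash
# Sketch: refuse commits on protected branches.
# Assumption: branch names arrive as positional arguments (BRANCH PROTECTED...).

# Succeeds when BRANCH equals one of the PROTECTED names.
is_protected_branch() {
  local current="$1" name
  shift
  for name in "$@"; do
    [ "$current" = "$name" ] && return 0
  done
  return 1
}

# Returns 1 with an explanation when the branch is protected, 0 otherwise.
guard_branch() {
  local current="$1"
  if is_protected_branch "$@"; then
    echo "Direct commits to '$current' are blocked; use a feature branch and a PR." >&2
    return 1
  fi
  return 0
}
```

Inside a hook this could be wired as guard_branch "$(git rev-parse --abbrev-ref HEAD)" main master. A non-zero exit from the hook is what makes Git abort the commit.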

History safety:

prevent-amend-after-push.sh blocks amending commits that have already been pushed. This prevents rewriting shared history.
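
One way to detect this case is to ask Git whether the commit is already reachable from a remote-tracking branch. The sketch below is an assumption about how such a check could work, since the real script is not shown in this chapter, and the function names are illustrative. Git passes the commit source as the second argument to prepare-commit-msg; how that value reaches a script depends on the hook runner, so here it is a plain parameter.

```shell
#!/usr/bin/env bash
# Sketch: decide whether an amend would rewrite a commit that is already on a remote.

# Succeeds when REV is reachable from at least one remote-tracking branch.
# The answer reflects the last fetch, so fetch first for current data.
commit_is_pushed() {
  local repo_dir="$1" rev="${2:-HEAD}" remote_branches
  remote_branches="$(git -C "$repo_dir" branch -r --contains "$rev" 2>/dev/null)" || return 1
  [ -n "$remote_branches" ]
}

# SOURCE is the commit source Git hands to prepare-commit-msg.
# It is "commit" for --amend, and also for -c/-C, which this sketch blocks too.
should_block_amend() {
  local source="$1" repo_dir="${2:-.}"
  [ "$source" = "commit" ] && commit_is_pushed "$repo_dir" HEAD
}
```

The trade-off is strictness: reusing a message with -c or -C on top of a pushed commit is also refused, which a production script may want to distinguish.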

Secret blocking:

The no-secrets hook (scripts/block-secrets.sh) matches staged file paths against sensitive names: kubeconfig, .key, .pem, .env, and credentials. It catches these files before they enter Git history.
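
The path filter can be exercised on its own. This sketch reuses the files pattern from the chapter's pre-commit configuration; the function names are illustrative and the real script may do more. It checks names only, so a secret pasted into an ordinary file is outside its reach.

```shell
#!/usr/bin/env bash
# Sketch: block commits that include files with sensitive-looking paths.
# The pattern is the `files` regex from the pre-commit configuration.
SENSITIVE_RE='(kubeconfig|\.key$|\.pem$|credentials|\.env$)'

# Succeeds when the path matches the sensitive pattern.
is_sensitive_path() {
  [[ "$1" =~ $SENSITIVE_RE ]]
}

# Reports every sensitive path and returns 1 if at least one was found.
block_sensitive_files() {
  local path blocked=0
  for path in "$@"; do
    if is_sensitive_path "$path"; then
      echo "blocked: $path" >&2
      blocked=1
    fi
  done
  return "$blocked"
}
```

Note that .env.example passes this filter because the pattern anchors .env at the end of the path.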

Flux manifest validation:

flux-kustomize-validate.sh runs a 3-stage validation pipeline:
1. YAML syntax check
2. kustomize build to verify overlay resolution
3. kubeconform with CRD schemas for structural validation
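
The three stages can be sketched as a fail-fast function whose return code names the stage that failed. This is an assumption about structure, not the course script. It assumes yamllint, kustomize, and kubeconform are on PATH, and CRD_SCHEMA_LOCATION is a hypothetical variable for pointing kubeconform at extra CRD schemas.

```shell
#!/usr/bin/env bash
# Sketch: fail-fast validation of one kustomize overlay directory.
# Return code: 0 ok, 1 YAML stage, 2 kustomize stage, 3 kubeconform stage.

validate_overlay() {
  local dir="$1" rendered status=0
  local schema_args=(-schema-location default)
  # Hypothetical variable: an additional schema location for CRDs.
  if [ -n "${CRD_SCHEMA_LOCATION:-}" ]; then
    schema_args+=(-schema-location "$CRD_SCHEMA_LOCATION")
  fi
  rendered="$(mktemp)" || return 1

  if ! yamllint -d relaxed "$dir"; then
    # Stage 1: the files are not valid YAML.
    echo "stage 1 failed (YAML syntax): $dir" >&2
    status=1
  elif ! kustomize build "$dir" > "$rendered"; then
    # Stage 2: a base, patch, or resource did not resolve.
    echo "stage 2 failed (kustomize build): $dir" >&2
    status=2
  elif ! kubeconform -strict -summary "${schema_args[@]}" "$rendered"; then
    # Stage 3: the rendered manifests do not match their schemas.
    echo "stage 3 failed (kubeconform): $dir" >&2
    status=3
  fi

  rm -f "$rendered"
  return "$status"
}
```

Ordering the stages from cheapest to most expensive keeps feedback fast: a syntax error never reaches the render or schema steps.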

Terraform validation:

terraform fmt check ensures consistent formatting
terraform validate catches configuration errors
checkov security scan flags misconfigurations
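
For comparison, here is a sketch that runs all three Terraform checks and reports every failure in one pass, so a developer fixes formatting and security findings together. The chapter's configuration runs them as separate hooks; combining them in one function, and the function name itself, are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch: run fmt, validate, and checkov against one Terraform directory
# and collect the names of all failing checks.

run_terraform_checks() {
  local dir="$1" failed=()

  terraform -chdir="$dir" fmt -check -recursive -diff || failed+=("fmt")

  # validate needs an initialized directory; -backend=false avoids remote state.
  if terraform -chdir="$dir" init -backend=false -input=false >/dev/null; then
    terraform -chdir="$dir" validate -no-color || failed+=("validate")
  else
    failed+=("init")
  fi

  checkov -d "$dir" --framework terraform --quiet --compact || failed+=("checkov")

  if [ "${#failed[@]}" -gt 0 ]; then
    echo "failed checks: ${failed[*]}" >&2
    return 1
  fi
}
```

A call such as run_terraform_checks infra/terraform/hcloud_cluster would cover the directory used by the workflow in this chapter.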

External linters:

shellcheck for shell script quality
yamllint for YAML formatting consistency

2. GitHub Actions Pipeline Design

The CI pipeline enforces what local hooks cannot guarantee (because developers can skip hooks).

Plan-Approve-Apply pattern (from terraform-hcloud.yml):

Plan Job → Upload Artifact → Approval Gate → Download Artifact → Apply Job

Key design decisions:

Concurrency control: the workflow's concurrency group sets cancel-in-progress: false, so a newer run waits instead of cancelling a running infrastructure mutation.
Artifact passing: tfplan uploaded with 1-day retention. Apply job downloads the exact reviewed plan.
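Approval gate: a separate approval job requires sign-off from a named approver and times out after 60 minutes. The workflow shown in this chapter implements it with a manual-approval action that opens an issue, rather than a GitHub environment with required reviewers.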
Secret management: Infrastructure credentials passed via TF_VAR_* environment variables from GitHub Secrets.

Destroy workflow (from terraform-hcloud-destroy.yml):

Destruction requires elevated confirmation:

Manual trigger (workflow_dispatch) with confirmation string input (“DESTROY”)
Multi-approver requirement
Makefile-delegated destroy sequence: Flux/K8s cleanup before terraform destroy
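
The confirmation and ordering rules can be sketched as two small functions. The Make target names flux-cleanup and terraform-destroy are hypothetical, since the real Makefile is not shown here; only the order (cluster cleanup first, then terraform destroy) comes from this chapter.

```shell
#!/usr/bin/env bash
# Sketch: gate a destroy sequence behind an exact confirmation string.

# Succeeds only when the input is exactly DESTROY (case-sensitive).
confirm_destroy() {
  local input="$1"
  if [ "$input" != "DESTROY" ]; then
    echo "Refusing to destroy: confirmation must be exactly DESTROY." >&2
    return 1
  fi
}

# Runs cleanup before destroy and stops at the first failure.
# Target names are hypothetical placeholders.
destroy_sequence() {
  confirm_destroy "$1" || return 1
  make flux-cleanup && make terraform-destroy
}
```

An exact, case-sensitive match is deliberate: a typed confirmation is meant to be hard to satisfy by accident.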

3. AI-Assisted Code Review (CodeRabbit)

CodeRabbit provides automated review as a safety net, not a replacement for human review.

Configuration highlights:

Path-specific rules: different review depth for infrastructure vs application code
Profile: “chill”, a non-aggressive tone that focuses on real issues
Security tools integration: gitleaks (secrets), semgrep (code patterns), checkov (IaC), hadolint (Dockerfiles), yamllint (YAML), actionlint (GitHub Actions)
KISS principle enforcement: flags unnecessary complexity

4. Guardrails Layering Model

Local (pre-commit) → CI (GitHub Actions) → Review (CodeRabbit) → Cluster boundary

Each layer catches what the previous layer missed:

Pre-commit catches developer mistakes immediately
CI catches bypassed hooks and validates against real schemas
CodeRabbit catches patterns and anti-patterns across the full PR
Later cluster-side policies enforce invariants even if the pipeline path is bypassed

No single layer is sufficient alone. Defense in depth means every layer assumes the previous layer failed.

Safe Workflow (Step-by-Step)

Install pre-commit hooks: pre-commit install --install-hooks
Commit triggers local validation (branch check, secret scan, manifest validation)
Push triggers CI pipeline (plan, validate, security scan)
PR creation triggers CodeRabbit review
Merge to main triggers a fresh plan, then apply behind the approval gate
Flux reconciles to cluster with admission policies as last gate
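
The developer-side steps of this workflow can be sketched as a small wrapper that prints each command by default. The names run and safe_change, and the DRY_RUN switch, are illustrative assumptions. The sketch also assumes changes are already staged and that the GitHub CLI (gh) is installed.

```shell
#!/usr/bin/env bash
# Sketch: the local half of the safe workflow, printed as a dry run by default.
# Set DRY_RUN=0 to execute the commands instead of printing them.

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# safe_change BRANCH MESSAGE: hooks first, then feature branch, commit, push, PR.
safe_change() {
  local branch="$1" message="$2"
  run pre-commit install --install-hooks &&
    run git switch -c "$branch" &&
    run git commit -m "$message" &&
    run git push -u origin "$branch" &&
    run gh pr create --fill
}
```

For example, safe_change feature/fix-typo "fix: typo" prints the five commands in order without touching the repository.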

Investigation Snapshots

Here is the pre-commit baseline used in the SafeOps system to turn workstation discipline into executable policy.

Pre-commit baseline

default_install_hook_types:
  - pre-commit
  - pre-push
  - pre-merge-commit
  - prepare-commit-msg

repos:
  - repo: local
    hooks:
      - id: master-branch-check
        name: Protected branch guard
        entry: scripts/pre-commit-master-check.sh
        language: script
        always_run: true
        pass_filenames: false
        stages: [pre-commit, pre-push, pre-merge-commit]
        args:
          - --protected=master
          - --protected=main

      - id: prevent-amend-after-push
        name: Prevent amending pushed commits
        entry: scripts/prevent-amend-after-push.sh
        language: script
        always_run: true
        pass_filenames: false
        stages: [prepare-commit-msg]

  - repo: local
    hooks:
      - id: flux-kustomize-validate
        name: Flux kustomize validate
        entry: scripts/flux-kustomize-validate.sh
        language: script
        files: ^flux/.*\.ya?ml$
        pass_filenames: true
        require_serial: true
        stages: [pre-commit]

      - id: terraform-fmt
        name: Terraform format check
        entry: terraform fmt -recursive -diff -check
        language: system
        files: \.tf$
        pass_filenames: false
        stages: [pre-commit]

      - id: terraform-validate
        name: Terraform validate
        entry: scripts/terraform-validate.sh
        language: script
        files: \.(tf|tfvars)$
        pass_filenames: false
        require_serial: true
        stages: [pre-commit]

      - id: terraform-security
        name: Terraform security scan
        entry: scripts/terraform-security.sh
        language: script
        files: \.(tf|tfvars)$
        pass_filenames: false
        require_serial: true
        stages: [pre-commit]

  - repo: local
    hooks:
      - id: no-secrets
        name: Block sensitive files
        entry: scripts/block-secrets.sh
        language: script
        files: (kubeconfig|\.key$|\.pem$|credentials|\.env$)
        stages: [pre-commit]

  - repo: https://github.com/koalaman/shellcheck-precommit
    rev: v0.10.0
    hooks:
      - id: shellcheck
        files: \.sh$
        args: [--severity=warning]
        stages: [pre-commit]

  - repo: https://github.com/adrienverge/yamllint
    rev: v1.35.1
    hooks:
      - id: yamllint
        files: \.ya?ml$
        args: [-d, relaxed]
        stages: [pre-commit]

Here is the Terraform workflow used in the SafeOps system for plan, approval, and apply separation.

Plan-approve-apply workflow

name: Terraform - Hetzner

on:
  pull_request:
    paths:
      - "infra/terraform/hcloud_cluster/**"
      - "flux/**"
      - ".github/workflows/terraform-hcloud*.yml"
  push:
    branches: [main]
    paths:
      - "infra/terraform/hcloud_cluster/**"
      - "flux/**"
      - ".github/workflows/terraform-hcloud*.yml"

concurrency:
  group: terraform-hcloud
  cancel-in-progress: false

permissions:
  contents: read
  issues: write

jobs:
  plan:
    runs-on: ubuntu-latest
    outputs:
      has_changes: ${{ steps.plan.outputs.has_changes }}

    defaults:
      run:
        working-directory: infra/terraform/hcloud_cluster

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Setup kubectl
        uses: azure/setup-kubectl@v4
        with:
          version: "v1.34.1"

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: "1.14.5"
          terraform_wrapper: false

      - name: Terraform fmt
        run: terraform fmt -check -recursive

      - name: Terraform init
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.R2_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.R2_SECRET_ACCESS_KEY }}
        run: terraform init -input=false

      - name: Terraform validate
        run: terraform validate

      - name: Terraform plan
        id: plan
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.R2_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.R2_SECRET_ACCESS_KEY }}
          TF_VAR_hcloud_token: ${{ secrets.HCLOUD_TOKEN }}
          TF_VAR_ssh_public_key: ${{ secrets.HCLOUD_SSH_PUBLIC_KEY }}
          TF_VAR_ssh_private_key: ${{ secrets.HCLOUD_SSH_PRIVATE_KEY }}
          TF_VAR_flux_git_repository_url: https://github.com/${{ github.repository }}.git
          TF_VAR_flux_git_repository_branch: main
          TF_VAR_flux_kustomization_path: ./flux/bootstrap/flux-system
          TF_VAR_flux_git_token: ${{ secrets.FLUX_GIT_TOKEN }}
          TF_VAR_enable_ghcr: "true"
          TF_VAR_ghcr_username: ${{ secrets.GHCR_USERNAME }}
          TF_VAR_ghcr_token: ${{ secrets.GHCR_TOKEN }}
          TF_VAR_sops_age_key: ${{ secrets.SOPS_AGE_KEY }}
          TF_VAR_backup_s3_access_key_id: ${{ secrets.R2_ACCESS_KEY_ID }}
          TF_VAR_backup_s3_secret_access_key: ${{ secrets.R2_SECRET_ACCESS_KEY }}
          TF_VAR_backup_s3_bucket: ${{ secrets.R2_BUCKET }}
          TF_VAR_backup_s3_endpoint: ${{ secrets.R2_ENDPOINT }}
          TF_VAR_backup_s3_region: ${{ secrets.R2_REGION }}
        run: |
          set +e
          set -o pipefail
          terraform plan -input=false -lock-timeout=5m -no-color -detailed-exitcode -out=tfplan 2>&1 | tee plan.txt
          exit_code=${PIPESTATUS[0]}
          set -e
          if [ "$exit_code" -eq 1 ]; then
            echo "Terraform plan failed."
            exit 1
          fi
          if [ "$exit_code" -eq 0 ]; then
            echo "has_changes=false" >> "$GITHUB_OUTPUT"
          else
            echo "has_changes=true" >> "$GITHUB_OUTPUT"
          fi

      - name: Upload tfplan artifact
        if: github.event_name == 'push' && steps.plan.outputs.has_changes == 'true'
        uses: actions/upload-artifact@v4
        with:
          name: terraform-hcloud-tfplan
          path: |
            infra/terraform/hcloud_cluster/tfplan
            infra/terraform/hcloud_cluster/plan.txt
          retention-days: 1

  approval:
    runs-on: ubuntu-latest
    needs: plan
    if: github.event_name == 'push' && needs.plan.outputs.has_changes == 'true'
    timeout-minutes: 60
    steps:
      - name: Manual approval gate
        uses: pavlospt/manual-approval@v2
        with:
          secret: ${{ github.token }}
          approvers: ldbl
          minimum-approvals: 1
          issue-title: "Terraform apply — ${{ github.sha }}"
          issue-body: |
            Terraform plan detected infrastructure changes on `main`.

            **Commit:** ${{ github.sha }}
            **Run:** ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}

            Approve or deny this apply.
          exclude-workflow-initiator-as-approver: false

  apply:
    runs-on: ubuntu-latest
    needs: [plan, approval]
    if: github.event_name == 'push' && needs.plan.outputs.has_changes == 'true' && needs.approval.result == 'success'

    defaults:
      run:
        working-directory: infra/terraform/hcloud_cluster

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Setup kubectl
        uses: azure/setup-kubectl@v4
        with:
          version: "v1.34.1"

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: "1.14.5"

      - name: Terraform init
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.R2_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.R2_SECRET_ACCESS_KEY }}
        run: terraform init -input=false

      - name: Download tfplan artifact
        uses: actions/download-artifact@v4
        with:
          name: terraform-hcloud-tfplan
          path: infra/terraform/hcloud_cluster

      - name: Terraform apply
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.R2_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.R2_SECRET_ACCESS_KEY }}
        run: terraform apply -input=false -lock-timeout=5m tfplan

Anti-Patterns to Avoid

Relying on CI alone without local hooks (slow feedback, wasted CI minutes).
Using --no-verify to skip hooks during “quick fixes.”
Running apply without approval gate, even for “trivial” changes.
Ignoring CodeRabbit findings because “it’s just AI.”
Treating any single layer as the complete guardrail.

Lab Files

lab.md
quiz.md

Done When

learner can explain the guardrails layering model and why each layer exists
learner can install and trigger pre-commit hooks locally
learner can trace the Plan-Approve-Apply pipeline flow
learner can describe how CodeRabbit integrates with the review process
learner can identify what each guardrail layer catches that others miss

Next Chapter

Continue with Chapter 06 (Network Policies).
