Chapter 05: CI/CD & Developer Guardrails
Incident Hook
A developer pushes directly to main, skipping validation. Terraform applies an untested change. An unreviewed application or infrastructure change reaches production. Each failure happened because one guardrail layer was missing or bypassed.
Observed Symptoms
What the team sees first:
- there is no normal PR discussion for the change
- no approved plan artifact exists for the infrastructure mutation
- responders must reconstruct intent after the change already landed
The first operational problem is missing evidence, not missing tooling.
Confusion Phase
The workflow now feels partially intact because some automation still ran. That creates false confidence.
The real question becomes:
- which guardrail failed first
- and which missing layer allowed the later layers to become insufficient
Why This Chapter Exists
CI/CD pipelines are where code becomes infrastructure. Without guardrails at every stage, a single unvalidated change can bypass all cluster-side protections. This chapter defines the layered defense model: local hooks, CI validation, approval gates, and AI-assisted review.
What AI Would Propose (Brave Junior)
- “Skip pre-commit hooks locally, CI will catch it.”
- “Apply Terraform directly, we already know what it does.”
- “Merge without review, the change is small.”
Why this sounds reasonable:
- faster iteration in the moment
- fewer steps to production
- small changes feel safe
Why This Is Dangerous
- CI only sees what has been pushed, and for secrets that is already too late: a secret that was committed, pushed, and then force-pushed away has still leaked.
- Direct apply without plan review removes the last safe checkpoint.
- Small unreviewed changes accumulate into unauditable drift.
Investigation
Treat the path itself as part of the incident.
Safe investigation sequence:
- verify whether local hooks ran or were bypassed
- inspect the CI path for plan, approval, and apply evidence
- confirm whether review and protected-branch rules were enforced
- identify the first missing checkpoint that made the later failure possible
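The last step of this sequence can be expressed as a small helper. This is a minimal sketch, assuming responders record the evidence for each layer as `layer=ok` or `layer=missing`, in pipeline order. The function name `first_missing_checkpoint` and the input format are illustrative and not part of the course tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report the first guardrail layer whose evidence is missing.
# Input: "layer=status" pairs in pipeline order; status "ok" means evidence exists.
first_missing_checkpoint() {
  local pair
  for pair in "$@"; do
    if [ "${pair#*=}" != "ok" ]; then
      # Print the layer name (text before the first "=") and stop at the first gap.
      printf '%s\n' "${pair%%=*}"
      return 0
    fi
  done
  printf 'none\n'
}
```

For example, `first_missing_checkpoint hooks=ok ci-plan=missing approval=missing` prints `ci-plan`: the approval is also missing, but the plan is the first checkpoint that failed.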
Containment
Containment means restoring the normal path before the next change:
- revert or replay the change through the approved workflow
- regenerate the reviewed plan if infrastructure was touched
- reinstall or re-enable local hooks and merge protections
- confirm the next change must pass all four layers again
Guardrails That Stop It
- Pre-commit hooks block risky operations before code leaves the workstation.
- CI pipelines enforce validation, scanning, and approval gates.
- AI-assisted review catches patterns humans miss under pressure.
- The advanced track later adds cluster-side admission policies as the final boundary.
System Context
This chapter ties together the execution path built earlier in the course.
It reinforces:
- Chapter 02 plan-before-apply discipline
- Chapter 03 secret blocking before Git history is polluted
- Chapter 04 promotion evidence through Git-visible, reviewable changes
Core Concepts
1. Pre-commit Hooks (First Guardrail Layer)
Pre-commit hooks run before code leaves the developer workstation. They are the cheapest guardrail: fast feedback, zero CI cost, immediate correction.
Branch protection:
master-branch-check.sh blocks direct commits to main/master. All changes must go through feature branches and PRs.
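The core of such a guard might look like the following. This is a sketch under assumptions, not the course's actual script: it only reuses the `--protected=<name>` argument style from the hook configuration, and the function name `is_protected_branch` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of a protected-branch guard; not the course's actual script.
# Usage: is_protected_branch <current-branch> --protected=<name> [--protected=<name> ...]
# Returns 0 when the current branch is in the protected list.
is_protected_branch() {
  local current="$1" arg
  shift
  for arg in "$@"; do
    case "$arg" in
      --protected=*)
        [ "${arg#--protected=}" = "$current" ] && return 0
        ;;
    esac
  done
  return 1
}

# In a hook, the current branch could come from:
#   current="$(git symbolic-ref --short -q HEAD)"   # empty on a detached HEAD
# A non-zero exit from the hook aborts the commit.
```

Keeping the comparison separate from the Git call makes the rule easy to test without a repository.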
History safety:
prevent-amend-after-push.sh blocks amending commits that have already been pushed. This prevents rewriting shared history.
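One way such a check could work in a prepare-commit-msg hook is sketched below. It is an assumption about the approach, not the course's script. Git passes the hook a message file, an optional source, and an optional commit name; for `--amend` (and also `-c`/`-C`) the source is `commit`. The function name `reuses_pushed_commit` is hypothetical, and the check only knows about remote branches as of the last fetch.

```shell
#!/usr/bin/env bash
# Sketch of an amend guard for a prepare-commit-msg hook (assumed approach).
# Git calls the hook as: <message-file> [<source> [<commit>]].
# Returns 0 when the reused commit is already on a remote-tracking branch.
reuses_pushed_commit() {
  local msg_source="${1:-}" commit="${2:-}"
  [ "$msg_source" = "commit" ] || return 1
  [ -n "$commit" ] || return 1
  # Any remote-tracking branch that contains the commit means it was pushed.
  [ -n "$(git branch -r --contains "$commit" 2>/dev/null)" ]
}

# Inside the hook:
#   if reuses_pushed_commit "${2:-}" "${3:-}"; then
#     echo "Refusing to amend a commit that is already on a remote." >&2
#     exit 1
#   fi
```

Because `-c` and `-C` also report the source as `commit`, this sketch would also block reusing the message of a pushed commit; a real hook may need to tell those cases apart.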
Secret blocking:
block-secrets.sh pattern-matches on dangerous file types: kubeconfig, .key, .pem, .env, and credential patterns. It catches secrets before they enter Git history.
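The path matching can be illustrated with the same pattern the hook configuration uses in its `files:` entry. This is a minimal sketch, not the course's script: the function names are hypothetical, and it inspects file names only, not file contents.

```shell
#!/usr/bin/env bash
# Sketch: path-based check mirroring the hook's `files:` pattern.
# It looks at file names only; it does not scan file contents.
SENSITIVE_PATTERN='(kubeconfig|\.key$|\.pem$|credentials|\.env$)'

is_sensitive_path() {
  [[ "$1" =~ $SENSITIVE_PATTERN ]]
}

# Report every sensitive path among the arguments; return 1 if any was found.
check_paths() {
  local path found=0
  for path in "$@"; do
    if is_sensitive_path "$path"; then
      printf 'blocked: %s\n' "$path" >&2
      found=1
    fi
  done
  [ "$found" -eq 0 ]
}
```

Note what the pattern does not cover: a file such as `config/.env.example` passes because `\.env$` is anchored to the end of the path, and a token pasted into an ordinary source file is invisible to a name-based check. That gap is one reason the later layers run content scanners.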
Flux manifest validation:
flux-kustomize-validate.sh runs a 3-stage validation pipeline:
- YAML syntax check
- kustomize build to verify overlay resolution
- kubeconform with CRD schemas for structural validation
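The three stages above can be sketched as a fail-fast function. This is an illustration under assumptions, not the course's script: it assumes yamllint performs the syntax stage, takes an overlay directory as input, and uses a hypothetical `CRD_SCHEMA_LOCATION` variable to point kubeconform at additional CRD schemas.

```shell
#!/usr/bin/env bash
# Sketch of a three-stage manifest check; not the course's actual script.
# Assumptions: yamllint handles stage 1, and CRD_SCHEMA_LOCATION (hypothetical
# variable) optionally adds a schema source for custom resources.
# Return code identifies the failing stage: 1, 2, or 3 (0 means all passed).
validate_overlay() {
  local dir="$1" rendered status=0
  local schema_args=(-schema-location default)
  if [ -n "${CRD_SCHEMA_LOCATION:-}" ]; then
    schema_args+=(-schema-location "$CRD_SCHEMA_LOCATION")
  fi
  rendered="$(mktemp)" || return 1

  if ! yamllint -d relaxed "$dir"; then
    status=1   # stage 1: YAML syntax
  elif ! kustomize build "$dir" > "$rendered"; then
    status=2   # stage 2: overlay resolution
  elif ! kubeconform -strict -summary "${schema_args[@]}" "$rendered"; then
    status=3   # stage 3: structural validation of the rendered output
  fi

  rm -f "$rendered"
  return "$status"
}
```

The ordering matters: schema validation runs on the rendered output of `kustomize build`, so a broken overlay is reported as a stage 2 failure rather than as confusing schema errors. A real hook that receives changed file names would first need to map them to overlay directories.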
Terraform validation:
- terraform fmt check ensures consistent formatting
- terraform validate catches configuration errors
- checkov security scan flags misconfigurations
External linters:
- shellcheck for shell script quality
- yamllint for YAML formatting consistency
2. GitHub Actions Pipeline Design
The CI pipeline enforces what local hooks cannot guarantee (because developers can skip hooks).
Plan-Approve-Apply pattern (from terraform-hcloud.yml):
Plan Job → Upload Artifact → Approval Gate → Download Artifact → Apply Job
Key design decisions:
- Concurrency control: cancel-in-progress: false for apply jobs. Never cancel a running infrastructure mutation.
- Artifact passing: tfplan uploaded with 1-day retention. Apply job downloads the exact reviewed plan.
- Environment protection: GitHub environment with required reviewers and 60-minute timeout.
- Secret management: Infrastructure credentials passed via TF_VAR_* environment variables from GitHub Secrets.
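Whether the approval gate runs at all depends on how the plan step interprets Terraform's exit status. With `-detailed-exitcode`, `terraform plan` returns 0 for no changes, 1 for an error, and 2 when changes are present. A minimal sketch of that mapping follows; the function name `classify_plan_exit` is hypothetical, and it is slightly stricter than treating every non-zero, non-one status as "changes".

```shell
#!/usr/bin/env bash
# Map `terraform plan -detailed-exitcode` results to a pipeline decision.
# 0 = no changes, 1 = error, 2 = changes present.
# Anything else is treated as an error in this sketch.
classify_plan_exit() {
  case "$1" in
    0) echo "no-changes" ;;
    2) echo "changes" ;;
    *) echo "error"; return 1 ;;
  esac
}

# In a CI step, the status must be captured from terraform itself, not from
# a later command in the pipe, for example:
#   terraform plan -detailed-exitcode -out=tfplan | tee plan.txt
#   decision="$(classify_plan_exit "${PIPESTATUS[0]}")"
```

Only the `changes` outcome should lead to an uploaded plan artifact and an approval request; `no-changes` ends the run without asking anyone to approve an empty plan.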
Destroy workflow (from terraform-hcloud-destroy.yml):
Destruction requires elevated confirmation:
- Manual trigger (workflow_dispatch) with confirmation string input (“DESTROY”)
- Multi-approver requirement
- Makefile-delegated destroy sequence: Flux/K8s cleanup before terraform destroy
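The confirmation-string check is simple, but the exact comparison is the point. Here is a minimal sketch, assuming the workflow input is handed to a script as its first argument; the function name `confirm_destroy` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: accept only the exact, case-sensitive confirmation string.
# No trimming and no case folding, so near-misses are rejected.
confirm_destroy() {
  [ "${1:-}" = "DESTROY" ]
}

# In a workflow step, the typed input could be checked before anything runs:
#   if ! confirm_destroy "$CONFIRM_INPUT"; then   # CONFIRM_INPUT is illustrative
#     echo "Confirmation string mismatch; aborting." >&2
#     exit 1
#   fi
```

An exact match forces the operator to type the word deliberately; a default value or a lenient comparison would turn the confirmation into a formality.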
3. AI-Assisted Code Review (CodeRabbit)
CodeRabbit provides automated review as a safety net, not a replacement for human review.
Configuration highlights:
- Path-specific rules: different review depth for infrastructure vs application code
- Profile: “chill” — non-aggressive tone, focuses on real issues
- Security tools integration: gitleaks (secrets), semgrep (code patterns), checkov (IaC), hadolint (Dockerfiles), yamllint (YAML), actionlint (GitHub Actions)
- KISS principle enforcement: flags unnecessary complexity
4. Guardrails Layering Model
Local (pre-commit) → CI (GitHub Actions) → Review (CodeRabbit) → Cluster boundary
Each layer catches what the previous layer missed:
- Pre-commit catches developer mistakes immediately
- CI catches bypassed hooks and validates against real schemas
- CodeRabbit catches patterns and anti-patterns across the full PR
- Later cluster-side policies enforce invariants even if the pipeline path is bypassed
No single layer is sufficient alone. Defense in depth means every layer assumes the previous layer failed.
Safe Workflow (Step-by-Step)
- Install pre-commit hooks: pre-commit install --install-hooks
- Commit triggers local validation (branch check, secret scan, manifest validation)
- Push triggers CI pipeline (plan, validate, security scan)
- PR creation triggers CodeRabbit review
- Merge to main triggers apply with approval gate
- Flux reconciles to cluster with admission policies as last gate
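The first step is easy to skip silently, so it helps to verify it. The following is a minimal sketch, assuming the default hooks directory (`.git/hooks`) and relying on the fact that hook files generated by pre-commit mention pre-commit in their header. The function name `hook_installed` is hypothetical, and repositories that set `core.hooksPath` or use worktrees would need a different lookup.

```shell
#!/usr/bin/env bash
# Sketch: check whether a pre-commit-managed hook exists for a given hook type.
# Usage: hook_installed <repo-root> [hook-type]   (hook type defaults to pre-commit)
hook_installed() {
  local repo_root="$1" hook_type="${2:-pre-commit}"
  local hook_file="$repo_root/.git/hooks/$hook_type"
  # The hook must be executable and must look like a pre-commit shim.
  [ -x "$hook_file" ] && grep -q 'pre-commit' "$hook_file"
}

# Example: check each hook type listed in default_install_hook_types.
#   for t in pre-commit pre-push pre-merge-commit prepare-commit-msg; do
#     hook_installed . "$t" || echo "missing hook: $t" >&2
#   done
```

The same check is useful during containment, when confirming that local hooks were reinstalled before the next change.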
Investigation Snapshots
Here is the pre-commit baseline used in the SafeOps system to turn workstation discipline into executable policy.
Pre-commit baseline
default_install_hook_types:
  - pre-commit
  - pre-push
  - pre-merge-commit
  - prepare-commit-msg
repos:
  - repo: local
    hooks:
      - id: master-branch-check
        name: Protected branch guard
        entry: scripts/pre-commit-master-check.sh
        language: script
        always_run: true
        pass_filenames: false
        stages: [pre-commit, pre-push, pre-merge-commit]
        args:
          - --protected=master
          - --protected=main
      - id: prevent-amend-after-push
        name: Prevent amending pushed commits
        entry: scripts/prevent-amend-after-push.sh
        language: script
        always_run: true
        pass_filenames: false
        stages: [prepare-commit-msg]
  - repo: local
    hooks:
      - id: flux-kustomize-validate
        name: Flux kustomize validate
        entry: scripts/flux-kustomize-validate.sh
        language: script
        files: ^flux/.*\.ya?ml$
        pass_filenames: true
        require_serial: true
        stages: [pre-commit]
      - id: terraform-fmt
        name: Terraform format check
        entry: terraform fmt -recursive -diff -check
        language: system
        files: \.tf$
        pass_filenames: false
        stages: [pre-commit]
      - id: terraform-validate
        name: Terraform validate
        entry: scripts/terraform-validate.sh
        language: script
        files: \.(tf|tfvars)$
        pass_filenames: false
        require_serial: true
        stages: [pre-commit]
      - id: terraform-security
        name: Terraform security scan
        entry: scripts/terraform-security.sh
        language: script
        files: \.(tf|tfvars)$
        pass_filenames: false
        require_serial: true
        stages: [pre-commit]
  - repo: local
    hooks:
      - id: no-secrets
        name: Block sensitive files
        entry: scripts/block-secrets.sh
        language: script
        files: (kubeconfig|\.key$|\.pem$|credentials|\.env$)
        stages: [pre-commit]
  - repo: https://github.com/koalaman/shellcheck-precommit
    rev: v0.10.0
    hooks:
      - id: shellcheck
        files: \.sh$
        args: [--severity=warning]
        stages: [pre-commit]
  - repo: https://github.com/adrienverge/yamllint
    rev: v1.35.1
    hooks:
      - id: yamllint
        files: \.ya?ml$
        args: [-d, relaxed]
        stages: [pre-commit]
Here is the Terraform workflow used in the SafeOps system for plan, approval, and apply separation.
Plan-approve-apply workflow
name: Terraform - Hetzner
on:
  pull_request:
    paths:
      - "infra/terraform/hcloud_cluster/**"
      - "flux/**"
      - ".github/workflows/terraform-hcloud*.yml"
  push:
    branches: [main]
    paths:
      - "infra/terraform/hcloud_cluster/**"
      - "flux/**"
      - ".github/workflows/terraform-hcloud*.yml"
concurrency:
  group: terraform-hcloud
  cancel-in-progress: false
permissions:
  contents: read
  issues: write
jobs:
  plan:
    runs-on: ubuntu-latest
    outputs:
      has_changes: ${{ steps.plan.outputs.has_changes }}
    defaults:
      run:
        working-directory: infra/terraform/hcloud_cluster
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Setup kubectl
        uses: azure/setup-kubectl@v4
        with:
          version: "v1.34.1"
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: "1.14.5"
          terraform_wrapper: false
      - name: Terraform fmt
        run: terraform fmt -check -recursive
      - name: Terraform init
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.R2_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.R2_SECRET_ACCESS_KEY }}
        run: terraform init -input=false
      - name: Terraform validate
        run: terraform validate
      - name: Terraform plan
        id: plan
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.R2_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.R2_SECRET_ACCESS_KEY }}
          TF_VAR_hcloud_token: ${{ secrets.HCLOUD_TOKEN }}
          TF_VAR_ssh_public_key: ${{ secrets.HCLOUD_SSH_PUBLIC_KEY }}
          TF_VAR_ssh_private_key: ${{ secrets.HCLOUD_SSH_PRIVATE_KEY }}
          TF_VAR_flux_git_repository_url: https://github.com/${{ github.repository }}.git
          TF_VAR_flux_git_repository_branch: main
          TF_VAR_flux_kustomization_path: ./flux/bootstrap/flux-system
          TF_VAR_flux_git_token: ${{ secrets.FLUX_GIT_TOKEN }}
          TF_VAR_enable_ghcr: "true"
          TF_VAR_ghcr_username: ${{ secrets.GHCR_USERNAME }}
          TF_VAR_ghcr_token: ${{ secrets.GHCR_TOKEN }}
          TF_VAR_sops_age_key: ${{ secrets.SOPS_AGE_KEY }}
          TF_VAR_backup_s3_access_key_id: ${{ secrets.R2_ACCESS_KEY_ID }}
          TF_VAR_backup_s3_secret_access_key: ${{ secrets.R2_SECRET_ACCESS_KEY }}
          TF_VAR_backup_s3_bucket: ${{ secrets.R2_BUCKET }}
          TF_VAR_backup_s3_endpoint: ${{ secrets.R2_ENDPOINT }}
          TF_VAR_backup_s3_region: ${{ secrets.R2_REGION }}
        run: |
          set +e
          set -o pipefail
          terraform plan -input=false -lock-timeout=5m -no-color -detailed-exitcode -out=tfplan 2>&1 | tee plan.txt
          exit_code=${PIPESTATUS[0]}
          set -e
          if [ "$exit_code" -eq 1 ]; then
            echo "Terraform plan failed."
            exit 1
          fi
          if [ "$exit_code" -eq 0 ]; then
            echo "has_changes=false" >> "$GITHUB_OUTPUT"
          else
            echo "has_changes=true" >> "$GITHUB_OUTPUT"
          fi
      - name: Upload tfplan artifact
        if: github.event_name == 'push' && steps.plan.outputs.has_changes == 'true'
        uses: actions/upload-artifact@v4
        with:
          name: terraform-hcloud-tfplan
          path: |
            infra/terraform/hcloud_cluster/tfplan
            infra/terraform/hcloud_cluster/plan.txt
          retention-days: 1
  approval:
    runs-on: ubuntu-latest
    needs: plan
    if: github.event_name == 'push' && needs.plan.outputs.has_changes == 'true'
    timeout-minutes: 60
    steps:
      - name: Manual approval gate
        uses: pavlospt/manual-approval@v2
        with:
          secret: ${{ github.token }}
          approvers: ldbl
          minimum-approvals: 1
          issue-title: "Terraform apply — ${{ github.sha }}"
          issue-body: |
            Terraform plan detected infrastructure changes on `main`.
            **Commit:** ${{ github.sha }}
            **Run:** ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
            Approve or deny this apply.
          exclude-workflow-initiator-as-approver: false
  apply:
    runs-on: ubuntu-latest
    needs: [plan, approval]
    if: github.event_name == 'push' && needs.plan.outputs.has_changes == 'true' && needs.approval.result == 'success'
    defaults:
      run:
        working-directory: infra/terraform/hcloud_cluster
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Setup kubectl
        uses: azure/setup-kubectl@v4
        with:
          version: "v1.34.1"
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: "1.14.5"
      - name: Terraform init
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.R2_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.R2_SECRET_ACCESS_KEY }}
        run: terraform init -input=false
      - name: Download tfplan artifact
        uses: actions/download-artifact@v4
        with:
          name: terraform-hcloud-tfplan
          path: infra/terraform/hcloud_cluster
      - name: Terraform apply
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.R2_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.R2_SECRET_ACCESS_KEY }}
        run: terraform apply -input=false -lock-timeout=5m tfplan
Anti-Patterns to Avoid
- Relying on CI alone without local hooks (slow feedback, wasted CI minutes).
- Using --no-verify to skip hooks during “quick fixes.”
- Running apply without approval gate, even for “trivial” changes.
- Ignoring CodeRabbit findings because “it’s just AI.”
- Treating any single layer as the complete guardrail.
Lab Files
- lab.md
- quiz.md
Done When
- learner can explain the guardrails layering model and why each layer exists
- learner can install and trigger pre-commit hooks locally
- learner can trace the Plan-Approve-Apply pipeline flow
- learner can describe how CodeRabbit integrates with the review process
- learner can identify what each guardrail layer catches that others miss
Next Chapter
Continue with Chapter 06 (Network Policies).