Core Track: Guardrails-first chapter in the core learning path.

Estimated Time

  • Reading: 20-25 min
  • Lab: 45-60 min
  • Quiz: 10-15 min

Prerequisites

Source Code References

  • develop/
  • gitops-workflow.md
  • image-automation/
  • production/

What You Will Produce

A reproducible lab result plus quiz verification and incident-safe operating evidence.

Chapter 04: GitOps & Version Promotion

Incident Hook

A team rebuilds “the same” code for production under incident pressure. The resulting binary differs from the one in staging because of dependency drift and build-time variance. Rollback is confusing because the promoted artifact is not the one that was tested. Time is lost proving artifact lineage instead of restoring service.

Observed Symptoms

What the team sees first:

  • production is running a digest different from the one validated in staging
  • the Git history looks correct, but the artifact identity does not match
  • rollback discussion turns into a trust discussion

The incident is not only about the symptom. It is about losing artifact certainty at the worst possible moment.

Confusion Phase

The team now has multiple candidates for “the right image”:

  • the last known-good production image
  • the staging image that was supposed to be promoted
  • the rebuilt production image that actually deployed

That ambiguity is what immutable promotion is supposed to prevent.

Why This Chapter Exists

Production safety depends on controlled promotion, not ad-hoc rebuilds. This chapter defines one deployment model:

  • develop deploys develop-* images
  • staging deploys staging-* images
  • production deploys production-* images from explicit promotion

What AI Would Propose (Brave Junior)

  • “Just rebuild from main and deploy to production now.”
  • “Use mutable latest tag for speed.”

Why this sounds reasonable:

  • fast and simple under pressure
  • fewer manual steps

Why This Is Dangerous

  • Rebuild breaks artifact immutability.
  • Mutable tags destroy auditability.
  • Incident response becomes guesswork across environments.

Investigation

Start by proving identity, not by rebuilding again.

Safe investigation sequence:

  1. compare staging and production digests directly
  2. inspect the Git evidence around the promotion commit
  3. confirm what ImagePolicy selected and what ImageUpdateAutomation wrote back to Git
  4. determine whether the deployment was a real promotion or a new build wearing a familiar name
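
The first step of the sequence above, comparing identities rather than names, can be sketched in Python. This is a minimal illustration and not part of the SafeOps tooling: the `parse_image_ref` and `same_artifact` helpers are hypothetical, and the inputs are assumed to be digest-pinned reference strings as reported by the cluster or registry.

```python
from typing import NamedTuple, Optional


class ImageRef(NamedTuple):
    repository: str
    tag: Optional[str]
    digest: Optional[str]


def parse_image_ref(ref: str) -> ImageRef:
    """Split 'repo[:tag][@sha256:...]' into its parts (hypothetical helper)."""
    name, _, digest = ref.partition("@")
    # Only a colon after the last slash separates a tag;
    # an earlier colon belongs to a registry port.
    last_slash = name.rfind("/")
    colon = name.rfind(":")
    if colon > last_slash:
        repository, tag = name[:colon], name[colon + 1:]
    else:
        repository, tag = name, None
    return ImageRef(repository, tag, digest or None)


def same_artifact(a: str, b: str) -> bool:
    """Two references name the same artifact only if their digests match."""
    ra, rb = parse_image_ref(a), parse_image_ref(b)
    if ra.digest is None or rb.digest is None:
        raise ValueError("cannot prove identity without a digest on both sides")
    return ra.repository == rb.repository and ra.digest == rb.digest
```

Tags are deliberately ignored in the comparison: a real promotion changes the tag and keeps the digest, while a rebuild can keep a familiar-looking tag and change the digest.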

Containment

Containment restores one trustworthy artifact path:

  1. revert to the last known-good production promotion commit
  2. let Flux reconcile the previous digest
  3. verify the deployed workload matches the intended immutable artifact
  4. re-run promotion only after artifact lineage is clear again

Guardrails That Stop It

  • Promotion without rebuild: staging-* is retagged to production-*.
  • Immutable env/version tags are required.
  • Flux image automation writes all image updates to Git.
  • GitOps-first rollback via commit revert.
  • Pre-commit branch/history hooks prevent risky Git operations before promotion PRs:
    • scripts/pre-commit-master-check.sh
    • scripts/prevent-amend-after-push.sh
  • Pre-commit manifest hook validates local Flux renders before promotion PRs:
    • scripts/flux-kustomize-validate.sh

Immutable Artifact Identity Rule

Promotion must reference the exact artifact identity from the tested environment:

  • immutable tag pattern (production-vX.Y.Z-<sha>-<ts>) and/or
  • digest-pinned image (image@sha256:<digest>).

No rebuild is allowed between tested and promoted artifact.
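
As a rough illustration of the rule, the check below accepts a production reference only if it is digest-pinned or carries the immutable tag shape described above. The function name `is_immutable_reference` is hypothetical, and the regular expressions are an assumed reading of the `production-vX.Y.Z-<sha>-<ts>` pattern, not the project's actual validation code.

```python
import re

# Assumed reading of the immutable tag shape: production-vX.Y.Z-<sha>-<ts>
IMMUTABLE_TAG = re.compile(r"production-v[0-9]+\.[0-9]+\.[0-9]+-[a-f0-9]+-[0-9]+")
SHA256_DIGEST = re.compile(r"sha256:[a-f0-9]{64}")


def is_immutable_reference(ref: str) -> bool:
    """True if ref is digest-pinned or uses the immutable production tag shape."""
    name, _, digest = ref.partition("@")
    if digest:
        return SHA256_DIGEST.fullmatch(digest) is not None
    # Look for a tag only in the last path segment, so a registry port is not mistaken for one.
    last_segment = name.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return False  # no tag at all means an implicit mutable default
    tag = last_segment.split(":", 1)[1]
    return IMMUTABLE_TAG.fullmatch(tag) is not None
```

Under this sketch the `production` alias and `latest` are both rejected, because either can be moved to a different artifact after review.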

Investigation Snapshots

Here is the GitOps workflow guide used in the SafeOps system to keep promotion and rollback evidence explicit.

GitOps workflow guide

## Overview

As of February 16, 2026, the active deployment model is:

- **Develop** namespace auto-updated from env-tagged `develop-*` images.
- **Staging** namespace auto-updated from env-tagged `staging-*` images.
- **Production** namespace auto-updated from env-tagged `production-*` images created by manual promotion workflows.

Flux sync source:
- Git repository branch: `main`
- Path: `./flux/bootstrap/flux-system`

## Actual Image Tagging Strategy

Backend and frontend build workflows publish multiple tags per build:

- Environment alias: `develop` or `staging`
- Immutable env/version tag: `<env>-v<major>.<minor>.<patch>-<short_sha>-<unix_ts>`
- Commit tag: `<short_sha>`

Examples:
- `develop-v0.0.1-a1b2c3d-1738860000`
- `staging-v0.0.1-a1b2c3d-1738860123`
- `production-v0.0.1-a1b2c3d-1738861000` (from promotion workflow)

Production promotion workflows also maintain alias tag `production`.
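
The tag scheme above can be sketched as a small Python helper. This is an illustration only: the function names `immutable_tag` and `build_tags` are hypothetical, and the real workflows may assemble tags differently.

```python
import re
import time
from typing import List, Optional, Tuple

ENVIRONMENTS = ("develop", "staging", "production")
SHORT_SHA = re.compile(r"[a-f0-9]+")


def immutable_tag(env: str, version: Tuple[int, int, int], short_sha: str,
                  unix_ts: Optional[int] = None) -> str:
    """Build <env>-v<major>.<minor>.<patch>-<short_sha>-<unix_ts>."""
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env}")
    if SHORT_SHA.fullmatch(short_sha) is None:
        raise ValueError(f"not a lowercase hex SHA: {short_sha}")
    ts = int(time.time()) if unix_ts is None else unix_ts
    major, minor, patch = version
    return f"{env}-v{major}.{minor}.{patch}-{short_sha}-{ts}"


def build_tags(env: str, version: Tuple[int, int, int], short_sha: str,
               unix_ts: Optional[int] = None) -> List[str]:
    """Tags published by one build: environment alias, immutable tag, commit tag."""
    if env not in ("develop", "staging"):
        raise ValueError("build workflows publish only develop or staging tags")
    return [env, immutable_tag(env, version, short_sha, unix_ts), short_sha]
```

For example, `build_tags("staging", (0, 0, 1), "a1b2c3d", 1738860123)` yields the alias `staging`, the immutable tag `staging-v0.0.1-a1b2c3d-1738860123`, and the commit tag `a1b2c3d`.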

## CI/CD to Flux Flow

### 1. Build (develop branch)
- Trigger: push to `develop` in service repos (`backend` or `frontend`).
- Workflow builds and pushes `develop-*` tags to GHCR.
- Flux `ImagePolicy` in namespace `develop` selects latest matching tag by extracted timestamp.
- Flux `ImageUpdateAutomation` commits setter updates into this repo (`main`).
- Flux applies the new image tag to `develop`.

### 2. Build (main branch)
- Trigger: push to `main` in service repos.
- Workflow builds and pushes `staging-*` tags to GHCR.
- Flux `ImagePolicy` in namespace `staging` selects latest matching tag.
- Flux writes the updated tag to Git and reconciles `staging`.

### 3. Promotion to production
- Trigger: manual `workflow_dispatch` in service repo (`promote-production.yml`).
- Workflow chooses a `staging-*` tag (explicit input or latest), then retags to:
  - `production`
  - `production-v<major>.<minor>.<patch>-<short_sha>-<unix_ts>`
- Flux `ImagePolicy` in namespace `production` matches `production-*` and deploys automatically.
- The promotion workflow also creates/publishes GitHub Release metadata and bumps next version tag.
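
The retag step can be illustrated by deriving the production tag names from a chosen staging tag. This sketch assumes, as the tag examples earlier in this guide suggest, that the version and short SHA carry over unchanged and that the timestamp is the promotion time. The helper `production_tags` is hypothetical and only computes names; the actual retag is a registry operation that leaves the image digest untouched.

```python
import re
from typing import List

STAGING_TAG = re.compile(
    r"staging-(?P<version>v[0-9]+\.[0-9]+\.[0-9]+)-(?P<sha>[a-f0-9]+)-(?P<ts>[0-9]+)"
)


def production_tags(staging_tag: str, promoted_at: int) -> List[str]:
    """Return the alias and immutable production tags for a staging tag."""
    m = STAGING_TAG.fullmatch(staging_tag)
    if m is None:
        # Refuse anything that is not an immutable staging tag,
        # including the mutable `staging` alias and develop-* tags.
        raise ValueError(f"not an immutable staging tag: {staging_tag}")
    immutable = f"production-{m.group('version')}-{m.group('sha')}-{promoted_at}"
    return ["production", immutable]
```

Rejecting non-staging input keeps the promotion lane narrow: a `develop-*` tag can never be turned into a `production-*` tag by this path.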

## Flux Objects Used (Current State)

The active implementation uses Flux Image Automation (Git write-back), not ResourceSet runtime mutation.

- `ImageRepository` objects in `flux-system`:
  - `flux/bootstrap/infrastructure/image-automation/backend-image-repo.yaml`
  - `flux/bootstrap/infrastructure/image-automation/frontend-image-repo.yaml`
- `ImagePolicy` objects per env:
  - backend: `flux/apps/backend/develop|staging|production/image-policy.yaml`
  - frontend: `flux/apps/frontend/overlays/develop|staging|production/image-policy.yaml`
- `ImageUpdateAutomation` objects per env:
  - backend: `flux/apps/backend/develop|staging|production/image-automation.yaml`
  - frontend: `flux/apps/frontend/overlays/develop|staging|production/image-automation.yaml`
- `GitRepository` source for write-back:
  - `flux/bootstrap/infrastructure/image-automation/git-repository.yaml`

Note: ResourceSet examples are currently commented out in `flux/bootstrap/apps/*`.

## Regex Policies in Use

Backend and frontend use the same tag filters per environment:

- develop: `^develop-v[0-9]+\.[0-9]+\.[0-9]+-[a-f0-9]+-(?P<ts>[0-9]+)$`
- staging: `^staging-v[0-9]+\.[0-9]+\.[0-9]+-[a-f0-9]+-(?P<ts>[0-9]+)$`
- production: `^production-v[0-9]+\.[0-9]+\.[0-9]+-[a-f0-9]+-(?P<ts>[0-9]+)$`

Policies extract `ts` and choose the latest numerically.
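
The selection rule can be simulated in a few lines of Python. Flux evaluates these patterns with Go's regular expression engine; Python's `re` accepts the same `(?P<ts>...)` named-group syntax for the patterns quoted above, so it is used here for illustration. The function `select_latest` is hypothetical and models only the filter-then-order behaviour, not Flux's implementation.

```python
import re
from typing import Iterable, Optional

PRODUCTION_PATTERN = r"^production-v[0-9]+\.[0-9]+\.[0-9]+-[a-f0-9]+-(?P<ts>[0-9]+)$"


def select_latest(pattern: str, tags: Iterable[str]) -> Optional[str]:
    """Return the matching tag with the numerically highest `ts`, or None."""
    regex = re.compile(pattern)
    best_tag, best_ts = None, -1
    for tag in tags:
        m = regex.match(tag)
        if m is None:
            continue  # aliases and other environments are filtered out
        ts = int(m.group("ts"))  # compare as numbers, not as strings
        if ts > best_ts:
            best_tag, best_ts = tag, ts
    return best_tag
```

Numeric comparison matters: as strings, `"999"` sorts after `"1000"`, so a lexicographic ordering could select an older tag when timestamps differ in length.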

## Deployment Verification

## Rollback Paths

Preferred rollback is GitOps-first:

1. Revert the Flux bot commit in this repository (`main`) that bumped the image tag.
2. Let Flux reconcile the reverted manifest.

Emergency rollback can use `kubectl rollout undo`, but that may drift from Git and should be reconciled back via Git immediately after.

## Troubleshooting

## Security Notes

1. Use least-privilege credentials for Flux Git and registry access.
2. Keep network isolation between `develop`, `staging`, and `production`.
3. Keep auditability: all image changes should be traceable through Git commits and workflow runs.

## Additional Resources

- [Flux Documentation](https://fluxcd.io/flux/)
- [FluxCD Image Automation](https://fluxcd.io/flux/guides/image-update/)
- [GitHub Actions Documentation](https://docs.github.com/en/actions)
- [Kustomize Documentation](https://kubectl.docs.kubernetes.io/references/kustomize/)

Here is the image automation layout that turns registry changes into auditable Git updates.

Image automation layout

  • flux/bootstrap/infrastructure/image-automation/backend-image-repo.yaml
  • flux/bootstrap/infrastructure/image-automation/frontend-image-repo.yaml
  • flux/bootstrap/infrastructure/image-automation/git-repository.yaml
  • flux/bootstrap/infrastructure/image-automation/k8s-ai-monitor-image-repo.yaml
  • flux/bootstrap/infrastructure/image-automation/kustomization.yaml

System Context

This chapter gives the runtime a trustworthy identity model.

It connects directly to:

  • Chapter 05, where CI and review guardrails protect the promotion path
  • Chapter 10, where incident response depends on knowing exactly what artifact is live
  • Chapter 15, where supply-chain trust extends promotion discipline into signatures and attestations

Deployment Model

  1. Build on service develop branch pushes develop-* image tags.
  2. Build on service main branch pushes staging-* image tags.
  3. Manual promotion workflow retags selected staging-* image to:
    • production
    • production-v<major>.<minor>.<patch>-<short_sha>-<unix_ts>
  4. Flux ImagePolicy selects latest env-matching immutable tag.
  5. Flux ImageUpdateAutomation commits updated tags to Git and reconciles.

Safe Workflow (Step-by-Step)

  1. Confirm tested artifact in staging and capture digest/tag evidence.
  2. Promote artifact identity only (no rebuild, no mutable tag rewrite).
  3. Open promotion PR with immutable target tag and review checklist.
  4. Verify Flux evidence after merge:
    • reconcile status
    • updated ImagePolicy/automation commit
    • deployed image digest in target environment
  5. If symptoms appear, rollback via Git revert and confirm old digest is restored.
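
The rollback in step 5 can be modeled with a toy history, where each entry is the image reference recorded by one commit. This is a sketch under simplifying assumptions: `revert_head` and `rollback_confirmed` are hypothetical names, and a real revert operates on manifests in Git rather than on a Python list.

```python
from typing import List


def revert_head(history: List[str]) -> List[str]:
    """Model `git revert HEAD` for a file holding one image reference.

    The revert is a new commit whose content equals the parent of HEAD,
    so the bad promotion stays visible in the history.
    """
    if len(history) < 2:
        raise ValueError("nothing to revert to")
    return history + [history[-2]]


def rollback_confirmed(history: List[str], deployed_ref: str,
                       known_good_ref: str) -> bool:
    """Rollback is done only when Git and the cluster both show the old reference."""
    return history[-1] == known_good_ref and deployed_ref == known_good_ref
```

The second check is the one that is easy to skip: Git may already show the reverted reference while the cluster still runs the bad image until reconciliation completes.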

Promotion Evidence Checklist

For each promotion, collect:

  • Flux reconcile success for target Kustomization/HelmRelease.
  • image policy/update evidence showing selected immutable tag.
  • deployed workload image reference/digest in target namespace.
  • rollback commit reference prepared before release window.

Image Automation Pipeline

Flux Image Automation removes manual image tag updates from the promotion workflow. Three resources work together to watch, select, and commit image updates.

1. ImageRepository

Watches a container registry for new tags at a regular interval.

  • Targets: ghcr.io/ldbl/backend, ghcr.io/ldbl/frontend
  • Scan interval: 1 minute
  • Authentication: GHCR credentials from Secret

The ImageRepository produces a list of all available tags for each image.

2. ImagePolicy

Filters and selects the correct tag for each environment using regex patterns.

Tag pattern: ^{env}-v[0-9]+\.[0-9]+\.[0-9]+-[a-f0-9]+-(?P<ts>[0-9]+)$

  • {env} is the environment prefix (develop, staging, production); it is not always the branch name, since staging-* images are built from main
  • (?P<ts>...) captures the Unix timestamp for ordering
  • Policy selects the tag with the highest timestamp: the latest build for develop and staging, the latest promotion for production

This pattern prevents cross-environment promotion accidents: a develop-* tag can never be selected by the production policy.
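
The isolation property can be checked directly. The sketch below expands the tag pattern for each environment and reports which policies would accept a given tag. It is an illustration, not Flux code: the template uses a placeholder named `{env}` for the environment prefix, and `matching_envs` is a hypothetical helper.

```python
import re
from typing import List

TEMPLATE = r"^{env}-v[0-9]+\.[0-9]+\.[0-9]+-[a-f0-9]+-(?P<ts>[0-9]+)$"
ENVIRONMENTS = ("develop", "staging", "production")

# One compiled pattern per environment, with the prefix substituted in.
POLICIES = {
    env: re.compile(TEMPLATE.replace("{env}", re.escape(env)))
    for env in ENVIRONMENTS
}


def matching_envs(tag: str) -> List[str]:
    """Environments whose policy pattern accepts this tag."""
    return [env for env, regex in POLICIES.items() if regex.match(tag)]
```

Every immutable tag is accepted by exactly one policy, and mutable aliases such as `production` or `latest` are accepted by none, because the anchors `^` and `$` require the full version, SHA, and timestamp suffix.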

3. ImageUpdateAutomation

Commits the selected tag back to the Git repository.

  • Uses a dedicated GitRepository source with push credentials
  • Commits updated image references in deployment manifests
  • Flux then reconciles the new commit, completing the loop
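
The write-back step relies on setter markers: comments of the form `# {"$imagepolicy": "<namespace>:<policy>"}` placed next to the image field in a manifest. The Python sketch below imitates the effect on a manifest string to show what the resulting commit changes. It is a simplified model, not Flux's implementation; `apply_policy` is a hypothetical helper and the policy name used with it would be an assumption.

```python
import re

# Matches an `image:` line that carries a full-reference setter marker.
MARKER = re.compile(
    r'^(?P<prefix>[\s-]*image:\s*)(?P<ref>\S+)'
    r'(?P<rest>\s*#\s*\{"\$imagepolicy":\s*"(?P<policy>[^"]+)"\}\s*)$'
)


def apply_policy(manifest: str, policy: str, new_ref: str) -> str:
    """Rewrite image references marked for `policy`; leave all other lines alone."""
    out = []
    for line in manifest.splitlines(keepends=True):
        m = MARKER.match(line)
        if m is not None and m.group("policy") == policy:
            # Keep indentation and the marker comment; swap only the reference.
            line = f"{m.group('prefix')}{new_ref}{m.group('rest')}"
        out.append(line)
    return "".join(out)
```

Only lines that opt in through a marker are touched, which is why an unmarked sidecar image in the same manifest stays under manual control.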

Why This Matters

  • No manual image tag updates: Automation eliminates a common source of human error.
  • Git history shows every promotion: Every image change is a Git commit with full audit trail.
  • Tag patterns prevent accidents: Environment-scoped regex keeps each policy from selecting another environment's tags.
  • Timestamp ordering ensures latest wins: No ambiguity about which build is current.

SafeOps Automation Snapshot

Here is the production image policy path used in the SafeOps system to keep promotion selection inside the environment-specific automation lane.

Production image policy path

  • flux/apps/backend/production/hpa.yaml
  • flux/apps/backend/production/image-automation.yaml
  • flux/apps/backend/production/image-policy.yaml
  • flux/apps/backend/production/kustomization.yaml
  • flux/apps/backend/production/pdb.yaml

Lab Files

  • lab.md
  • quiz.md

Done When

  • learner can explain “promotion instead of rebuild”
  • learner can verify Flux image automation across all three environments
  • learner can perform and explain GitOps-first rollback

Hands-On Materials

Labs, quizzes, and runbooks are available to course members.

  • Lab: Version Promotion and Rollback with Flux GitOps
  • Quiz: Chapter 04 (GitOps & Version Promotion)