Guardrails That Stop It
- Mandatory Pre-Migration Backup: No migration can run without a fresh backup or PITR checkpoint.
- Independent Rollback Plans: Every PR with a migration must include a documented data rollback strategy.
- Expand-Contract Enforcement: Breaking changes must be split into two releases (Add new, then later remove old).
- Non-Prod Validation: Migrations must be tested against a recent clone of production data in staging.
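The first guardrail can be sketched as a small gate that a migration job calls before doing anything else. This is a minimal illustration, not the project's actual gate: the function name `may_run_migration` and the 15-minute freshness window are assumptions, since the text does not define how recent a "fresh" backup must be.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed freshness window; the real threshold is a policy decision.
MAX_BACKUP_AGE = timedelta(minutes=15)


def may_run_migration(
    last_backup_at: Optional[datetime],
    now: datetime,
    max_age: timedelta = MAX_BACKUP_AGE,
) -> bool:
    """Return True only if a completed backup exists and is recent enough."""
    if last_backup_at is None:
        # No successful backup on record: refuse to migrate.
        return False
    age = now - last_backup_at
    # Reject backups that are too old, and timestamps that lie in the future.
    return timedelta(0) <= age <= max_age


if __name__ == "__main__":
    now = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
    print(may_run_migration(now - timedelta(minutes=5), now))  # fresh backup
    print(may_run_migration(now - timedelta(hours=6), now))    # stale backup
```

A job that gets `False` back should exit non-zero so the migration never starts.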
Safe Migration Strategy (Expand-Contract)
To avoid “Point-of-No-Return” incidents, we follow the Expand-Contract pattern:
- Expand: Add new columns or tables. Update code to write to both old and new locations.
- Migrate: Move existing data from old to new structures.
- Contract: Update code to read only from new locations. Once stable, remove old structures in a separate release.
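The three phases above can be sketched end to end. The example uses an in-memory SQLite database only so that it runs standalone; the production database is PostgreSQL, and the `users` table with its `name` and `full_name` columns is hypothetical.

```python
import sqlite3


def expand(conn: sqlite3.Connection) -> None:
    # Expand: add the new column next to the old one; nothing is removed.
    conn.execute("ALTER TABLE users ADD COLUMN full_name TEXT")


def write_user(conn: sqlite3.Connection, name: str) -> None:
    # Dual write: application code fills both the old and the new column.
    conn.execute(
        "INSERT INTO users (name, full_name) VALUES (?, ?)", (name, name)
    )


def migrate(conn: sqlite3.Connection) -> None:
    # Migrate: backfill rows that were written before the expand step.
    conn.execute("UPDATE users SET full_name = name WHERE full_name IS NULL")


def read_names(conn: sqlite3.Connection) -> list:
    # Contract (code side): read only from the new column.
    # Dropping the old column belongs in a separate, later release.
    rows = conn.execute("SELECT full_name FROM users ORDER BY id")
    return [row[0] for row in rows]


def demo() -> list:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO users (name) VALUES ('legacy')")  # pre-expand row
    expand(conn)
    write_user(conn, "dual")
    migrate(conn)
    return read_names(conn)


if __name__ == "__main__":
    print(demo())
```

Because the old column is still present and still populated after `migrate`, reverting the application code at this stage loses no data, which is what keeps the change reversible.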
Point-in-Time Recovery (PITR) with CNPG
We use CloudNativePG’s PITR capability to restore to a specific timestamp:
- Target: The exact second before the migration job started.
- Mechanism: CNPG bootstraps a new cluster from a base backup and replays the archived WAL from S3/R2 up to the target timestamp.
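As a sketch of how the target could be derived, the helper below turns a migration job's start time into the last whole second before it, formatted as an RFC 3339 UTC timestamp. The second function places that value under `bootstrap.recovery.recoveryTarget.targetTime`, which is where the CNPG Cluster spec takes it as far as we know; verify the field names against the installed CNPG version. The helper names and the source cluster name are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def pitr_target_time(job_started_at: datetime) -> str:
    """Return the last whole second before the job started, in UTC."""
    if job_started_at.tzinfo is None:
        raise ValueError("job_started_at must be timezone-aware")
    # Normalise to UTC and drop sub-second precision before stepping back.
    started_utc = job_started_at.astimezone(timezone.utc).replace(microsecond=0)
    return (started_utc - timedelta(seconds=1)).strftime("%Y-%m-%dT%H:%M:%SZ")


def recovery_bootstrap(source_cluster: str, job_started_at: datetime) -> dict:
    """Build the recovery fragment of a Cluster spec (assumed field layout)."""
    return {
        "bootstrap": {
            "recovery": {
                "source": source_cluster,  # hypothetical external cluster name
                "recoveryTarget": {
                    "targetTime": pitr_target_time(job_started_at)
                },
            }
        }
    }


if __name__ == "__main__":
    started = datetime(2024, 5, 1, 12, 0, 0, 500000, tzinfo=timezone.utc)
    print(recovery_bootstrap("production-backup", started))
```

Recording the job's start time before the migration runs is what makes this target recoverable afterwards.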
CloudNativePG cluster baseline
- flux/infrastructure/data/cnpg-clusters/develop/cluster.yaml
- flux/infrastructure/data/cnpg-clusters/develop/kustomization.yaml
- flux/infrastructure/data/cnpg-clusters/develop/scheduled-backup.yaml
- flux/infrastructure/data/cnpg-clusters/production/cluster.yaml
- flux/infrastructure/data/cnpg-clusters/production/kustomization.yaml
- flux/infrastructure/data/cnpg-clusters/production/postgres-app-secret.yaml
- flux/infrastructure/data/cnpg-clusters/production/scheduled-backup.yaml
- flux/infrastructure/data/cnpg-clusters/staging/cluster.yaml
- flux/infrastructure/data/cnpg-clusters/staging/kustomization.yaml
- flux/infrastructure/data/cnpg-clusters/staging/scheduled-backup.yaml
Safe Workflow (Step-by-Step)
- Verify Backup: Confirm the latest successful backup and PITR status.
- Trigger Manual Backup: Take a fresh backup immediately before the migration.
- Deploy Migration: Use a Kubernetes Job or an init container to run the migration.
- Monitor Health: Check app logs and metrics for database-related errors.
- Rollback (if needed): Revert code in Git and restore the database to the pre-migration timestamp.
- Verify Recovery: Confirm the application is healthy and data is consistent.
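The ordering of the steps above, and in particular when a rollback is triggered, can be sketched as a small driver. Every callable passed in is a hypothetical stand-in for real tooling (backup checks, the migration Job, health probes, the Git revert plus PITR restore); only the control flow is the point here.

```python
from typing import Callable


def run_migration_workflow(
    verify_backup: Callable[[], bool],
    take_backup: Callable[[], bool],
    deploy_migration: Callable[[], bool],
    is_healthy: Callable[[], bool],
    rollback: Callable[[], None],
) -> str:
    """Run the workflow in order and report the outcome as a short string."""
    # Verify Backup, then Trigger Manual Backup: never migrate without both.
    if not verify_backup() or not take_backup():
        return "aborted"
    # Deploy Migration, then Monitor Health.
    if deploy_migration() and is_healthy():
        return "migrated"
    # Rollback: revert code and restore to the pre-migration timestamp.
    rollback()
    # Verify Recovery: the application must be healthy again afterwards.
    return "rolled_back" if is_healthy() else "recovery_failed"


if __name__ == "__main__":
    outcome = run_migration_workflow(
        verify_backup=lambda: True,
        take_backup=lambda: True,
        deploy_migration=lambda: True,
        is_healthy=lambda: True,
        rollback=lambda: None,
    )
    print(outcome)
```

Note that a failed backup step aborts before any schema change is attempted, so no rollback is needed in that branch.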
This builds on: Admission policies (Chapter 16); migration gates use the same policy model.
This enables: Advanced track complete.