Chapter 16: Rollback and Data Migrations (Advanced)
Why This Chapter Exists
Rolling back an application is easy only when the database state is still compatible with the version being restored. Most production rollback failures happen at the boundary between application version and schema version.
This chapter defines a safe migration discipline:
- backward-compatible schema first
- application rollout second
- destructive schema changes last
- explicit rollback windows and feature flag gates
Learning Objectives
By the end of this chapter, learners can:
- explain expand/contract migration strategy
- design a rollback-safe deploy sequence for app + schema
- execute a migration incident drill with evidence capture
- define break-glass rules for failed migrations
Course Implementation Scope
- this chapter runs migration workflow drills on CNPG/PostgreSQL targets
- application behavior gating is demonstrated with feature-flag simulation
- the same rollout and rollback sequence applies directly to database-backed login/user flows
The Incident Hook
A release ships application code and a schema migration in one step. The migration drops or renames a column that the previous app version still uses. The new deployment fails its health checks. Rolling back the application image succeeds, but the old app can no longer read the data. The incident drags on because an "app rollback" alone cannot recover service.
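The failure mode can be reproduced in a few lines. The following is a minimal sketch, assuming an in-memory SQLite database as a stand-in for the PostgreSQL target and a hypothetical `users` table whose `username` column is renamed to `login`; none of these names come from the course repository.

```python
import sqlite3

def old_app_read(conn):
    # Previous app version: still queries the original column name.
    return conn.execute("SELECT username FROM users").fetchall()

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
conn.execute("INSERT INTO users (username) VALUES ('alice')")

# Destructive migration shipped in the same release as the new app.
# (RENAME COLUMN needs SQLite 3.25 or newer.)
conn.execute("ALTER TABLE users RENAME COLUMN username TO login")

# The image rollback "succeeds", but the old code path is now broken.
try:
    old_app_read(conn)
    rollback_recovers = True
except sqlite3.OperationalError as exc:
    rollback_recovers = False
    failure = str(exc)

print(rollback_recovers)
```

The schema change survives the image rollback, so the only ways out are a forward fix or a data-level recovery.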
What AI Would Propose (Brave Junior)
- “Apply migration and deploy together in one PR.”
- “If deploy fails, just rollback image tag.”
- “Skip feature flags to reduce complexity.”
Why this sounds reasonable:
- fewer moving parts in one release
- fast visible progress
Why This Is Dangerous
- coupling schema and application changes in one release creates states that an app rollback cannot undo
- destructive changes remove the safety window in which the previous version could still run
- a partial rollout can leave mixed-version traffic running against an incompatible schema
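The mixed-version risk can be made concrete with a small compatibility check. This is a sketch under assumptions: the version labels `v1`/`v2` and their column requirements are hypothetical, and a real check would read the live schema from the database rather than from a hard-coded set.

```python
# Hypothetical column requirements per application version.
REQUIRED_COLUMNS = {
    "v1": {"id", "username"},  # previous release
    "v2": {"id", "login"},     # new release
}

def missing_columns(schema_columns, live_versions):
    """Return {version: [missing columns]} for versions the schema cannot serve."""
    problems = {}
    for version in live_versions:
        missing = REQUIRED_COLUMNS[version] - set(schema_columns)
        if missing:
            problems[version] = sorted(missing)
    return problems

expanded = {"id", "username", "login"}  # after an additive migration
contracted = {"id", "login"}            # after the destructive migration

print(missing_columns(expanded, ["v1", "v2"]))    # {}
print(missing_columns(contracted, ["v1", "v2"]))  # {'v1': ['username']}
```

Only the expanded schema serves both versions at once, which is why the destructive step has to wait until the old version is gone.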
Guardrails That Stop It
- expand/contract strategy only
- migration scripts must be idempotent and reviewed
- app rollout uses feature flags for behavior gating
- rollback plan includes data compatibility checks
- destructive DDL only after verification window and explicit approval
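As an illustration of the first two guardrails, here is a minimal sketch of an additive, idempotent expand step. It assumes SQLite as a stand-in for PostgreSQL and a hypothetical `users` table; the helper names are illustrative. SQLite has no conditional `ADD COLUMN`, so the sketch checks for the column first; PostgreSQL supports `ALTER TABLE ... ADD COLUMN IF NOT EXISTS` for the same purpose.

```python
import sqlite3

def column_exists(conn, table, column):
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)

def expand_add_login(conn):
    """Expand step: additive only, and safe to run more than once."""
    if not column_exists(conn, "users", "login"):
        conn.execute("ALTER TABLE users ADD COLUMN login TEXT")
    # Backfill only rows that have not been copied yet.
    conn.execute("UPDATE users SET login = username WHERE login IS NULL")
    conn.commit()

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
conn.execute("INSERT INTO users (username) VALUES ('alice')")

expand_add_login(conn)
expand_add_login(conn)  # re-running changes nothing

old_view = conn.execute("SELECT username FROM users").fetchall()
new_view = conn.execute("SELECT login FROM users").fetchall()
print(old_view, new_view)
```

Both the old and the new read path work after the expand step, so either app version can be deployed or rolled back.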
Repository Mapping
- data platform baseline: flux/infrastructure/data/cnpg-clusters
- backup/restore baseline: Chapter 10: Backup & Restore Basics
- promotion baseline: Chapter 04: GitOps & Version Promotion
Safe Workflow (Step-by-Step)
- Create expand migration (additive only).
- Deploy migration job and verify schema compatibility.
- Deploy app with new code path behind feature flag (flag off).
- Enable flag gradually and monitor SLO/error budget.
- Keep the rollback window open until the agreed confidence threshold is met.
- Run contract migration only after explicit approval.
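The flag steps in this workflow can be sketched as a deterministic percentage rollout. This is a feature-flag simulation under assumptions: the bucketing scheme, function names, and the `username`/`login` columns are hypothetical and not taken from the course repository.

```python
import hashlib

def bucket(user_id: str) -> int:
    """Map a user to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100

def use_new_path(user_id: str, rollout_percent: int) -> bool:
    """True when the user falls inside the current rollout percentage."""
    return bucket(user_id) < rollout_percent

def read_login(row: dict, user_id: str, rollout_percent: int) -> str:
    # New path reads the added column; the old path stays untouched,
    # so setting the flag back to 0 restores the previous behavior.
    if use_new_path(user_id, rollout_percent) and row.get("login") is not None:
        return row["login"]
    return row["username"]

row = {"username": "alice", "login": "alice-new"}
print(read_login(row, "user-1", 0))    # alice
print(read_login(row, "user-1", 100))  # alice-new
```

Because the bucket depends only on the user ID, raising the percentage adds users to the new path without moving anyone who is already on it.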
Lab Files
- lab.md
- runbook-rollback-migrations.md
- quiz.md
Done When
- learner can run a migration drill with a rollback-safe sequence
- learner can distinguish the limits of app rollback from those of data rollback
- learner can define no-go conditions before a destructive migration
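No-go conditions are easier to enforce when they are written down as a check. The following is a sketch only: the fields, the reason strings, and the seven-day default window are hypothetical choices for illustration, and each team should substitute its own criteria.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

@dataclass
class ContractRequest:
    # All fields are hypothetical inputs gathered before the contract step.
    flag_rollout_percent: int
    full_rollout_since: Optional[datetime]
    old_version_replicas: int
    backup_restore_tested: bool
    approved_by: Optional[str]

def no_go_reasons(req: ContractRequest, now: datetime,
                  window: timedelta = timedelta(days=7)) -> List[str]:
    """Return every reason the destructive migration must not run yet."""
    reasons = []
    if req.flag_rollout_percent < 100:
        reasons.append("feature flag is not fully enabled")
    if req.full_rollout_since is None or now - req.full_rollout_since < window:
        reasons.append("verification window has not elapsed")
    if req.old_version_replicas > 0:
        reasons.append("previous app version is still serving traffic")
    if not req.backup_restore_tested:
        reasons.append("no tested backup to restore from")
    if not req.approved_by:
        reasons.append("explicit approval is missing")
    return reasons

now = datetime(2024, 1, 15)
request = ContractRequest(100, datetime(2024, 1, 1), 0, True, "dba-on-call")
print(no_go_reasons(request, now))  # []
```

An empty list means the contract migration may proceed; anything else is a no-go and should be recorded with the drill evidence.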