Advanced Track: Do this after finishing Chapters 01-14.

Estimated Time

  • Reading: 30-40 min
  • Lab: 60-90 min
  • Quiz: 15-20 min

Prerequisites

  • Core track (Chapters 01-14) completed.
  • GitOps promotion and observability workflows available.

Source Code References

  • cnpg-clusters/

What You Will Produce

A go/no-go evidence package: rollout results, remediation notes, and explicit rollback conditions.

Incident Hook

A database migration runs as part of a routine deployment. Halfway through, the migration fails or corrupts data. The team tries to roll back the application code, but the database schema has already changed and the data is incompatible with the old version.

Result: You’ve reached a point of no return, where a standard GitOps rollback is insufficient because the stateful layer is already broken.

Observed Symptoms

What the team sees first:

  • The application fails to start after a rollback because it cannot read the new schema.
  • Data corruption is detected in specific tables affected by the migration.
  • Responders realize they must now perform a full database restore, significantly increasing the recovery time.

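Root cause: code and data rollbacks were decoupled. The application could be reverted through GitOps, but there was no prepared path to return the data to a state the old code could read.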

Rollback & Data Migrations Model

To handle stateful changes safely, we follow a strict multi-step model:

  1. Backup First: A manual or automated backup is taken immediately before any migration.
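  2. Expand-Contract Pattern: We avoid breaking changes by adding new fields first and removing old ones only after the new version is stable. Both the old and the new application versions can then run against the expanded schema, so a code rollback never meets a schema it cannot read.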
  3. Independent Rollback: We prepare both code and data rollback paths before starting the migration.
  4. Point-in-Time Recovery (PITR): If things go wrong, we use CloudNativePG (CNPG) to recover the database into a new cluster, replaying archived WAL up to a target time just before the migration started.
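The first three steps can be rehearsed end to end in a few lines. The sketch below uses Python's built-in `sqlite3` module as a stand-in for the CNPG-managed PostgreSQL database (an assumption made so the example runs anywhere); the `users` table, the `email_normalized` column, and all helper functions are hypothetical. It takes a backup before touching the schema, applies an expand-only change that the old code can still read, and then uses the prepared restore path when a simulated migration corrupts data and fails.

```python
import sqlite3

# SQLite stands in for the CNPG-managed PostgreSQL database in this sketch.
# Table, column, and function names are hypothetical.


def take_backup(conn: sqlite3.Connection) -> sqlite3.Connection:
    """Step 1, backup first: copy the whole database before any schema change."""
    snapshot = sqlite3.connect(":memory:")
    conn.backup(snapshot)
    return snapshot


def restore(snapshot: sqlite3.Connection) -> sqlite3.Connection:
    """Data rollback path: rebuild a fresh database from the snapshot."""
    restored = sqlite3.connect(":memory:")
    snapshot.backup(restored)
    return restored


def expand(conn: sqlite3.Connection) -> None:
    """Step 2, expand: add a nullable column and backfill it.
    Nothing existing is renamed or dropped, so old code keeps working."""
    conn.execute("ALTER TABLE users ADD COLUMN email_normalized TEXT")
    conn.execute("UPDATE users SET email_normalized = lower(email)")
    conn.commit()
    # Contract (dropping the old column) is a separate, later migration that
    # ships only after the new application version has proven stable.


def broken_migration(conn: sqlite3.Connection) -> None:
    """Simulates the incident: data is overwritten, then the migration fails."""
    conn.execute("UPDATE users SET email = ''")
    conn.commit()
    raise RuntimeError("migration failed halfway")


def old_version_read(conn: sqlite3.Connection) -> list:
    """The old application version selects only the columns it knows about."""
    return conn.execute("SELECT id, email FROM users ORDER BY id").fetchall()


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
db.executemany("INSERT INTO users (email) VALUES (?)",
               [("Ann@Example.com",), ("bob@example.com",)])
db.commit()

snapshot = take_backup(db)           # backup taken before any migration
before = old_version_read(db)

expand(db)
after_expand = old_version_read(db)  # old code still reads the expanded schema

try:
    broken_migration(db)
except RuntimeError:
    db = restore(snapshot)           # data rollback path, prepared in advance
after_restore = old_version_read(db)

print(after_expand == before, after_restore == before)  # True True
```

In a real cluster the backup would be a CNPG backup and the restore would follow step 4; what carries over is the shape of the procedure: snapshot, expand, verify the old read path, restore on failure.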
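In CNPG, step 4 is declared rather than scripted: recovery bootstraps a new `Cluster` from a backup and replays archived WAL up to a recovery target. The Python sketch below builds such a manifest as a dict and prints it as JSON, which `kubectl apply -f -` accepts. The field names (`bootstrap.recovery.backup.name`, `recoveryTarget.targetTime`) follow the CNPG `Cluster` API as we understand it; verify them against the documentation for your CNPG version. The cluster name, the backup name `pre-migration-backup`, the storage size, and the one-second safety margin are hypothetical, and PITR only works if continuous WAL archiving was configured on the source cluster.

```python
import json
from datetime import datetime, timedelta, timezone


def pitr_cluster_manifest(new_cluster: str,
                          backup_name: str,
                          migration_started_at: datetime,
                          safety_margin: timedelta = timedelta(seconds=1),
                          instances: int = 3,
                          storage_size: str = "10Gi") -> dict:
    """Build a CNPG Cluster manifest that recovers from `backup_name` and
    replays WAL up to just before the migration started.
    All names and defaults here are hypothetical."""
    if migration_started_at.tzinfo is None:
        # A naive timestamp is ambiguous; refuse it rather than guess a zone.
        raise ValueError("migration_started_at must be timezone-aware")
    # Stop slightly before the migration so none of its commits are replayed.
    target = (migration_started_at - safety_margin).astimezone(timezone.utc)
    return {
        "apiVersion": "postgresql.cnpg.io/v1",
        "kind": "Cluster",
        "metadata": {"name": new_cluster},
        "spec": {
            "instances": instances,
            "storage": {"size": storage_size},
            "bootstrap": {
                "recovery": {
                    "backup": {"name": backup_name},
                    # RFC 3339 timestamp, e.g. 2024-05-01T09:59:59+00:00
                    "recoveryTarget": {"targetTime": target.isoformat()},
                }
            },
        },
    }


started = datetime(2024, 5, 1, 10, 0, 0, tzinfo=timezone.utc)
manifest = pitr_cluster_manifest("app-db-restore", "pre-migration-backup", started)
print(json.dumps(manifest, indent=2))
```

Recovering into a new cluster leaves the damaged one untouched for investigation; the application is then repointed to the restored cluster as part of the rollback, alongside the GitOps revert of the code.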

What AI Would Propose (Brave Junior):

  • “Just run the migration; it passed in local dev.”
  • “Skip the backup to speed up the CI pipeline.”
  • “If it fails, we’ll manually fix the SQL in production.”
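Each shortcut above skips a piece of evidence that the go/no-go package is supposed to contain. One way to make that explicit is a pre-flight gate that refuses to start the migration until every precondition from the model is satisfied. The sketch below is a hypothetical Python gate; the `MigrationEvidence` fields, the `go_no_go` function, and its messages are illustrative assumptions, not part of any CNPG or GitOps tooling.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class MigrationEvidence:
    """Hypothetical evidence record collected before a migration runs."""
    backup_name: Optional[str]      # backup taken right before the migration
    backup_completed: bool          # the backup finished successfully
    restore_rehearsed: bool         # data rollback tested, e.g. PITR into a scratch cluster
    code_rollback_ready: bool       # previous app version can be redeployed via GitOps
    expand_only: bool               # this release only adds schema; contract ships later
    tested_on_prod_like_data: bool  # more than "it passed in local dev"


def go_no_go(ev: MigrationEvidence) -> Tuple[bool, List[str]]:
    """Return (go, reasons). Any missing precondition means no-go."""
    reasons: List[str] = []
    if not ev.backup_name or not ev.backup_completed:
        reasons.append("no completed pre-migration backup")
    if not ev.restore_rehearsed:
        reasons.append("data rollback path has not been rehearsed")
    if not ev.code_rollback_ready:
        reasons.append("code rollback path is not ready")
    if not ev.expand_only:
        reasons.append("destructive schema change bundled into this release")
    if not ev.tested_on_prod_like_data:
        reasons.append("migration not tested against production-like data")
    return (not reasons, reasons)


evidence = MigrationEvidence(
    backup_name="pre-migration-backup",
    backup_completed=True,
    restore_rehearsed=False,
    code_rollback_ready=True,
    expand_only=True,
    tested_on_prod_like_data=False,
)
go, reasons = go_no_go(evidence)
print("GO" if go else "NO-GO", reasons)
# NO-GO ['data rollback path has not been rehearsed', 'migration not tested against production-like data']
```

The printed reasons can go straight into the remediation notes of the evidence package; a no-go with stated reasons is more useful than a silently skipped check.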

Pause and Predict: Before reading the investigation, write down your top 3 hypotheses. What would you check first?