Runbook: Rollback and Migration Operations (Advanced)

Purpose

Run application and schema releases with an explicit rollback path and minimal blast radius.

Scope

This runbook covers:

migration classification and sequencing
rollback execution order
incident handling for migration-related failures
destructive migration approval gates

Migration Types

Expand (safe/additive):

add nullable columns
add new tables/indexes
keep the old schema path valid so the previous application version still runs

Contract (destructive):

drop/rename columns
remove legacy constraints/paths
run only after a stable compatibility window, once no deployed version depends on the old schema
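
The expand/contract split above can be approximated with a quick keyword check before review. This is a heuristic sketch only: classify_migration is a hypothetical helper, not part of any migration tool, and it assumes one SQL file per migration. It errs on the side of "contract", so a comment containing the word "drop" also triggers it.

```shell
# Heuristic sketch: classify a SQL migration file as "expand" or "contract".
# classify_migration is a hypothetical helper name (assumption).
classify_migration() {
  local file="$1"
  # Any whole-word DROP or RENAME (case-insensitive) is treated as destructive.
  if grep -Eiqw 'drop|rename' "$file"; then
    echo "contract"
  else
    echo "expand"
  fi
}
```

A human reviewer still makes the final classification; the check only flags files that need the contract approval gate.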

Pre-Deploy Checklist

Migration classified (expand or contract).
Rollback window defined with owner and duration.
Backup/restore evidence is fresh (a recent backup plus a verified restore test).
Feature flag plan exists for new code path.
Monitoring and alert thresholds are confirmed.
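
The freshness item in the checklist above could be checked mechanically. The sketch below assumes the backup evidence is a file on disk and that "fresh" means modified within a chosen number of minutes; both the helper name backup_is_fresh and the threshold are assumptions, not values from this runbook.

```shell
# Sketch: succeed only if the backup artifact exists and is recent enough.
# backup_is_fresh is a hypothetical helper; the age limit is the caller's choice.
backup_is_fresh() {
  # $1: path to the backup artifact, $2: maximum age in minutes
  [ -f "$1" ] && [ -n "$(find "$1" -mmin "-$2" 2>/dev/null)" ]
}
```

Example use: `backup_is_fresh /backups/latest.dump 1440 || echo "backup older than 24h"`, where the path is a placeholder.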

Rollout Sequence (Mandatory)

Expand migration.
Application deploy with flag OFF.
Controlled flag enable.
Observe stability window.
Contract migration (approval required).
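
The ordering above can be encoded as a small gate so that no step is skipped and the contract migration stays blocked without approval. This is a sketch under assumptions: the step names, the function next_rollout_step, and the CONTRACT_APPROVED variable are all hypothetical.

```shell
# Sketch: given the last completed step, print the next allowed step.
# Step names and CONTRACT_APPROVED are illustrative assumptions.
next_rollout_step() {
  # $1: last completed step ("none" before the rollout starts)
  case "$1" in
    none)            echo "expand" ;;
    expand)          echo "deploy-flag-off" ;;
    deploy-flag-off) echo "enable-flag" ;;
    enable-flag)     echo "observe" ;;
    observe)
      # The contract migration is gated on explicit approval.
      if [ "${CONTRACT_APPROVED:-no}" = "yes" ]; then
        echo "contract"
      else
        echo "blocked: contract approval required" >&2
        return 1
      fi
      ;;
    contract)        echo "done" ;;
    *)               echo "unknown step: $1" >&2; return 2 ;;
  esac
}
```

A pipeline could call this before each stage and refuse to continue on a non-zero exit status.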

Rollback Order

If an incident occurs after the expand migration and application deploy:

disable feature flag (fastest mitigation)
roll back the application version if needed
keep expanded schema intact
investigate before any schema reversal

If a destructive (contract) migration has already been applied:

treat as high-severity incident
invoke restore/data recovery protocol
communicate RTO/RPO impact immediately
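
The two rollback paths above can be printed as a dry-run plan so the on-call engineer sees the order before acting. The sketch executes nothing. It assumes the application runs as a Kubernetes Deployment in the develop namespace; the name my-app, the APP_DEPLOYMENT variable, and the helper rollback_plan are placeholders, not values from this runbook.

```shell
# Sketch: print the rollback actions in order, without executing anything.
# "my-app" and APP_DEPLOYMENT are placeholder assumptions.
rollback_plan() {
  # $1: "yes" if a contract (destructive) migration was already applied
  if [ "$1" = "yes" ]; then
    echo "1. declare high-severity incident"
    echo "2. invoke restore/data recovery protocol"
    echo "3. communicate RTO/RPO impact"
  else
    echo "1. disable feature flag"
    echo "2. if needed: kubectl -n develop rollout undo deployment/${APP_DEPLOYMENT:-my-app}"
    echo "3. keep expanded schema intact"
    echo "4. investigate before any schema reversal"
  fi
}
```

Running `rollback_plan no` lists the flag-first path; `rollback_plan yes` lists the recovery path for an already applied contract migration.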

Commands / Evidence

kubectl -n develop get pods
kubectl -n develop get events --sort-by=.lastTimestamp | tail -n 30

Add your migration tool commands and SQL evidence to the incident timeline.
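
One way to keep that evidence consistent is to wrap each command so its output lands in a timestamped file. This is a minimal sketch: capture_evidence and the EVIDENCE_DIR variable are hypothetical names, and the default directory is an assumption.

```shell
# Sketch: run a command and save its output to a UTC-timestamped log file.
# capture_evidence and EVIDENCE_DIR are illustrative assumptions.
capture_evidence() {
  # $1: short label for the file name; remaining args: the command to run
  local label="$1"
  shift
  local dir="${EVIDENCE_DIR:-./incident-evidence}"
  mkdir -p "$dir"
  local out="$dir/$(date -u +%Y%m%dT%H%M%SZ)-$label.log"
  # Record the command line first, then its stdout and stderr.
  { echo "\$ $*"; "$@"; } > "$out" 2>&1
  # Print the path so it can be pasted into the incident timeline.
  echo "$out"
}
```

Example use: `capture_evidence pods kubectl -n develop get pods`. Commands that contain a pipe can be passed through `sh -c '...'`.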

Break-Glass Rules

Bypassing the rollout sequence or approval gates is allowed only with:

incident owner approval
explicit risk acceptance
documented rollback/recovery path
post-incident follow-up task

Failure Modes

Mixed-version incompatibility:

symptom: old pods fail against new schema
action: disable the flag and roll back the app; preserve the expanded schema

Long-running lock/contention migration:

symptom: API latency spikes/timeouts
action: stop rollout, reduce scope, schedule maintenance window

Data integrity regression:

symptom: missing/corrupted values after migration
action: incident protocol + restore/repair workflow