@smoreinis
Collaborator

Summary

Implements a phased migration strategy for task state storage from MongoDB to PostgreSQL with zero-downtime rollout capability.

Key changes:

  • Add TASK_STATE_STORAGE_PHASE feature flag to control migration phases
  • Create task_states PostgreSQL table with JSONB state column
  • Implement dual repository pattern for safe migration rollout
  • Add Datadog StatsD metrics for monitoring data consistency

Migration Phases

| Phase | Behavior |
| --- | --- |
| mongodb | Legacy behavior: MongoDB only |
| dual_write | Write to both, read from MongoDB |
| dual_read | Write to both, read from both, and verify consistency |
| postgres | PostgreSQL only (target state) |
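
The four phases amount to a small routing decision: which stores receive writes, which serve reads, and whether reads are cross-checked. A minimal sketch of that mapping follows. The names `StoragePhase`, `Routing`, `ROUTING`, and `phase_from_env` are hypothetical and not taken from this PR, and the default of `mongodb` when the flag is unset is an assumption.

```python
import os
from dataclasses import dataclass
from enum import Enum


class StoragePhase(str, Enum):
    """Allowed values of TASK_STATE_STORAGE_PHASE (names are hypothetical)."""
    MONGODB = "mongodb"
    DUAL_WRITE = "dual_write"
    DUAL_READ = "dual_read"
    POSTGRES = "postgres"


@dataclass(frozen=True)
class Routing:
    write_to: tuple[str, ...]
    read_from: tuple[str, ...]
    verify: bool = False  # compare both reads and emit metrics


# One entry per row of the phase table.
ROUTING = {
    StoragePhase.MONGODB: Routing(("mongodb",), ("mongodb",)),
    StoragePhase.DUAL_WRITE: Routing(("mongodb", "postgres"), ("mongodb",)),
    StoragePhase.DUAL_READ: Routing(
        ("mongodb", "postgres"), ("mongodb", "postgres"), verify=True
    ),
    StoragePhase.POSTGRES: Routing(("postgres",), ("postgres",)),
}


def phase_from_env(environ=None) -> StoragePhase:
    """Parse the flag; an unknown value raises ValueError instead of guessing."""
    environ = os.environ if environ is None else environ
    # Assumption: an unset flag means the legacy MongoDB-only behavior.
    raw = environ.get("TASK_STATE_STORAGE_PHASE", "mongodb")
    return StoragePhase(raw.strip().lower())
```

Failing loudly on an unrecognized value is a design choice in this sketch: a typo in the flag should not silently select a storage backend.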

Files Changed

Core Implementation

  • src/adapters/orm.py - Added TaskStateORM model
  • src/config/environment_variables.py - Added TASK_STATE_STORAGE_PHASE env var
  • src/domain/repositories/task_state_postgres_repository.py - PostgreSQL repository (new)
  • src/domain/repositories/task_state_dual_repository.py - Dual-write wrapper (new)
  • src/domain/use_cases/states_use_case.py - Updated to use dual repository
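
For readers unfamiliar with the dual repository pattern, here is a rough sketch of how a wrapper can delegate to two underlying repositories based on the phase. It is not the code in `task_state_dual_repository.py`: `InMemoryRepo` and `DualRepoSketch` are hypothetical stand-ins, the methods are synchronous for brevity, and the choice to keep returning the MongoDB value during `dual_read` is an assumption.

```python
from typing import Optional


class InMemoryRepo:
    """Hypothetical stand-in for a real MongoDB or PostgreSQL repository."""

    def __init__(self) -> None:
        self.rows: dict[str, dict] = {}

    def upsert(self, task_id: str, state: dict) -> None:
        self.rows[task_id] = dict(state)

    def get(self, task_id: str) -> Optional[dict]:
        return self.rows.get(task_id)


class DualRepoSketch:
    """Routes reads and writes according to the storage phase."""

    def __init__(self, mongo: InMemoryRepo, postgres: InMemoryRepo, phase: str) -> None:
        self.mongo = mongo
        self.postgres = postgres
        self.phase = phase
        self.mismatches: list[str] = []  # task ids whose two copies disagree

    def upsert(self, task_id: str, state: dict) -> None:
        if self.phase != "postgres":
            self.mongo.upsert(task_id, state)
        if self.phase != "mongodb":
            self.postgres.upsert(task_id, state)

    def get(self, task_id: str) -> Optional[dict]:
        if self.phase == "postgres":
            return self.postgres.get(task_id)
        primary = self.mongo.get(task_id)
        if self.phase == "dual_read":
            # Read the second copy only to verify it; a real implementation
            # would emit a metric here instead of appending to a list.
            if self.postgres.get(task_id) != primary:
                self.mismatches.append(task_id)
        return primary  # assumption: MongoDB stays authoritative until "postgres"
```

Because the use case talks only to the wrapper, changing the phase changes routing without touching callers.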

Database Migration

  • database/migrations/alembic/versions/2026_01_12_0000_add_task_states_table_*.py - Alembic migration

Scripts

  • scripts/backfill_task_states.py - Backfill existing MongoDB data to PostgreSQL
  • scripts/verify_task_states.py - Verify data consistency between databases
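
The two scripts are not shown in this description, so the following is only an illustration of the general shape such tools often take: a batched copy that uses upserts so it can be rerun safely, and a comparison pass that lists ids that are missing or differ. The function names, the `id` and `state` document fields, and the use of a plain dict in place of the `task_states` table are all assumptions.

```python
from typing import Iterable, Iterator


def batched(docs: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Yield lists of at most `size` documents."""
    batch: list[dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def backfill(mongo_docs: Iterable[dict], postgres_rows: dict, batch_size: int = 500) -> int:
    """Copy every document; keyed assignment acts as an upsert, so reruns are safe."""
    processed = 0
    for batch in batched(mongo_docs, batch_size):
        for doc in batch:
            postgres_rows[doc["id"]] = doc["state"]
            processed += 1
    return processed


def find_mismatches(mongo_docs: Iterable[dict], postgres_rows: dict) -> list[str]:
    """Return ids that are missing from, or differ in, the PostgreSQL copy."""
    return [
        doc["id"]
        for doc in mongo_docs
        if postgres_rows.get(doc["id"]) != doc["state"]
    ]
```

Idempotency matters here because the backfill may need to be repeated, for example to pick up documents written to MongoDB after an earlier run.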

Tests

  • tests/unit/repositories/test_task_state_postgres_repository.py - 2 tests
  • tests/unit/repositories/test_task_state_dual_repository.py - 35 tests

Metrics (dual_read phase)

| Metric | Description |
| --- | --- |
| task_state.dual_read.match | Data matches between MongoDB and PostgreSQL |
| task_state.dual_read.mismatch.missing_postgres | Missing in PostgreSQL |
| task_state.dual_read.mismatch.missing_mongodb | Missing in MongoDB |
| task_state.dual_read.mismatch.state_content | State content differs |
| task_state.dual_read.list_count_mismatch | List counts differ |
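
To make the metric names concrete, the sketch below maps the result of a paired read to one of them. It is an illustration, not the PR's implementation: `classify`, `record_read`, and `record_list` are hypothetical names, a `collections.Counter` stands in for the StatsD client, and counting "absent in both stores" as a match is an assumption.

```python
from collections import Counter
from typing import Optional, Sequence

PREFIX = "task_state.dual_read"

# Stand-in for a StatsD client; real code would call its increment method.
counters: Counter = Counter()


def classify(mongo_state: Optional[dict], postgres_state: Optional[dict]) -> str:
    """Return the metric name describing how the two copies compare."""
    if mongo_state is None and postgres_state is None:
        return f"{PREFIX}.match"  # assumption: absent in both counts as a match
    if postgres_state is None:
        return f"{PREFIX}.mismatch.missing_postgres"
    if mongo_state is None:
        return f"{PREFIX}.mismatch.missing_mongodb"
    if mongo_state != postgres_state:
        return f"{PREFIX}.mismatch.state_content"
    return f"{PREFIX}.match"


def record_read(mongo_state: Optional[dict], postgres_state: Optional[dict]) -> str:
    """Classify one paired read and bump the matching counter."""
    metric = classify(mongo_state, postgres_state)
    counters[metric] += 1
    return metric


def record_list(mongo_items: Sequence, postgres_items: Sequence) -> bool:
    """Bump list_count_mismatch when the two list results differ in length."""
    if len(mongo_items) != len(postgres_items):
        counters[f"{PREFIX}.list_count_mismatch"] += 1
        return False
    return True
```

A sustained rate of zero on the mismatch counters during `dual_read` is the signal to look for before the final switch.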

Rollout Plan

  1. Deploy with TASK_STATE_STORAGE_PHASE=mongodb (no behavior change)
  2. Run backfill: python scripts/backfill_task_states.py
  3. Enable dual-write: Set TASK_STATE_STORAGE_PHASE=dual_write
  4. Enable dual-read: Set TASK_STATE_STORAGE_PHASE=dual_read, monitor metrics
  5. Switch to PostgreSQL: Set TASK_STATE_STORAGE_PHASE=postgres

Rollback

Set TASK_STATE_STORAGE_PHASE back to the previous phase at any time.
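
The rollout and rollback steps move through the phases in a fixed order, one step at a time. A small helper like the one below could be used in a runbook or deploy check to compute the adjacent phase. `PHASE_ORDER`, `next_phase`, and `previous_phase` are hypothetical names and not part of this PR.

```python
from typing import Optional

# Order taken from the rollout plan.
PHASE_ORDER = ["mongodb", "dual_write", "dual_read", "postgres"]


def next_phase(current: str) -> Optional[str]:
    """Phase to roll forward to, or None if already at the target state."""
    i = PHASE_ORDER.index(current)  # raises ValueError on an unknown phase
    return PHASE_ORDER[i + 1] if i + 1 < len(PHASE_ORDER) else None


def previous_phase(current: str) -> Optional[str]:
    """Phase to roll back to, or None if already at the legacy state."""
    i = PHASE_ORDER.index(current)
    return PHASE_ORDER[i - 1] if i > 0 else None
```

For example, `previous_phase("postgres")` returns `"dual_read"`. Per the phase table, writes made during the `postgres` phase go to PostgreSQL only, so MongoDB would not contain them after rolling back from that phase.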

Test Plan

  • Unit tests pass (35 dual repo tests + 2 postgres repo tests)
  • Integration tests with real databases
  • Manual verification of each phase transition
  • Monitor metrics during dual_read phase before final switch

Implement phased migration strategy for task states:
- Phase 0: Add feature flag TASK_STATE_STORAGE_PHASE
- Add TaskStateORM model and Alembic migration
- Create TaskStatePostgresRepository for PostgreSQL storage
- Create TaskStateDualRepository for phased rollout

Migration phases supported:
- mongodb: Legacy behavior (MongoDB only)
- dual_write: Write to both, read from MongoDB
- dual_read: Write to both, read from both with verification
- postgres: PostgreSQL only (target state)

Includes:
- Datadog StatsD metrics for dual_read verification
- Backfill script for existing MongoDB data
- Verification script for data consistency checks
- Unit tests for all repository operations and metrics
@smoreinis smoreinis force-pushed the stas/task-state-postgres branch from 7685cef to a9f298c Compare January 13, 2026 00:25