
feat: Add ECS Fargate infrastructure and deployment configuration #362

Open
e9e4e5f0faef wants to merge 3 commits into stage from
feat/ecs-fargate-migration

@e9e4e5f0faef e9e4e5f0faef commented Jan 24, 2026

Description

This PR adds infrastructure, CI/CD, and testing to deploy addons-server on AWS ECS Fargate. The full lifecycle has been validated: deploy, smoke test, teardown.

Files changed (15):

| Category | File | Purpose |
| --- | --- | --- |
| Docker | Dockerfile.ecs | ECS-optimised image (non-root, tini, health check) |
| Docker | docker/docker-entrypoint.sh | Multi-mode entrypoint (web/worker/versioncheck/manage) with --need-app fast-fail |
| CI/CD | .github/workflows/build-and-push.yml | GitHub Actions workflow for ECR build/push with OIDC auth |
| Pulumi | infra/pulumi/__main__.py | IaC program (VPC, ECR, Fargate, ElastiCache, Scheduled Tasks, VPC peering, SG hardening) |
| Pulumi | infra/pulumi/config.stage.yaml | Stage environment config (142 resources) |
| Pulumi | infra/pulumi/Pulumi.yaml | Project definition |
| Pulumi | infra/pulumi/Pulumi.stage.yaml | Stack config (aws:region=us-west-2) |
| Pulumi | infra/pulumi/README.md | Setup guide |
| Pulumi | infra/pulumi/requirements.txt | Python deps (tb_pulumi v0.0.16, Python 3.13+) |
| Script | infra/scripts/guardduty-cleanup.sh | Cleans up GuardDuty auto-provisioned VPC endpoint artifacts after pulumi destroy (tag-gated, with dry-run support) |
| Test | infra/tests/smoke_test.py | Read-only integration test: connectivity, secrets, DNS, NAT (8 checks) |
| Test | infra/tests/.env.example | Example env vars for running the smoke test |
| Test | infra/tests/Dockerfile | Lightweight image for running the smoke test as an ECS task |
| Config | settings_local_stage.py | Stage settings with Secrets Manager integration |
| Config | .gitignore | Excludes Pulumi output files and local analysis docs |
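
The tag-gated, dry-run behaviour described for infra/scripts/guardduty-cleanup.sh can be illustrated roughly as follows. This is a minimal Python sketch of the selection logic only, not the script itself (which is shell). The `Endpoint` type, the `cleanup` function, and the `GuardDutyManaged=true` tag are assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Assumption: endpoints created by GuardDuty carry this tag; the real
# script may gate on a different key or value.
MANAGED_TAG = ("GuardDutyManaged", "true")


@dataclass
class Endpoint:
    """Hypothetical stand-in for a VPC endpoint description."""
    endpoint_id: str
    vpc_id: str
    tags: dict[str, str] = field(default_factory=dict)


def select_for_cleanup(endpoints, vpc_id):
    """Keep only endpoints in the given VPC that carry the managed tag."""
    key, value = MANAGED_TAG
    return [e for e in endpoints
            if e.vpc_id == vpc_id and e.tags.get(key) == value]


def cleanup(endpoints, vpc_id, delete, dry_run=True):
    """Return the IDs that were (or would be) deleted.

    `delete` is only called when dry_run is False, so the default
    invocation never changes anything.
    """
    selected = select_for_cleanup(endpoints, vpc_id)
    for endpoint in selected:
        if dry_run:
            print(f"[dry-run] would delete {endpoint.endpoint_id}")
        else:
            delete(endpoint.endpoint_id)
    return [e.endpoint_id for e in selected]
```

Defaulting `dry_run` to True means a destructive run has to be requested explicitly, which is the usual reason for pairing a tag gate with a dry-run mode.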

Context

This is the initial PR for migrating ATN (addons.thunderbird.net) from EC2/Ansible to ECS Fargate, as discussed with @Sancus. Key decisions and implementation details:

Networking:

  • New VPC 10.100.0.0/16 with public/private subnets across 3 AZs (approved by Andrei)
  • VPC peering to default VPC (vpc-441e5e22) with routes on the correct custom route tables (not the VPC default RT)
  • DNS resolution enabled across peering connection
  • Return route and SG ingress rules on default VPC (Pulumi-managed, cleanly removed on destroy)
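
As a rough illustration of the address plan above, the following sketch carves 10.100.0.0/16 into one public and one private subnet per AZ and checks that it does not overlap the peered VPC, since VPC peering requires non-overlapping CIDRs. The /20 subnet size, the AZ names, and the peer CIDR 172.31.0.0/16 are assumptions; the PR does not state them.

```python
import ipaddress


def plan_subnets(vpc_cidr, az_names, new_prefix=20):
    """Assign consecutive blocks: public subnets first, then private."""
    vpc = ipaddress.ip_network(vpc_cidr)
    blocks = vpc.subnets(new_prefix=new_prefix)  # generator of equal-size blocks
    plan = {}
    for tier in ("public", "private"):
        for az in az_names:
            plan[(tier, az)] = next(blocks)
    return plan


def can_peer(vpc_cidr, peer_cidr):
    """Peering is only possible when the two CIDR ranges do not overlap."""
    return not ipaddress.ip_network(vpc_cidr).overlaps(
        ipaddress.ip_network(peer_cidr))


if __name__ == "__main__":
    # AZ names are assumed; only the region (us-west-2) comes from the PR.
    azs = ["us-west-2a", "us-west-2b", "us-west-2c"]
    for (tier, az), block in plan_subnets("10.100.0.0/16", azs).items():
        print(f"{tier:8} {az}  {block}")
    # 172.31.0.0/16 is a placeholder for the default VPC's CIDR.
    print("peering possible:", can_peer("10.100.0.0/16", "172.31.0.0/16"))
```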

Security groups (accounts-repo pattern):

  • Separate ALB and container SGs with dynamic source_security_group_id wiring
  • ALB SGs: 443/80 from internet
  • Container SGs: 8000 from ALB SG only
  • Egress: all protocols rather than TCP only, so UDP traffic such as DNS is not blocked
  • Default VPC SG rules on both sg-d5539ea9 (Redis/Memcached/ES/EFS) and sg-5133b52c (RDS/RabbitMQ)
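
The ingress wiring above can be modelled in a few lines. This is a plain-Python illustration of the rule set, not the Pulumi code in infra/pulumi/__main__.py. The group names `alb-sg` and `container-sg` are hypothetical labels.

```python
from dataclasses import dataclass

INTERNET = "0.0.0.0/0"


@dataclass(frozen=True)
class IngressRule:
    port: int
    source: str  # a CIDR block, or the name of another security group


# Hypothetical names; the ports and sources follow the list above.
RULES = {
    "alb-sg": [IngressRule(443, INTERNET), IngressRule(80, INTERNET)],
    "container-sg": [IngressRule(8000, "alb-sg")],
}


def is_allowed(target_sg, port, source):
    """True if some ingress rule on target_sg admits this port and source."""
    return any(rule.port == port and rule.source == source
               for rule in RULES.get(target_sg, []))
```

Because the container rule names the ALB security group as its source instead of a CIDR, `is_allowed("container-sg", 8000, INTERNET)` is False while `is_allowed("container-sg", 8000, "alb-sg")` is True. That is the effect referencing a source security group is meant to achieve.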

ECS services:

  • Web (4 tasks), Worker (2 tasks), Versioncheck (2 tasks)
  • 16 cron jobs as EventBridge Scheduled Tasks
  • ACM cert wired for HTTPS on both ALBs
  • force_delete on ECR for safe teardown cycles
  • Log configuration delegated to tb_pulumi defaults
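
When cron jobs move from a crontab to EventBridge schedules, the expression syntax changes. EventBridge uses six fields, requires `?` in either day-of-month or day-of-week, and numbers weekdays 1-7 starting on Sunday. The helper below is a hypothetical sketch of that translation for simple expressions; whether this PR converts existing crontab entries this way is an assumption. Other fields are passed through unchanged, so step or range syntax may still need manual review.

```python
DAY_NAMES = {"SUN", "MON", "TUE", "WED", "THU", "FRI", "SAT"}


def _convert_day_of_week(token: str) -> str:
    """Map a crontab weekday (0 or 7 = Sunday) to EventBridge (1 = Sunday)."""
    if token.isdigit():
        number = int(token)
        if number > 7:
            raise ValueError(f"invalid day-of-week: {token!r}")
        return str(number % 7 + 1)
    if token.upper() in DAY_NAMES:
        return token.upper()
    raise ValueError(f"unsupported day-of-week token: {token!r}")


def to_eventbridge_cron(expr: str) -> str:
    """Translate a five-field crontab expression to EventBridge syntax."""
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError("expected five crontab fields")
    minute, hour, dom, month, dow = fields
    if dow == "*":
        dow = "?"  # no weekday restriction
    elif dom == "*":
        dom = "?"  # weekday restriction only
        dow = ",".join(_convert_day_of_week(t) for t in dow.split(","))
    else:
        raise ValueError(
            "EventBridge cannot restrict both day-of-month and day-of-week")
    return f"cron({minute} {hour} {dom} {month} {dow} *)"
```

For example, `to_eventbridge_cron("0 6 * * 1")` (Mondays at 06:00) gives `cron(0 6 ? * 2 *)`.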

IAM and secrets:

  • OIDC role with strict trust policy (aud, iss, sub, job_workflow_ref)
  • Minimal ECR push permissions derived from actual repo ARN
  • Secrets IAM path reconciled: atn/stage/* policy on all task roles (including cron task role)
  • All 8 required secrets pre-exist in Secrets Manager (verified via smoke test)
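
A policy scoped to the atn/stage/* path might look roughly like the document built below. This is a sketch under assumptions: the function name and the account ID are placeholders, and the actual policy attached by the Pulumi program may include further actions or conditions.

```python
import json


def secrets_read_policy(region, account_id, prefix):
    """Build an IAM policy allowing reads of secrets under one path prefix."""
    resource = f"arn:aws:secretsmanager:{region}:{account_id}:secret:{prefix}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["secretsmanager:GetSecretValue"],
                "Resource": [resource],
            }
        ],
    }


if __name__ == "__main__":
    # 123456789012 is a placeholder account ID, not the real one.
    policy = secrets_read_policy("us-west-2", "123456789012", "atn/stage")
    print(json.dumps(policy, indent=2))
```

Deriving the resource ARN from a single prefix keeps the web, worker, versioncheck, and cron task roles pointed at the same path, so the policies cannot drift apart.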

CI/CD:

  • Build/publish workflow with OIDC gating (inert until AWS_ROLE_ARN repo variable is set)
  • AWS_ROLE_ARN has been set on the repository

Post-deploy validation

pulumi up succeeded in us-west-2 (142 resources, zero errors). Infrastructure was validated via a read-only smoke test run as an ECS Fargate task in the private subnets:

| Check | Result |
| --- | --- |
| RDS MySQL (stage, cross-VPC) | Pass (5ms) |
| ElastiCache Redis (new stack) | Pass (3ms) |
| ElastiCache Redis (existing, cross-VPC) | Pass (3ms) |
| RabbitMQ (EC2, cross-VPC) | Pass (1ms) |
| Elasticsearch (VPC endpoint, HTTPS/443) | Pass (2ms) |
| Secrets Manager (8/8 atn/stage/* accessible) | Pass |
| DNS resolution (cross-VPC peering) | Pass |
| NAT Gateway egress (internet) | Pass (326ms) |
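
A connectivity check of the kind reported above, a TCP connect with a measured latency, could be written along these lines. This is a minimal sketch and not the code in infra/tests/smoke_test.py; the function name and the timeout are assumptions.

```python
import socket
import time


def check_tcp(host, port, timeout=3.0):
    """Open and close a TCP connection; return (ok, latency_ms or None).

    Only a connection is opened and no data is sent, so the check is
    read-only with respect to the target service.
    """
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            elapsed_ms = (time.perf_counter() - start) * 1000
            return True, elapsed_ms
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False, None
```

A caller would run this once per endpoint (for example the MySQL, Redis, and RabbitMQ hosts, taken from environment variables) and print a Pass/Fail line with the rounded latency.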

Stack was cleanly destroyed after validation (142 resources deleted, zero errors). Full lifecycle proven: up, test, destroy, repeatable.

Remaining follow-ups (separate from this PR):

  • Application boot path and secrets injection model (task role vs execution role)
  • OpenSearch, Memcached, EFS components (SOW items, deferred)
  • Autoscaling policies and CloudWatch monitoring
  • Production environment config

Testing

# Pulumi preview (requires Python 3.13+)
cd infra/pulumi
python3.13 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
pulumi stack select thunderbird/thunderbird-addons/stage
pulumi preview  # Shows 142 resources to create

# Docker build test
docker build -f Dockerfile.ecs -t addons-server:test .

# Smoke test (runs as ECS task after pulumi up)
cd infra/tests
docker build -t atn-smoke-test .
# Push to ECR, then: aws ecs run-task (see infra/pulumi/analysis.md for full commands)

Checklist

  • Add a description of the changes introduced in this PR
  • The change has been successfully run locally (Pulumi preview passes)
  • Add tests to cover the changes (infra/tests/smoke_test.py -- 8/8 RO checks pass)
  • Screenshots -- N/A, no UI changes

History squashed to present a clean change-set; no functional changes from individual commits

@e9e4e5f0faef e9e4e5f0faef force-pushed the feat/ecs-fargate-migration branch from 60d4f86 to 2ee8f25 Compare January 24, 2026 01:42
@e9e4e5f0faef e9e4e5f0faef self-assigned this Jan 25, 2026
@e9e4e5f0faef e9e4e5f0faef force-pushed the feat/ecs-fargate-migration branch from 699facf to c54436f Compare January 31, 2026 17:07
@e9e4e5f0faef e9e4e5f0faef mentioned this pull request Feb 3, 2026
3 tasks
@e9e4e5f0faef e9e4e5f0faef force-pushed the feat/ecs-fargate-migration branch from 8dff9f9 to 65b3600 Compare February 14, 2026 17:32
@e9e4e5f0faef e9e4e5f0faef force-pushed the feat/ecs-fargate-migration branch from 65b3600 to 34a5a24 Compare February 14, 2026 18:20
