Epic: Cloud Deployment & Scale-to-Zero #65

@jeremyplichta

Vision

Make tinytown deployable to the cloud with a minimal always-on footprint (townhall + Redis) and agent workers that scale to zero when idle. The target workflow: start a mission from your laptop, close the laptop, let the agents run in the cloud, and get a ping on your phone when the mission is done or needs you.

Architecture

┌──────────────────────────────────────────────────────┐
│  You (laptop, phone, or another agent)               │
└──────────────┬───────────────────────────────────────┘
               │ HTTPS
┌──────────────▼───────────────────────────────────────┐
│  Townhall (Always-On, ~256MB)                        │
│  ├─ REST API + MCP + A2A endpoints                   │
│  ├─ Mission dispatcher (tick loop)                   │
│  ├─ /health, /ready, /metrics                        │
│  └─ Scaling signal API                               │
└──────────────┬───────────────────────────────────────┘
               │
┌──────────────▼───────────────────────────────────────┐
│  Redis Cloud (Managed, Always-On, $7-30/mo)          │
│  ├─ Docket Streams (task dispatch)                   │
│  ├─ Event Streams (real-time feed)                   │
│  ├─ Hashes (agent/task/mission state)                │
│  └─ Pub/Sub (broadcast)                              │
└──────────────┬───────────────────────────────────────┘
               │
┌──────────────▼───────────────────────────────────────┐
│  Agent Workers (Scale-to-Zero)                       │
│  ├─ Cloud Run / K8s / Docker on VM                   │
│  ├─ Start when Docket stream has work                │
│  ├─ Each runs: tt agent + coding CLI                 │
│  └─ Idle timeout → exit → container removed          │
└──────────────────────────────────────────────────────┘

Cost Model

  • Idle: ~$10-40/month (small VM for townhall + Redis Cloud)
  • Active mission: + compute per agent-hour (Cloud Run: ~$0.05/hr per agent)
  • Key insight: You only pay for agents when they're working
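The cost model above reduces to simple arithmetic. A minimal sketch in Python (chosen only for illustration, not necessarily tinytown's implementation language), using the rough estimates quoted above; the function name and the example workload are hypothetical:

```python
def monthly_cost(idle_base: float, agent_hours: float,
                 rate_per_agent_hour: float = 0.05) -> float:
    """Estimated monthly bill: always-on base plus pay-per-use agent compute.

    idle_base is the townhall VM + Redis Cloud cost (the issue estimates $10-40).
    rate_per_agent_hour defaults to the issue's Cloud Run estimate of ~$0.05.
    """
    if idle_base < 0 or agent_hours < 0 or rate_per_agent_hour < 0:
        raise ValueError("costs and hours must be non-negative")
    return idle_base + agent_hours * rate_per_agent_hour


# Hypothetical workload: 4 agents, 2 hours a day, 20 days a month = 160 agent-hours.
agent_hours = 4 * 2 * 20
low = monthly_cost(10.0, agent_hours)   # low end of the idle estimate
high = monthly_cost(40.0, agent_hours)  # high end of the idle estimate
print(f"${low:.2f} to ${high:.2f} per month")
```

Under these assumed numbers the agent compute adds about $8 to the idle baseline, so the always-on components dominate the bill unless missions run many agent-hours.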

Prerequisites

Phase 1 (Redis Convention Alignment) should be complete first: #64

Specifically:

Execution Order

These can be partially parallelized. #55 and #56 are independent of each other. #57 depends on both (and on #52 and #54); #58 can start any time after #56.

1. Remote Redis / Redis Cloud support — #56

Do this first or in parallel with #55. Adds REDIS_URL env var support, TLS connections, and skips local redis-server startup when a remote URL is configured. This is the minimum viable change to run tinytown against cloud Redis.

  • Estimated effort: Small-Medium (2-4 hours)
  • Risk: Low (additive, local mode unchanged)
  • Validates: REDIS_URL=rediss://...redis-cloud.com:12345 townhall connects and works
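The behavior described for #56 could look roughly like the sketch below. It is written in Python for illustration only; `RedisTarget`, `resolve_redis_target`, and the local default URL are hypothetical names, and it assumes that any configured `REDIS_URL` counts as "remote" and that TLS is selected by the conventional `rediss://` scheme.

```python
import os
from dataclasses import dataclass
from typing import Mapping
from urllib.parse import urlparse

# Hypothetical local default; the issue does not state tinytown's actual default.
LOCAL_DEFAULT_URL = "redis://127.0.0.1:6379"


@dataclass(frozen=True)
class RedisTarget:
    url: str
    use_tls: bool
    start_local_server: bool


def resolve_redis_target(env: Mapping[str, str]) -> RedisTarget:
    """Decide where to connect and whether to launch a local redis-server."""
    url = env.get("REDIS_URL", "").strip()
    if not url:
        # Local mode stays unchanged: start redis-server and connect to it.
        return RedisTarget(LOCAL_DEFAULT_URL, use_tls=False, start_local_server=True)

    scheme = urlparse(url).scheme
    if scheme not in ("redis", "rediss"):
        raise ValueError(f"unsupported REDIS_URL scheme: {scheme!r}")
    # A configured URL means the server is managed elsewhere, so skip local startup.
    return RedisTarget(url, use_tls=(scheme == "rediss"), start_local_server=False)


if __name__ == "__main__":
    print(resolve_redis_target(os.environ))
```

Keeping the decision in one pure function makes the "local mode unchanged" claim easy to verify: with no `REDIS_URL` in the environment, the result is the same as before the change.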

2. Containerize tinytown — #55

Can be done in parallel with #56. Multi-stage Dockerfiles for townhall and agent-worker. Docker-compose for local dev. CI pipeline to build and push to GHCR.

The agent-worker container is the interesting one: it needs both the tt binary and a coding CLI (claude, augment, etc.). Open design decision: bake the CLI into the image or mount it at runtime?

  • Estimated effort: Medium (4-8 hours)
  • Risk: Medium (CLI packaging is the tricky part)
  • Validates: docker-compose up starts full stack; agents complete tasks

3. Scale-to-zero agent workers — #57

Depends on #55, #56, #52, and #54. Adds a /api/scaling endpoint to townhall that returns the queue depth and a scaling recommendation. Agent workers get an idle timeout that triggers a graceful shutdown (Draining → Stopped → exit).

The actual autoscaler is external (KEDA, Cloud Run, or a simple polling script). Tinytown just provides the signal.
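One way the scaling signal could be computed is sketched below in Python, for illustration only. The response fields, the `tasks_per_agent` and `max_agents` parameters, and the formula itself are assumptions; the issue specifies only that the endpoint returns queue depth and a recommendation.

```python
import json
import math
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ScalingSignal:
    queue_depth: int         # tasks waiting in the dispatch stream, not yet claimed
    busy_agents: int         # workers currently processing a task
    recommended_agents: int  # target an external autoscaler could converge to


def scaling_signal(queue_depth: int, busy_agents: int,
                   tasks_per_agent: int = 1, max_agents: int = 10) -> ScalingSignal:
    """Recommend enough agents to cover in-flight work plus the backlog."""
    if queue_depth < 0 or busy_agents < 0:
        raise ValueError("queue_depth and busy_agents must be non-negative")
    if tasks_per_agent < 1 or max_agents < 0:
        raise ValueError("tasks_per_agent must be >= 1 and max_agents >= 0")
    wanted = busy_agents + math.ceil(queue_depth / tasks_per_agent)
    # Empty queue and no busy agents yields 0, which is the scale-to-zero case.
    return ScalingSignal(queue_depth, busy_agents, min(wanted, max_agents))


# Hypothetical JSON body for a GET /api/scaling response.
print(json.dumps(asdict(scaling_signal(queue_depth=5, busy_agents=1, tasks_per_agent=2))))
```

Because the function only reports a target, the same signal can feed KEDA, a Cloud Run scaling hook, or a polling script without tinytown knowing which one is in use.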

  • Estimated effort: Medium (4-6 hours)
  • Risk: Medium (autoscaler integration varies by platform)
  • Validates: Agent exits after idle timeout; /api/scaling returns correct recommendation
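The idle-timeout shutdown for agent workers can be sketched as a small polling loop. This Python version is illustrative only: `run_worker`, `fetch_task`, and the `Running` state name are hypothetical (the issue names only Draining and Stopped), and the clock and sleep functions are injectable so the loop can be exercised without waiting in real time.

```python
import enum
import time
from typing import Callable, List, Optional

Task = Callable[[], None]


class WorkerState(enum.Enum):
    RUNNING = "Running"    # assumed name for the normal working state
    DRAINING = "Draining"  # named in the issue
    STOPPED = "Stopped"    # named in the issue


def run_worker(fetch_task: Callable[[], Optional[Task]],
               idle_timeout: float,
               poll_interval: float = 1.0,
               now: Callable[[], float] = time.monotonic,
               sleep: Callable[[float], None] = time.sleep) -> List[WorkerState]:
    """Process tasks until none arrive for idle_timeout seconds, then shut down.

    Returns the sequence of states the worker passed through.
    """
    history = [WorkerState.RUNNING]
    last_work = now()
    while now() - last_work < idle_timeout:
        task = fetch_task()
        if task is None:
            sleep(poll_interval)  # nothing queued; wait before polling again
            continue
        task()
        last_work = now()         # any completed task resets the idle clock

    # Draining: stop claiming new tasks. A real worker would finish or
    # release any in-flight work here before moving on.
    history.append(WorkerState.DRAINING)
    history.append(WorkerState.STOPPED)
    return history
```

In a container, the process would exit with status 0 once the function returns, which lets the platform remove the instance.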

4. Health, readiness, and metrics — #58

Can be done anytime after #56. Standard cloud-native endpoints: /health (liveness), /ready (Redis connected?), /metrics (Prometheus). Required for Kubernetes probes and load balancer health checks.

  • Estimated effort: Small (2-3 hours)
  • Risk: None
  • Validates: K8s probes work; Prometheus scrapes metrics
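The three endpoints differ mainly in what they check. A minimal Python sketch, for illustration only: `handle_probe` and the metric name `tinytown_queue_depth` are hypothetical, and the choice of which metrics to expose is an assumption. The metrics body follows the Prometheus text exposition format.

```python
from typing import Callable, Tuple


def handle_probe(path: str, redis_ping: Callable[[], bool],
                 queue_depth: int) -> Tuple[int, str, str]:
    """Return (status code, content type, body) for a probe request."""
    if path == "/health":
        # Liveness: the process is up. Deliberately does not touch Redis,
        # so a Redis outage does not cause the orchestrator to restart townhall.
        return 200, "text/plain", "ok\n"

    if path == "/ready":
        # Readiness: only accept traffic while Redis is reachable.
        try:
            ready = redis_ping()
        except Exception:
            ready = False
        if ready:
            return 200, "text/plain", "ready\n"
        return 503, "text/plain", "redis unavailable\n"

    if path == "/metrics":
        body = (
            "# HELP tinytown_queue_depth Tasks waiting for an agent.\n"
            "# TYPE tinytown_queue_depth gauge\n"
            f"tinytown_queue_depth {queue_depth}\n"
        )
        return 200, "text/plain; version=0.0.4", body

    return 404, "text/plain", "not found\n"
```

Separating liveness from readiness matters here: a failed `/ready` check takes townhall out of rotation while Redis is down, whereas a failed `/health` check would trigger a restart that cannot fix a Redis outage.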

Definition of Done

What This Unlocks

With this phase complete, tinytown runs in the cloud:

  • Start a mission from anywhere (laptop, phone, another agent)
  • Agents spin up automatically when work arrives
  • Agents shut down when idle (you stop paying)
  • Townhall stays up with minimal footprint
  • Ready for Phase 3 (A2A + MCP + mobile app)

Metadata

Labels: cloud (Cloud deployment and scaling), epic (Epic tracking issue)
