# Epic: Cloud Deployment & Scale-to-Zero (#65)

## Vision

Make tinytown deployable to the cloud with a minimal always-on footprint (townhall + Redis) and agent workers that scale to zero when idle. The target workflow: **start a mission from your laptop, close the laptop, let the agents run in the cloud, and get a ping on your phone when the mission is done or needs your input.**

## Architecture

```
┌──────────────────────────────────────────────────────┐
│  You (laptop, phone, or another agent)               │
└──────────────┬───────────────────────────────────────┘
               │ HTTPS
┌──────────────▼───────────────────────────────────────┐
│  Townhall (Always-On, ~256MB)                        │
│  ├─ REST API + MCP + A2A endpoints                   │
│  ├─ Mission dispatcher (tick loop)                   │
│  ├─ /health, /ready, /metrics                        │
│  └─ Scaling signal API                               │
└──────────────┬───────────────────────────────────────┘
               │
┌──────────────▼───────────────────────────────────────┐
│  Redis Cloud (Managed, Always-On, $7-30/mo)          │
│  ├─ Docket Streams (task dispatch)                   │
│  ├─ Event Streams (real-time feed)                   │
│  ├─ Hashes (agent/task/mission state)                │
│  └─ Pub/Sub (broadcast)                              │
└──────────────┬───────────────────────────────────────┘
               │
┌──────────────▼───────────────────────────────────────┐
│  Agent Workers (Scale-to-Zero)                       │
│  ├─ Cloud Run / K8s / Docker on VM                   │
│  ├─ Start when Docket stream has work                │
│  ├─ Each runs: tt agent + coding CLI                 │
│  └─ Idle timeout → exit → container removed          │
└──────────────────────────────────────────────────────┘
```

### Cost Model
- **Idle**: ~$10-40/month (small VM for townhall + Redis Cloud)
- **Active mission**: + compute per agent-hour (Cloud Run: ~$0.05/hr per agent)
- **Key insight**: You only pay for agents when they're working
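
As a rough worked example of the cost model above, the sketch below adds the idle base cost to the per-agent-hour compute. The function name and the sample workload (3 agents, 20 hours each) are illustrative assumptions; the rates are the approximate figures quoted above, not measured prices.

```python
def estimate_monthly_cost(idle_base_usd: float, agents: int,
                          hours_per_agent: float,
                          usd_per_agent_hour: float = 0.05) -> float:
    """Idle base cost plus per-agent-hour compute for one month."""
    if min(idle_base_usd, agents, hours_per_agent, usd_per_agent_hour) < 0:
        raise ValueError("inputs must be non-negative")
    return idle_base_usd + agents * hours_per_agent * usd_per_agent_hour


# Hypothetical month: 3 agents running 20 hours each,
# at the low and high ends of the idle cost range.
low = estimate_monthly_cost(10.0, agents=3, hours_per_agent=20.0)
high = estimate_monthly_cost(40.0, agents=3, hours_per_agent=20.0)
print(f"${low:.2f} to ${high:.2f} per month")
```

With these assumed numbers the agent compute adds about $3 on top of the idle footprint, which is why the idle cost dominates for light workloads.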

## Prerequisites

**Phase 1 (Redis Convention Alignment) should be complete first** — [#64](https://github.com/redis-field-engineering/tinytown/issues/64)

Specifically:
- #52 (Docket Streams) — needed for queue-depth scaling signal
- #54 (Cold/Draining states) — needed for agent lifecycle management

## Execution Order

These can be partially parallelized. #55 and #56 are independent of each other. #57 depends on both (plus #52 and #54 from Phase 1); #58 depends only on #56.

### 1. Remote Redis / Redis Cloud support — [#56](https://github.com/redis-field-engineering/tinytown/issues/56)
**Do this first or in parallel with #55.** Adds `REDIS_URL` env var support, TLS connections, and skips local `redis-server` startup when a remote URL is configured. This is the minimum viable change to run tinytown against cloud Redis.

- Estimated effort: Small-Medium (2-4 hours)
- Risk: Low (additive, local mode unchanged)
- Validates: `REDIS_URL=rediss://...redis-cloud.com:12345 townhall` connects and works
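
The decision described above (use the remote URL if set, enable TLS for `rediss://`, otherwise fall back to a local server) could look roughly like this. This is a sketch, not tinytown's actual code: `resolve_redis_target`, `RedisTarget`, and the local default URL are hypothetical names and values chosen for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional
from urllib.parse import urlparse

# Assumed local default; the real default may differ.
DEFAULT_LOCAL_URL = "redis://127.0.0.1:6379"


@dataclass(frozen=True)
class RedisTarget:
    url: str
    use_tls: bool
    start_local_server: bool


def resolve_redis_target(env: Optional[Mapping[str, str]] = None) -> RedisTarget:
    """Pick the Redis endpoint from REDIS_URL, falling back to local mode."""
    env = os.environ if env is None else env
    raw = env.get("REDIS_URL", "").strip()
    if not raw:
        # No remote URL configured: keep today's behavior and start redis-server.
        return RedisTarget(DEFAULT_LOCAL_URL, use_tls=False, start_local_server=True)

    parsed = urlparse(raw)
    if parsed.scheme not in ("redis", "rediss"):
        raise ValueError(f"unsupported REDIS_URL scheme: {parsed.scheme!r}")
    if not parsed.hostname:
        raise ValueError("REDIS_URL has no host")
    # The rediss:// scheme signals a TLS connection.
    return RedisTarget(raw, use_tls=parsed.scheme == "rediss", start_local_server=False)
```

Passing the environment as a mapping keeps the function easy to test; for example, `resolve_redis_target({"REDIS_URL": "rediss://redis.example.com:12345"})` yields a TLS target with local startup skipped.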

### 2. Containerize tinytown — [#55](https://github.com/redis-field-engineering/tinytown/issues/55)
**Can be done in parallel with #56.** Multi-stage Dockerfiles for townhall and agent-worker. Docker-compose for local dev. CI pipeline to build and push to GHCR.

The agent-worker container is the interesting one: it needs both the `tt` binary and a coding CLI (claude, augment, etc.). Open design decision: bake the CLI into the image or mount it at runtime?

- Estimated effort: Medium (4-8 hours)
- Risk: Medium (CLI packaging is the tricky part)
- Validates: `docker-compose up` starts full stack; agents complete tasks

### 3. Scale-to-zero agent workers — [#57](https://github.com/redis-field-engineering/tinytown/issues/57)
**Depends on #55, #56, #52, #54.** Adds the `/api/scaling` endpoint to townhall that returns queue depth and scaling recommendation. Agent workers get an idle timeout that triggers graceful shutdown (Draining → Stopped → exit).

The actual autoscaler is external (KEDA, Cloud Run, or a simple polling script). Tinytown just provides the signal.
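
For the "simple polling script" option, the core decision might be as small as the function below. It is only a sketch: the parameter names, the one-task-per-worker default, and the cap of 10 workers are assumptions, and the response shape of `/api/scaling` is not defined here.

```python
import math


def desired_workers(queue_depth: int, busy_workers: int = 0,
                    tasks_per_worker: int = 1, max_workers: int = 10) -> int:
    """Map queue depth to a worker count; returns 0 when there is no work."""
    if queue_depth < 0 or busy_workers < 0:
        raise ValueError("queue_depth and busy_workers must be non-negative")
    if tasks_per_worker < 1 or max_workers < 0:
        raise ValueError("tasks_per_worker must be >= 1 and max_workers >= 0")
    # Keep workers that are mid-task, and add enough to cover queued tasks.
    needed = busy_workers + math.ceil(queue_depth / tasks_per_worker)
    return min(needed, max_workers)


print(desired_workers(0))                      # empty queue: scale to zero
print(desired_workers(5, tasks_per_worker=2))  # 5 queued, 2 per worker
print(desired_workers(100))                    # capped at max_workers
```

A polling script would call this on each interval with the queue depth reported by townhall and then ask the platform (Cloud Run, Kubernetes, Docker) to converge on the result.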

- Estimated effort: Medium (4-6 hours)
- Risk: Medium (autoscaler integration varies by platform)
- Validates: Agent exits after idle timeout; `/api/scaling` returns correct recommendation
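
The idle-timeout lifecycle could be modeled as a small state machine, sketched below. `Draining` and `Stopped` come from the description above; the `ACTIVE` state, the `IdleTracker` class, and the explicit `now` argument (used instead of a real clock so the logic is testable) are assumptions for illustration.

```python
from enum import Enum


class AgentState(Enum):
    ACTIVE = "active"      # hypothetical name for "running and claiming tasks"
    DRAINING = "draining"  # no new tasks; finishing anything in flight
    STOPPED = "stopped"    # safe for the process to exit


class IdleTracker:
    def __init__(self, idle_timeout_s: float, now: float) -> None:
        self.idle_timeout_s = idle_timeout_s
        self.last_work_at = now
        self.state = AgentState.ACTIVE

    def tick(self, now: float, in_flight: int) -> AgentState:
        """Advance the lifecycle; call once per loop iteration."""
        if self.state is AgentState.ACTIVE:
            if in_flight > 0:
                # Still working: reset the idle clock.
                self.last_work_at = now
            elif now - self.last_work_at >= self.idle_timeout_s:
                self.state = AgentState.DRAINING
        elif self.state is AgentState.DRAINING and in_flight == 0:
            self.state = AgentState.STOPPED
        return self.state


tracker = IdleTracker(idle_timeout_s=30.0, now=0.0)
tracker.tick(now=10.0, in_flight=1)  # busy, idle clock resets to t=10
tracker.tick(now=40.0, in_flight=0)  # 30s idle: start draining
tracker.tick(now=41.0, in_flight=0)  # nothing in flight: stopped
print(tracker.state)
```

Once the tracker reports `STOPPED`, the worker process can exit and the platform removes the container.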

### 4. Health, readiness, and metrics — [#58](https://github.com/redis-field-engineering/tinytown/issues/58)
**Can be done anytime after #56.** Standard cloud-native endpoints: `/health` (liveness), `/ready` (Redis connected?), `/metrics` (Prometheus). Required for Kubernetes probes and load balancer health checks.

- Estimated effort: Small (2-3 hours)
- Risk: Low (additive, read-only endpoints)
- Validates: K8s probes work; Prometheus scrapes metrics
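
To illustrate the difference between the three endpoints, here is a framework-free sketch of their logic. The function names, response bodies, and the `tinytown_queue_depth` metric name are hypothetical; the metrics output uses the Prometheus text exposition format with gauges only.

```python
from typing import Callable, Dict, Tuple


def health() -> Tuple[int, str]:
    """Liveness: the process is up and able to answer."""
    return 200, "ok"


def ready(redis_ping: Callable[[], bool]) -> Tuple[int, str]:
    """Readiness: only report ready when Redis answers a ping."""
    try:
        ok = redis_ping()
    except Exception:
        ok = False
    return (200, "ready") if ok else (503, "redis unavailable")


def render_metrics(gauges: Dict[str, float]) -> str:
    """Render gauges in Prometheus text exposition format."""
    lines = []
    for name in sorted(gauges):
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {gauges[name]}")
    return "\n".join(lines) + "\n"


print(ready(lambda: True))
print(render_metrics({"tinytown_queue_depth": 3}), end="")
```

Keeping `/health` independent of Redis means a Redis outage marks townhall as not ready without causing Kubernetes to restart it.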

## Definition of Done

- [ ] #56 — `REDIS_URL` env var, TLS, remote Redis works
- [ ] #55 — Docker images for townhall and agent-worker
- [ ] #57 — `/api/scaling` endpoint, agent idle timeout
- [ ] #58 — `/health`, `/ready`, `/metrics` endpoints
- [ ] `docker-compose up` starts full local stack
- [ ] Townhall connects to Redis Cloud via `REDIS_URL`
- [ ] Agent workers scale to zero after idle timeout
- [ ] At least one autoscaler integration documented (KEDA or Cloud Run or custom script)

## What This Unlocks

With this phase complete, tinytown runs in the cloud:
- Start a mission from anywhere (laptop, phone, another agent)
- Agents spin up automatically when work arrives
- Agents shut down when idle (you stop paying)
- Townhall stays up with minimal footprint
- Ready for Phase 3 (A2A + MCP + mobile app)
