feat: migrate agent runtime from Docker to Podman #636

Open
shivammittal274 wants to merge 9 commits into feat/openclaw-agents-poc from feat/podman-migration

@shivammittal274

Summary

  • Replace all Docker CLI calls with a new PodmanRuntime module that manages the Podman machine (Linux VM) lifecycle automatically
  • Users no longer need to install Docker — Podman is used instead (currently from system PATH, binary bundling in follow-up)
  • On macOS/Windows, Podman creates a Linux VM via Apple Virtualization.framework / WSL2. On Linux, containers run natively.

Changes

New: podman-runtime.ts

  • PodmanRuntime class wrapping all Podman interactions
  • Machine lifecycle: initMachine(), startMachine(), stopMachine(), ensureReady()
  • ensureReady() auto-initializes the machine on first use (~2.2GB one-time download)
  • runCommand() with line-buffered output streaming
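
For reviewers unfamiliar with line-buffered streaming: process output arrives in arbitrary chunks, so a handler should only see complete lines. The sketch below is a minimal illustration, not the code in podman-runtime.ts; `createLineBuffer` is a hypothetical name.

```typescript
// Hypothetical helper (not from the PR): buffers raw chunks and emits
// only complete lines to the handler.
function createLineBuffer(onLine: (line: string) => void) {
  let pending = ''
  return {
    // Feed a raw chunk; complete lines are emitted, the unfinished tail is kept.
    push(chunk: string): void {
      pending += chunk
      const parts = pending.split(/\r?\n/)
      pending = parts.pop() ?? ''
      for (const line of parts) onLine(line)
    },
    // Emit whatever is left once the process has exited.
    flush(): void {
      if (pending.length > 0) onLine(pending)
      pending = ''
    },
  }
}
```

A caller would `push()` each stdout/stderr chunk and call `flush()` when the process closes, so a final line without a trailing newline is not lost.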

Modified: agents.ts

  • All 6 docker CLI calls replaced with PodmanRuntime
  • /docker-status endpoint renamed to /runtime-status (returns machine init/running state)
  • ensureReady() called before every container operation (create, start, stop, delete)
  • Lifecycle management:
    • Server startup: pre-starts Podman machine if agents exist from previous session
    • Server shutdown: stops all BrowserOS containers + stops machine (only if no other user containers are running)
    • Delete last agent: stops machine (only if no other user containers are running)
  • Safe machine stop: checks podman ps — only stops machine if all running containers are ours (browseros-claw-* prefix)
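
The safe-stop rule above can be sketched as a pure check over container names. This is an illustration only: the helper names are hypothetical, and it assumes the names come from something like `podman ps --format "{{.Names}}"`, one per line. The PR may obtain them differently.

```typescript
// Prefix taken from the PR description.
const OUR_PREFIX = 'browseros-claw-'

// Hypothetical helper: split newline-separated names, dropping blanks.
function parseContainerNames(stdout: string): string[] {
  return stdout
    .split('\n')
    .map((s) => s.trim())
    .filter((s) => s.length > 0)
}

// True only when every running container belongs to BrowserOS.
// An empty list also returns true: nothing else depends on the machine.
function onlyOurContainersRunning(runningNames: string[]): boolean {
  return runningNames.every((name) => name.startsWith(OUR_PREFIX))
}
```

Keeping the decision separate from the `podman ps` call makes the "do not stop the user's machine" behaviour easy to unit test.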

Modified: AgentsPage.tsx

  • Replaced Docker warning with runtime-agnostic messaging
  • Uses new /runtime-status endpoint
  • Removed Docker Desktop / OrbStack install links

Prerequisites for reviewers

brew install podman

The runtime-status endpoint handles missing Podman gracefully — UI shows "Container runtime not available" if Podman isn't installed.

Test plan

  • curl localhost:9105/agents/runtime-status returns available: true with Podman installed
  • Create an agent — machine auto-inits if needed, container starts, health check passes
  • Stop/start/delete agent — all work through Podman compose
  • Delete last agent — machine stops (if no other containers running)
  • Quit BrowserOS — containers stop, machine stops (if safe)
  • User's own Podman containers are NOT affected by BrowserOS shutdown
  • No 'docker' string references remain in agents.ts (except docker-compose.yml filename)

Follow-up work

  • Bundle Podman binary with BrowserOS (upload to R2, add to server-prod-resources.json)
  • First-run progress UI for the ~2.2GB machine init download
  • Windows testing (WSL2 backend)

🤖 Generated with Claude Code

@greptile-apps

greptile-apps bot commented Apr 1, 2026

Greptile Summary

This PR replaces all Docker CLI calls with a new PodmanRuntime service that manages the Podman machine lifecycle automatically, removes Docker-specific UI copy, and wires startup/shutdown hooks into the server. Previous thread issues (machineReady flag reset, runCommand pipe deadlock) are correctly addressed, but two new P1 concerns remain.

  • ensureReady() has no concurrency guard: initAgentRuntime() fires it in the background at server start, and any concurrent agent operation also calls it. Both calls see machineReady === false and both enter initMachine(); the second then throws "podman machine init failed with code 125".
  • shutdownAgentRuntime() is fired and forgotten in the shutdown handler: if the outer onShutdown leads to process.exit(), the compose stop subprocesses are abandoned and containers remain running after BrowserOS quits.

Confidence Score: 3/5

  • Not safe to merge — two P1 defects can cause agent creation failures under concurrent init and leave containers running on server shutdown
  • Prior thread issues are resolved, but a new concurrency bug in ensureReady() causes agent creation to fail when initAgentRuntime races with a user action, and the unawaited shutdownAgentRuntime means container cleanup on exit is unreliable. Both are on the critical agent lifecycle path.
  • packages/browseros-agent/apps/server/src/api/services/podman-runtime.ts (ensureReady concurrency) and packages/browseros-agent/apps/server/src/api/server.ts (unawaited shutdown)

Important Files Changed

| Filename | Overview |
| --- | --- |
| packages/browseros-agent/apps/server/src/api/services/podman-runtime.ts | New PodmanRuntime class wrapping all Podman interactions; has a concurrency bug in ensureReady() and a fragile string-based running-state check |
| packages/browseros-agent/apps/server/src/api/routes/agents.ts | Docker CLI calls replaced with PodmanRuntime; adds lifecycle hooks and stopMachineIfOnlyOurs; shutdownAgentRuntime is not awaited in the shutdown handler |
| packages/browseros-agent/apps/server/src/api/server.ts | Wires initAgentRuntime and shutdownAgentRuntime into server startup/shutdown; shutdownAgentRuntime is fire-and-forgotten, risking incomplete cleanup on process exit |
| packages/browseros-agent/apps/agent/entrypoints/app/agents/AgentsPage.tsx | Replaces Docker-specific UI copy and links with runtime-agnostic messaging; switches docker-status endpoint to runtime-status via raw fetch |

Sequence Diagram

sequenceDiagram
    participant UI as AgentsPage (UI)
    participant SR as Server (server.ts)
    participant AR as agents.ts
    participant PR as PodmanRuntime
    participant PM as Podman Machine

    SR->>AR: initAgentRuntime() [on startup, if active agents]
    AR->>PR: ensureReady()
    PR->>PM: machine list (getMachineStatus)
    PM-->>PR: status
    PR->>PM: machine init (if not initialized)
    PR->>PM: machine start (if not running)
    PR-->>AR: ready

    UI->>SR: POST /agents (create agent)
    SR->>AR: createAgent route
    AR->>PR: ensureReady()
    Note over PR: ⚠️ Race: concurrent calls<br/>both enter if machineReady=false
    PR->>PM: machine start / init
    PM-->>PR: ok
    AR->>PR: runCommand(compose pull)
    PR->>PM: podman compose pull
    AR->>PR: runCommand(compose up -d)
    PR->>PM: podman compose up

    UI->>SR: DELETE /agents/:id (last agent)
    SR->>AR: deleteAgent route
    AR->>PR: ensureReady()
    AR->>PR: runCommand(compose down -v)
    AR->>AR: stopMachineIfOnlyOurs() [fire & forget]
    AR->>PR: runCommand(podman ps)
    AR->>PR: stopMachine()
    PR->>PM: machine stop
    PR-->>AR: machineReady = false

    UI->>SR: POST /shutdown
    SR->>AR: shutdownAgentRuntime() [fire & forget ⚠️]
    AR->>PR: runCommand(compose stop) for each agent
    SR->>SR: onShutdown() [called immediately, may exit process]
This is a comment left during a code review.
Path: packages/browseros-agent/apps/server/src/api/services/podman-runtime.ts
Line: 152-167

Comment:
**`ensureReady` has no concurrency guard**

`initAgentRuntime()` fires `ensureReady()` in the background at server start, and any agent create/start/delete also calls `ensureReady()`. Both can enter past the `machineReady === false` check simultaneously. When neither has set `machineReady = true` yet, both call `initMachine()` concurrently — the second invocation will fail because `podman machine init` returns non-zero when the machine already exists, throwing `Error: podman machine init failed with code 125` and surfacing as an agent creation error to the user.

Fix by collapsing concurrent calls into a single shared promise:

```typescript
private ensureReadyPromise: Promise<void> | null = null

async ensureReady(onLog?: (msg: string) => void): Promise<void> {
  if (this.machineReady) return
  if (!this.ensureReadyPromise) {
    this.ensureReadyPromise = this._doEnsureReady(onLog).finally(() => {
      this.ensureReadyPromise = null
    })
  }
  return this.ensureReadyPromise
}

private async _doEnsureReady(onLog?: (msg: string) => void): Promise<void> {
  // ... existing body
}
```
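
To see the shared-promise pattern in isolation, here is a self-contained sketch. `ReadyGate`, `doInit`, and `initCalls` are hypothetical names used for illustration; they are not part of PodmanRuntime.

```typescript
// Hypothetical stand-in for PodmanRuntime: collapses concurrent
// ensureReady() calls into one in-flight initialization.
class ReadyGate {
  private ready = false
  private inFlight: Promise<void> | null = null
  private readonly doInit: () => Promise<void>
  initCalls = 0 // exposed only so the behaviour can be observed

  constructor(doInit: () => Promise<void>) {
    this.doInit = doInit
  }

  // Deliberately not `async`, so concurrent callers receive the same promise object.
  ensureReady(): Promise<void> {
    if (this.ready) return Promise.resolve()
    if (!this.inFlight) {
      this.initCalls += 1
      this.inFlight = this.doInit()
        .then(() => {
          this.ready = true
        })
        .finally(() => {
          // Clear on success or failure so a later call can retry.
          this.inFlight = null
        })
    }
    return this.inFlight
  }
}
```

Two calls made back to back share one `doInit()` run, and a failed initialization does not stick: the next `ensureReady()` starts a fresh attempt.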

---

This is a comment left during a code review.
Path: packages/browseros-agent/apps/server/src/api/server.ts
Line: 124-129

Comment:
**`shutdownAgentRuntime` not awaited — containers likely not stopped on exit**

`shutdownAgentRuntime()` is fired and forgotten before `onShutdown?.()` is called. The shutdown route uses `setImmediate(config.onShutdown)`, so `onShutdown` runs almost immediately (in the event loop's next check phase). If the outer shutdown handler calls `process.exit()` (common in desktop/Electron contexts), all pending asynchronous work is abandoned, including the wait on the `compose stop` subprocesses inside `shutdownAgentRuntime`, so agent containers can be left running after BrowserOS quits.

The fix is to make `onShutdown` async and `await` it, or restructure the shutdown route to wait for cleanup:

```typescript
// server.ts – pass an async callback
shutdownAgentRuntime: async (onShutdown?: () => void) => {
  ...
  await shutdownAgentRuntime()
  onShutdown?.()
}
```
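
One possible shape for the awaited shutdown is sketched below. It is a sketch, not the PR's code: `shutdownGracefully` is a hypothetical helper, and the timeout is an added assumption so that a hung `compose stop` cannot block quitting forever.

```typescript
// Hypothetical helper: wait for cleanup (bounded by timeoutMs), then shut down.
async function shutdownGracefully(
  cleanup: () => Promise<void>,
  onShutdown: () => void,
  timeoutMs: number,
  warn: (msg: string) => void = console.warn,
): Promise<void> {
  let timer: ReturnType<typeof setTimeout> | undefined
  const timeout = new Promise<void>((resolve) => {
    timer = setTimeout(() => {
      warn(`cleanup exceeded ${timeoutMs}ms; continuing shutdown`)
      resolve()
    }, timeoutMs)
  })
  try {
    // Whichever settles first wins: finished cleanup or the timeout.
    await Promise.race([cleanup(), timeout])
  } catch (err) {
    warn(`cleanup failed: ${err instanceof Error ? err.message : String(err)}`)
  } finally {
    if (timer !== undefined) clearTimeout(timer)
    // Runs only after cleanup finished, failed, or timed out.
    onShutdown()
  }
}
```

In the handler this would replace the unawaited call, for example `await shutdownGracefully(shutdownAgentRuntime, () => onShutdown?.(), 10_000)`, where the 10-second budget is an arbitrary placeholder.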

---

This is a comment left during a code review.
Path: packages/browseros-agent/apps/server/src/api/services/podman-runtime.ts
Line: 56-57

Comment:
**Fragile string fallback for machine running state**

`machine.LastUp === 'Currently running'` depends on a specific human-readable string in Podman's JSON output that could change across versions or locales, making `getMachineStatus()` silently return `running: false` even when the machine is up, causing `ensureReady()` to attempt a redundant `startMachine()`.

The `Running` boolean field in `podman machine list --format json` is the canonical indicator; consider dropping the `LastUp` fallback and documenting the minimum supported Podman version instead.

```suggestion
      const running = machine.Running === true
```
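
A sketch of a status parser that relies only on the `Running` boolean follows. `parseMachineStatus` is a hypothetical helper, and treating any listed machine (rather than only the default one) as "initialized" is an assumption; the real `getMachineStatus()` may select a specific machine.

```typescript
interface MachineStatus {
  initialized: boolean
  running: boolean
}

// Hypothetical helper: parses the stdout of `podman machine list --format json`.
function parseMachineStatus(stdout: string): MachineStatus {
  let machines: unknown
  try {
    machines = JSON.parse(stdout)
  } catch {
    // Unparseable output is treated as "no machine".
    return { initialized: false, running: false }
  }
  if (!Array.isArray(machines) || machines.length === 0) {
    return { initialized: false, running: false }
  }
  // Only the boolean field counts; human-readable strings such as LastUp are ignored.
  const running = machines.some(
    (m) => typeof m === 'object' && m !== null && (m as { Running?: unknown }).Running === true,
  )
  return { initialized: true, running }
}
```

With this shape, a machine whose `LastUp` reads "Currently running" but whose `Running` field is false is reported as stopped, which is the behaviour the suggestion asks for.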

Reviews (2): Last reviewed commit: "fix: reset machineReady on stop and prev..."

- Reset machineReady=false in stopMachine() so ensureReady() properly
  restarts the machine after it was stopped
- Use stdout/stderr 'ignore' when no onOutput handler is provided to
  prevent pipe buffer deadlock during shutdown
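
The second point can be illustrated with a small sketch: a pipe is only requested when a handler exists to drain it, because an undrained pipe can fill its OS buffer and block the child process. `outputModesFor` is a hypothetical helper; the spawn API actually used in the PR is not shown here.

```typescript
type StreamMode = 'pipe' | 'ignore'

interface OutputModes {
  stdout: StreamMode
  stderr: StreamMode
}

// Hypothetical helper: create pipes only when a handler will read them.
function outputModesFor(onOutput?: (line: string) => void): OutputModes {
  const mode: StreamMode = onOutput ? 'pipe' : 'ignore'
  return { stdout: mode, stderr: mode }
}
```

With Node's `child_process.spawn`, for instance, the result would map to `stdio: ['ignore', modes.stdout, modes.stderr]`.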
@DaniAkash

Both P1 issues fixed in c647310:

  • machineReady reset to false in stopMachine()
  • stdout/stderr set to 'ignore' when no onOutput handler is provided

@DaniAkash

@greptileai

Comment on lines +152 to +167
async ensureReady(onLog?: (msg: string) => void): Promise<void> {
  if (this.machineReady) return

  const status = await this.getMachineStatus()

  if (!status.initialized) {
    onLog?.('Initializing Podman machine...')
    await this.initMachine(onLog)
  }

  if (!status.running) {
    onLog?.('Starting Podman machine...')
    await this.startMachine(onLog)
  }

  this.machineReady = true

P1 ensureReady has no concurrency guard

initAgentRuntime() fires ensureReady() in the background at server start, and any agent create/start/delete also calls ensureReady(). Both can enter past the machineReady === false check simultaneously. When neither has set machineReady = true yet, both call initMachine() concurrently — the second invocation will fail because podman machine init returns non-zero when the machine already exists, throwing Error: podman machine init failed with code 125 and surfacing as an agent creation error to the user.

Fix by collapsing concurrent calls into a single shared promise:

private ensureReadyPromise: Promise<void> | null = null

async ensureReady(onLog?: (msg: string) => void): Promise<void> {
  if (this.machineReady) return
  if (!this.ensureReadyPromise) {
    this.ensureReadyPromise = this._doEnsureReady(onLog).finally(() => {
      this.ensureReadyPromise = null
    })
  }
  return this.ensureReadyPromise
}

private async _doEnsureReady(onLog?: (msg: string) => void): Promise<void> {
  // ... existing body
}

Comment on lines +124 to 129
shutdownAgentRuntime().catch((err) =>
  logger.warn('Failed to shut down agent runtime', {
    error: err instanceof Error ? err.message : String(err),
  }),
)
onShutdown?.()

P1 shutdownAgentRuntime not awaited — containers likely not stopped on exit

shutdownAgentRuntime() is fired and forgotten before onShutdown?.() is called. The shutdown route uses setImmediate(config.onShutdown), so onShutdown runs almost immediately (in the event loop's next check phase). If the outer shutdown handler calls process.exit() (common in desktop/Electron contexts), all pending asynchronous work is abandoned, including the wait on the compose stop subprocesses inside shutdownAgentRuntime, so agent containers can be left running after BrowserOS quits.

The fix is to make onShutdown async and await it, or restructure the shutdown route to wait for cleanup:

// server.ts – pass an async callback
shutdownAgentRuntime: async (onShutdown?: () => void) => {
  ...
  await shutdownAgentRuntime()
  onShutdown?.()
}
