feat: migrate agent runtime from Docker to Podman#636
feat: migrate agent runtime from Docker to Podman#636shivammittal274 wants to merge 9 commits intofeat/openclaw-agents-pocfrom
Conversation
… not specified" This reverts commit d99d0e0.
…aunch, stop on last agent delete
Greptile SummaryThis PR replaces all Docker CLI calls with a new
Confidence Score: 3/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant UI as AgentsPage (UI)
participant SR as Server (server.ts)
participant AR as agents.ts
participant PR as PodmanRuntime
participant PM as Podman Machine
SR->>AR: initAgentRuntime() [on startup, if active agents]
AR->>PR: ensureReady()
PR->>PM: machine list (getMachineStatus)
PM-->>PR: status
PR->>PM: machine init (if not initialized)
PR->>PM: machine start (if not running)
PR-->>AR: ready
UI->>SR: POST /agents (create agent)
SR->>AR: createAgent route
AR->>PR: ensureReady()
Note over PR: ⚠️ Race: concurrent calls<br/>both enter if machineReady=false
PR->>PM: machine start / init
PM-->>PR: ok
AR->>PR: runCommand(compose pull)
PR->>PM: podman compose pull
AR->>PR: runCommand(compose up -d)
PR->>PM: podman compose up
UI->>SR: DELETE /agents/:id (last agent)
SR->>AR: deleteAgent route
AR->>PR: ensureReady()
AR->>PR: runCommand(compose down -v)
AR->>AR: stopMachineIfOnlyOurs() [fire & forget]
AR->>PR: runCommand(podman ps)
AR->>PR: stopMachine()
PR->>PM: machine stop
PR-->>AR: machineReady = false
UI->>SR: POST /shutdown
SR->>AR: shutdownAgentRuntime() [fire & forget ⚠️]
AR->>PR: runCommand(compose stop) for each agent
SR->>SR: onShutdown() [called immediately, may exit process]
Prompt To Fix All With AIThis is a comment left during a code review.
Path: packages/browseros-agent/apps/server/src/api/services/podman-runtime.ts
Line: 152-167
Comment:
**`ensureReady` has no concurrency guard**
`initAgentRuntime()` fires `ensureReady()` in the background at server start, and any agent create/start/delete also calls `ensureReady()`. Both can enter past the `machineReady === false` check simultaneously. When neither has set `machineReady = true` yet, both call `initMachine()` concurrently — the second invocation will fail because `podman machine init` returns non-zero when the machine already exists, throwing `Error: podman machine init failed with code 125` and surfacing as an agent creation error to the user.
Fix by collapsing concurrent calls into a single shared promise:
```typescript
private ensureReadyPromise: Promise<void> | null = null
async ensureReady(onLog?: (msg: string) => void): Promise<void> {
if (this.machineReady) return
if (!this.ensureReadyPromise) {
this.ensureReadyPromise = this._doEnsureReady(onLog).finally(() => {
this.ensureReadyPromise = null
})
}
return this.ensureReadyPromise
}
private async _doEnsureReady(onLog?: (msg: string) => void): Promise<void> {
// ... existing body
}
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: packages/browseros-agent/apps/server/src/api/server.ts
Line: 124-129
Comment:
**`shutdownAgentRuntime` not awaited — containers likely not stopped on exit**
`shutdownAgentRuntime()` is fired-and-forgotten before `onShutdown?.()` is called. The shutdown route uses `setImmediate(config.onShutdown)`, so `onShutdown` runs on the very next event-loop tick. If the outer shutdown handler calls `process.exit()` (common in desktop/Electron contexts), all pending microtasks — including the `compose stop` subprocesses inside `shutdownAgentRuntime` — are abandoned immediately, leaving agent containers running after BrowserOS quits.
The fix is to make `onShutdown` async and `await` it, or restructure the shutdown route to wait for cleanup:
```typescript
// server.ts – pass an async callback
shutdownAgentRuntime: async (onShutdown?: () => void) => {
...
await shutdownAgentRuntime()
onShutdown?.()
}
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: packages/browseros-agent/apps/server/src/api/services/podman-runtime.ts
Line: 56-57
Comment:
**Fragile string fallback for machine running state**
`machine.LastUp === 'Currently running'` depends on a specific human-readable string in Podman's JSON output that could change across versions or locales, making `getMachineStatus()` silently return `running: false` even when the machine is up, causing `ensureReady()` to attempt a redundant `startMachine()`.
The `Running` boolean field in `podman machine list --format json` is the canonical indicator; consider dropping the `LastUp` fallback and documenting the minimum supported Podman version instead.
```suggestion
const running = machine.Running === true
```
How can I resolve this? If you propose a fix, please make it concise.Reviews (2): Last reviewed commit: "fix: reset machineReady on stop and prev..." | Re-trigger Greptile |
packages/browseros-agent/apps/agent/entrypoints/app/agents/AgentsPage.tsx
Show resolved
Hide resolved
- Reset machineReady=false in stopMachine() so ensureReady() properly restarts the machine after it was stopped - Use stdout/stderr 'ignore' when no onOutput handler is provided to prevent pipe buffer deadlock during shutdown
|
Both P1 issues fixed in c647310:
|
| async ensureReady(onLog?: (msg: string) => void): Promise<void> { | ||
| if (this.machineReady) return | ||
|
|
||
| const status = await this.getMachineStatus() | ||
|
|
||
| if (!status.initialized) { | ||
| onLog?.('Initializing Podman machine...') | ||
| await this.initMachine(onLog) | ||
| } | ||
|
|
||
| if (!status.running) { | ||
| onLog?.('Starting Podman machine...') | ||
| await this.startMachine(onLog) | ||
| } | ||
|
|
||
| this.machineReady = true |
There was a problem hiding this comment.
ensureReady has no concurrency guard
initAgentRuntime() fires ensureReady() in the background at server start, and any agent create/start/delete also calls ensureReady(). Both can enter past the machineReady === false check simultaneously. When neither has set machineReady = true yet, both call initMachine() concurrently — the second invocation will fail because podman machine init returns non-zero when the machine already exists, throwing Error: podman machine init failed with code 125 and surfacing as an agent creation error to the user.
Fix by collapsing concurrent calls into a single shared promise:
private ensureReadyPromise: Promise<void> | null = null
async ensureReady(onLog?: (msg: string) => void): Promise<void> {
if (this.machineReady) return
if (!this.ensureReadyPromise) {
this.ensureReadyPromise = this._doEnsureReady(onLog).finally(() => {
this.ensureReadyPromise = null
})
}
return this.ensureReadyPromise
}
private async _doEnsureReady(onLog?: (msg: string) => void): Promise<void> {
// ... existing body
}Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/browseros-agent/apps/server/src/api/services/podman-runtime.ts
Line: 152-167
Comment:
**`ensureReady` has no concurrency guard**
`initAgentRuntime()` fires `ensureReady()` in the background at server start, and any agent create/start/delete also calls `ensureReady()`. Both can enter past the `machineReady === false` check simultaneously. When neither has set `machineReady = true` yet, both call `initMachine()` concurrently — the second invocation will fail because `podman machine init` returns non-zero when the machine already exists, throwing `Error: podman machine init failed with code 125` and surfacing as an agent creation error to the user.
Fix by collapsing concurrent calls into a single shared promise:
```typescript
private ensureReadyPromise: Promise<void> | null = null
async ensureReady(onLog?: (msg: string) => void): Promise<void> {
if (this.machineReady) return
if (!this.ensureReadyPromise) {
this.ensureReadyPromise = this._doEnsureReady(onLog).finally(() => {
this.ensureReadyPromise = null
})
}
return this.ensureReadyPromise
}
private async _doEnsureReady(onLog?: (msg: string) => void): Promise<void> {
// ... existing body
}
```
How can I resolve this? If you propose a fix, please make it concise.| shutdownAgentRuntime().catch((err) => | ||
| logger.warn('Failed to shut down agent runtime', { | ||
| error: err instanceof Error ? err.message : String(err), | ||
| }), | ||
| ) | ||
| onShutdown?.() |
There was a problem hiding this comment.
shutdownAgentRuntime not awaited — containers likely not stopped on exit
shutdownAgentRuntime() is fired-and-forgotten before onShutdown?.() is called. The shutdown route uses setImmediate(config.onShutdown), so onShutdown runs on the very next event-loop tick. If the outer shutdown handler calls process.exit() (common in desktop/Electron contexts), all pending microtasks — including the compose stop subprocesses inside shutdownAgentRuntime — are abandoned immediately, leaving agent containers running after BrowserOS quits.
The fix is to make onShutdown async and await it, or restructure the shutdown route to wait for cleanup:
// server.ts – pass an async callback
shutdownAgentRuntime: async (onShutdown?: () => void) => {
...
await shutdownAgentRuntime()
onShutdown?.()
}Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/browseros-agent/apps/server/src/api/server.ts
Line: 124-129
Comment:
**`shutdownAgentRuntime` not awaited — containers likely not stopped on exit**
`shutdownAgentRuntime()` is fired-and-forgotten before `onShutdown?.()` is called. The shutdown route uses `setImmediate(config.onShutdown)`, so `onShutdown` runs on the very next event-loop tick. If the outer shutdown handler calls `process.exit()` (common in desktop/Electron contexts), all pending microtasks — including the `compose stop` subprocesses inside `shutdownAgentRuntime` — are abandoned immediately, leaving agent containers running after BrowserOS quits.
The fix is to make `onShutdown` async and `await` it, or restructure the shutdown route to wait for cleanup:
```typescript
// server.ts – pass an async callback
shutdownAgentRuntime: async (onShutdown?: () => void) => {
...
await shutdownAgentRuntime()
onShutdown?.()
}
```
How can I resolve this? If you propose a fix, please make it concise.
Summary
PodmanRuntimemodule that manages the Podman machine (Linux VM) lifecycle automaticallyChanges
New:
podman-runtime.tsPodmanRuntimeclass wrapping all Podman interactionsinitMachine(),startMachine(),stopMachine(),ensureReady()ensureReady()auto-initializes the machine on first use (~2.2GB one-time download)runCommand()with line-buffered output streamingModified:
agents.tsdockerCLI calls replaced withPodmanRuntime/docker-statusendpoint renamed to/runtime-status(returns machine init/running state)ensureReady()called before every container operation (create, start, stop, delete)podman ps— only stops machine if all running containers are ours (browseros-claw-*prefix)Modified:
AgentsPage.tsx/runtime-statusendpointPrerequisites for reviewers
The
runtime-statusendpoint handles missing Podman gracefully — UI shows "Container runtime not available" if Podman isn't installed.Test plan
curl localhost:9105/agents/runtime-statusreturnsavailable: truewith Podman installed'docker'string references remain in agents.ts (except docker-compose.yml filename)Follow-up work
server-prod-resources.json)🤖 Generated with Claude Code