Track structured timing for phased image assembly cleanup #692

## Problem

The phased SWE/SWT-bench image assembly logs include human-readable timing in lines like:

```text
[assembly] OK ... total=109.8s build=87.2s push=6.2s
```

but the emitted `manifest.jsonl` records do not populate structured timing fields for assembly. In the full SWE-bench validation run below, `duration_seconds`, `build_seconds`, and the related timing fields were `null`, so analyzing performance required scraping the raw GitHub Actions log.
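For reference, the scraping this currently requires looks roughly like the sketch below. It assumes log lines follow the `[assembly] OK ... total=...s build=...s push=...s` shape shown above; `parse_assembly_timing` is a hypothetical helper, not something in the repo.

```python
import re

# Matches the key=value timing pairs from the "[assembly] OK" line shown above.
# Assumption: only total/build/push appear, each as a decimal followed by "s".
_TIMING_RE = re.compile(r"\b(total|build|push)=(\d+(?:\.\d+)?)s\b")


def parse_assembly_timing(line: str) -> dict[str, float]:
    """Return the timings found on one assembly log line, or {} if not a match."""
    # CI logs may carry a timestamp prefix, so search rather than anchor at start.
    if "[assembly] OK" not in line:
        return {}
    return {key: float(value) for key, value in _TIMING_RE.findall(line)}


sample = "[assembly] OK ... total=109.8s build=87.2s push=6.2s"
print(parse_assembly_timing(sample))
```

This only recovers what the log line prints; cleanup durations and return codes are not in that line at all, which is the gap this issue is about.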

Validation run: https://github.com/OpenHands/benchmarks/actions/runs/24810843867

That run succeeded (`500/500` base images and `500/500` assembled images), but performance and debugging analysis took more manual work than it should have. Figures recovered from the log:

- Average assembly total: ~131.8s/image
- Average docker build: ~106.9s/image
- Average push: ~5.8s/image
- Implied cleanup/other post-build gap: ~19.1s/image
- `docker system prune` exited with rc=1 in 170 cases (warning-only), likely due to prune-lock contention
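The "implied" gap above is not measured directly; it is the average total minus the two averages the log does report. A quick check of that arithmetic:

```python
# Averages taken from the list above (seconds per image).
avg_total = 131.8
avg_build = 106.9
avg_push = 5.8

# Whatever is left after build and push is attributed to cleanup/other work.
implied_gap = round(avg_total - avg_build - avg_push, 1)
print(implied_gap)  # 19.1
```

Because this is a residual, it cannot distinguish `docker rmi`, `docker system prune`, and `docker builder prune` from each other or from any other post-build overhead.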

## Proposal

Populate structured timing fields in assembly manifest records, including at least:

- total assembly duration
- docker build duration
- docker push duration
- docker rmi duration
- docker system prune duration
- docker builder prune duration
- cleanup status / return codes for rmi, system prune, and builder prune

This would make future tuning rely less on guesswork and remove the need to scrape CI logs for basic timing and cleanup behavior.
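One possible shape for this, as a minimal sketch rather than the actual assembly code: wrap each subprocess in a timer and write both the duration and the return code into the record. Only `duration_seconds` and `build_seconds` are field names known from the existing manifest; every other name here (`StepResult`, `timed_run`, `build_timing_record`, the `*_returncode` keys) is a hypothetical illustration.

```python
import subprocess
import time
from dataclasses import dataclass


@dataclass
class StepResult:
    seconds: float
    returncode: int


def timed_run(cmd: list[str]) -> StepResult:
    """Run one command and record wall-clock duration and exit status."""
    start = time.monotonic()
    # check=False: a failing step (e.g. a prune that exits 1) is recorded,
    # not raised, so it still shows up in the manifest.
    proc = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return StepResult(round(time.monotonic() - start, 3), proc.returncode)


def build_timing_record(image: str, steps: dict[str, list[str]]) -> dict:
    """Run named steps in order and return a flat, JSON-serializable record.

    Simplification: every step runs even if an earlier one failed; a real
    pipeline would presumably stop after a failed build or push.
    """
    start = time.monotonic()
    record: dict = {"image": image}
    for name, cmd in steps.items():
        result = timed_run(cmd)
        record[f"{name}_seconds"] = result.seconds
        record[f"{name}_returncode"] = result.returncode
    record["duration_seconds"] = round(time.monotonic() - start, 3)
    return record
```

With step names such as `build`, `push`, `rmi`, `system_prune`, and `builder_prune`, the 170 rc=1 prune results from the validation run would appear as `system_prune_returncode: 1` on the affected records instead of only as log warnings.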

## Acceptance criteria

- `manifest.jsonl` for `assemble_all_agent_images` includes non-null timing fields for each assembled image.
- Cleanup subprocess outcomes are represented in a structured way.
- Existing summary tooling either preserves these fields or can surface aggregate build/push/cleanup timing.
- Tests cover the new timing fields for a successful assembly.
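A test for these criteria could be built on helpers like the ones sketched below: one reports which timing fields are still null on a record, the other averages each field across a manifest. `duration_seconds` and `build_seconds` come from the existing manifest; `push_seconds` and both function names are assumptions for illustration.

```python
import json
from statistics import mean

# Assumed set of required fields; extend with cleanup timings as they are added.
TIMING_FIELDS = ("duration_seconds", "build_seconds", "push_seconds")


def missing_timing_fields(record: dict, fields=TIMING_FIELDS) -> list[str]:
    """Return the names of timing fields that are absent or null."""
    return [name for name in fields if record.get(name) is None]


def summarize_timings(lines, fields=TIMING_FIELDS) -> dict[str, float]:
    """Average each timing field over JSONL lines, skipping null values."""
    records = [json.loads(line) for line in lines if line.strip()]
    summary = {}
    for name in fields:
        values = [r[name] for r in records if r.get(name) is not None]
        if values:
            summary[name] = mean(values)
    return summary
```

A successful-assembly test would then assert that `missing_timing_fields(record)` is empty for every record written by `assemble_all_agent_images`.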

## Context

This came out of the SWE-bench disk/OOM follow-up to the SWT-bench cleanup work in PR #672 and SWE-bench validation PR #690.

