Workflow runs · OpenHands/benchmarks

Actions

All workflows
Workflows
- Build SWE-Bench Images
- .github/workflows/experiment-ab-test.yml
- .github/workflows/experiment-arg-ordering.yml
- Build Commit0 Images
- Build GAIA Images
- Build Multi-SWE-Bench Images
- Build ProgramBench Images
- Build SWE-Bench Multimodal Images
- Build SWE-Bench Pro Images
- Build SWE-Gym Images

Showing runs from all workflows

2,500+ workflow runs

Upgrade LiteLLM to 1.84.0rc1 PR Review by OpenHands #507: Pull request #709 opened by neubig

Upgrade LiteLLM to 1.84.0rc1 Run tests #1996: Pull request #709 opened by neubig

2m 17s upgrade-litellm-1.84.0-rc.1

Upgrade LiteLLM to 1.84.0rc1 Pre-commit checks #2230: Pull request #709 opened by neubig

55s upgrade-litellm-1.84.0-rc.1

GHCR retention - eval-agent-server GHCR retention - eval-agent-server #27: Scheduled

10s main

Add SWE-Bench Pro benchmark support (#699) Pre-commit checks #2229: Commit 2430744 pushed by neubig

1m 0s main

Add SWE-Bench Pro benchmark support (#699) Run tests #1995: Commit 2430744 pushed by neubig

2m 9s main

Add SWE-Bench Pro benchmark support PR Review Evaluation #189: Pull request #699 closed by neubig

32s

eval: honor GPT-5 prompt when available PR Review Evaluation #188: Pull request #686 closed by enyst

1m 34s

Add SWE-Bench Pro benchmark support Run tests #1994: Pull request #699 synchronize by neubig

2m 20s add-swebench-pro

Add SWE-Bench Pro benchmark support Pre-commit checks #2228: Pull request #699 synchronize by neubig

57s add-swebench-pro

Add SWE-Bench Pro benchmark support Pre-commit checks #2227: Pull request #699 synchronize by neubig

50s add-swebench-pro

Add SWE-Bench Pro benchmark support Run tests #1993: Pull request #699 synchronize by neubig

2m 14s add-swebench-pro

GHCR retention - eval-agent-server GHCR retention - eval-agent-server #26: Scheduled

12s main

Add ProgramBench benchmark integration PR Review Evaluation #187: Pull request #703 closed by neubig

1m 7s

Add ProgramBench benchmark integration Pre-commit checks #2226: Commit 2e01f40 pushed by neubig

50s main

Add ProgramBench benchmark integration Run tests #1992: Commit 2e01f40 pushed by neubig

2m 9s main

Add ProgramBench benchmark integration PR Review by OpenHands #506: Pull request #703 review_requested by neubig

3m 2s

Add ProgramBench benchmark integration PR Review by OpenHands #505: Pull request #703 review_requested by neubig

2m 43s

Add ProgramBench benchmark integration Pre-commit checks #2225: Pull request #703 synchronize by neubig

51s feat/programbench

Add ProgramBench benchmark integration Run tests #1991: Pull request #703 synchronize by neubig

2m 9s feat/programbench

Add ProgramBench benchmark integration PR Review by OpenHands #504: Pull request #703 ready_for_review by neubig

3m 50s

GHCR retention - eval-agent-server GHCR retention - eval-agent-server #25: Scheduled

11s main

GHCR retention - eval-agent-server GHCR retention - eval-agent-server #24: Scheduled

7s main

[codex] Add EvoClaw benchmark inference Pre-commit checks #2224: Pull request #705 synchronize by xingyaoww

1m 6s codex/evoclaw-benchmark

[codex] Add EvoClaw benchmark inference Run tests #1990: Pull request #705 synchronize by xingyaoww

2m 19s codex/evoclaw-benchmark
