perf: dedicated thread pool for conversation execution #3169
`EventService.run()` and the fire-and-forget run path in `send_message()`
dispatch `conversation.run()` via `loop.run_in_executor(None, ...)`, which
uses asyncio's default executor (`min(32, cpu_count + 4)` threads shared
across all async operations). Long-running agent step loops starve
short I/O operations and silently queue when the pool is exhausted.
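For reference, the cap on asyncio's default executor mentioned above can be computed directly (this is CPython's `ThreadPoolExecutor` default worker count since 3.8):

```python
import os

# asyncio's default executor is a ThreadPoolExecutor created lazily on the
# first loop.run_in_executor(None, ...) call; with max_workers unset, its
# size defaults to min(32, os.cpu_count() + 4), and every offloaded call
# in the process shares that one pool.
default_workers = min(32, (os.cpu_count() or 1) + 4)
print(default_workers)
```

On an 8-core machine, for example, the pool has 12 threads, so ten long-running conversation runs would leave only two threads for every other I/O offload.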
Create a dedicated `ThreadPoolExecutor` for conversation execution:
- Config gains `max_concurrent_runs` (default 10, configurable)
- `ConversationService` creates a shared executor in `__aenter__()`,
  shuts it down in `__aexit__()`, passes config via `get_instance()`
- Each `EventService` receives the shared executor via `_run_executor`;
  `conversation.run()` dispatches to it instead of the default pool
- Short I/O operations (`search_events`, `get_state`, etc.) continue
  using the default executor, preventing starvation
When `_run_executor` is `None` (standalone `EventService`), `run_in_executor`
falls back to the default pool for backward compatibility.
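The dispatch-and-fallback pattern described above can be sketched as follows (a minimal sketch, not the PR's actual code; the class and attribute names follow the description, the rest is illustrative):

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

class EventService:
    def __init__(self, run_executor=None):
        # Shared executor injected by ConversationService; None when standalone.
        self._run_executor = run_executor

    async def run(self, conversation_run):
        loop = asyncio.get_running_loop()
        # Passing None falls back to asyncio's default executor,
        # preserving the old behavior for standalone EventServices.
        return await loop.run_in_executor(self._run_executor, conversation_run)

async def main():
    # ConversationService.__aenter__() creates the shared pool.
    executor = ThreadPoolExecutor(
        max_workers=10,  # config.max_concurrent_runs
        thread_name_prefix="conversation-run",
    )
    try:
        service = EventService(run_executor=executor)
        result = await service.run(lambda: "step-loop finished")
        print(result)
    finally:
        executor.shutdown(wait=True)  # ConversationService.__aexit__()

asyncio.run(main())
```

Because the pool is created and torn down by the service's async context manager, each `EventService` only ever borrows it and never owns the lifecycle.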
Fixes #3143
Co-authored-by: openhands <openhands@all-hands.dev>
Python API breakage checks — ✅ PASSED
REST API breakage checks (OpenAPI) — ✅ PASSED
all-hands-bot left a comment
Taste Rating: 🟢 Good taste - Clean isolation of long-running operations
VERDICT: ✅ Worth merging
KEY INSIGHT: Dedicated executor pool prevents conversation step threads from starving short I/O operations - clean resource isolation with proper backward compatibility.
[RISK ASSESSMENT]
- [Overall PR]
⚠️ Risk Assessment: 🟢 LOW
Infrastructure change for thread pool isolation. Does not modify agent behavior, prompts, or tool execution logic. Conservative default (10 concurrent runs) with proper cleanup and backward compatibility. All 872 tests pass.
all-hands-bot left a comment
✅ QA Report: PASS
Verified the dedicated thread pool implementation works as designed. All functional tests pass.
Does this PR achieve its stated goal?
Yes. The PR successfully creates a dedicated thread pool for conversation execution with configurable size (default 10 workers), isolating long-running agent step loops from short I/O operations. Testing confirms: (1) the config field exists and works, (2) the dedicated executor is created with the correct thread count and shared across EventServices, (3) conversation.run() calls execute in the dedicated pool (verified via thread name prefix "conversation-run"), and (4) cleanup works properly on shutdown. Backward compatibility is maintained — when _run_executor is None, the code falls back to the default asyncio executor.
| Phase | Result |
|---|---|
| Environment Setup | ✅ Dependencies installed successfully |
| CI Status | ✅ All checks passing (pre-commit, SDK tests, tools tests, REST API, etc.) |
| Functional Verification | ✅ All 6 functional tests passed |
Functional Verification
Test 1: Config Field Verification
Verified the new `max_concurrent_runs` config field exists with the correct default:

```python
from openhands.agent_server.config import Config

# Default value
config = Config()
assert config.max_concurrent_runs == 10

# Custom value
config_custom = Config(max_concurrent_runs=5)
assert config_custom.max_concurrent_runs == 5
```

Result: ✅ Config field exists with default value 10, accepts custom values
Test 2-5: Executor Lifecycle Verification
Created a `ConversationService` with custom `max_concurrent_runs=7` and verified:

Before `__aenter__()` (initialization):

```python
service = ConversationService(
    conversations_dir=tmp_conversations,
    max_concurrent_runs=7,
)
assert service._run_executor is None  # Not created yet
```

After `__aenter__()` (service started):

```python
async with service:
    # Executor created
    assert service._run_executor is not None
    assert isinstance(service._run_executor, ThreadPoolExecutor)
    assert service._run_executor._max_workers == 7  # Correct size
    assert service._run_executor._thread_name_prefix == "conversation-run"

    # Created a conversation
    info, _ = await service.start_conversation(request)
    event_service = service._event_services[info.id]

    # EventService shares the same executor instance
    assert event_service._run_executor is service._run_executor
```

After `__aexit__()` (service stopped):

```python
# Executor cleaned up
assert service._run_executor is None
assert executor_ref._shutdown is True  # Properly shut down
```

Result: ✅ Executor lifecycle managed correctly: created on `__aenter__`, shared across EventServices, cleaned up on `__aexit__`
Test 6: Executor Usage Verification
Verified that `conversation.run()` actually executes in the dedicated thread pool:

```python
# Track which thread executes conversation.run()
def tracked_run():
    thread_name = threading.current_thread().name
    executed_threads.append(thread_name)

conversation.run = tracked_run

# Execute via the dedicated executor
loop = asyncio.get_running_loop()
await loop.run_in_executor(event_service._run_executor, conversation.run)

# Verify the thread name indicates it came from our dedicated pool
thread_name = executed_threads[0]
assert "conversation-run" in thread_name
# Output: conversation-run_0
```

Result: ✅ `conversation.run()` executes in a dedicated pool thread (name: "conversation-run_0")
Complete Test Output
```
============================================================
QA Test: Dedicated Thread Pool for Conversation Execution
============================================================
============================================================
Test 1: Config field verification
============================================================
✓ Config has max_concurrent_runs field: True
✓ Default value: 10
✓ Custom value (5): 5
✅ Config field test PASSED
============================================================
Test 2-5: Executor lifecycle verification
============================================================
✓ Created ConversationService with max_concurrent_runs=7
✓ Before __aenter__: _run_executor is None: True
✓ After __aenter__: _run_executor exists: True
✓ Executor thread pool size: 7
✓ Executor thread name prefix: conversation-run
✓ Started conversation: f7d4aa8e-018d-4a45-a8d7-d1c28217828e
✓ EventService has _run_executor: True
✓ EventService shares same executor: True
✓ After __aexit__: _run_executor is None: True
✓ Executor shutdown flag: True
✅ Executor lifecycle test PASSED
============================================================
Test 6: Executor usage verification
============================================================
✓ conversation.run() executed in thread: conversation-run_0
✓ Thread name: conversation-run_0
✅ Executor usage test PASSED
============================================================
✅ ALL TESTS PASSED
============================================================
```
PR Test Verification
Ran the PR's new test `test_event_services_share_dedicated_run_executor`:

```shell
uv run pytest tests/agent_server/test_conversation_service.py::test_event_services_share_dedicated_run_executor -v
```

Result: ✅ PASSED
Issues Found
None.
We have parallel tool calls inside the conversation. Could this change introduce a deadlock?
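For context on the question above: the classic hazard with a bounded pool is re-entrant submission, where a task running on a pool thread blocks waiting for another task queued to the same pool; with no free worker, neither can make progress. Whether that applies here depends on where parallel tool calls execute — per the PR description, only `conversation.run()` is dispatched to the dedicated pool, while tool and short I/O work stays on the default executor. A minimal illustration of the hazard itself (pool size 1 for clarity, with a timeout so the snippet terminates rather than hanging):

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

pool = ThreadPoolExecutor(max_workers=1)

def inner():
    return "done"

def outer():
    # Submits back into the same pool, then blocks on the result.
    # With the only worker occupied by outer(), inner() never starts.
    future = pool.submit(inner)
    try:
        return future.result(timeout=1)
    except FutureTimeout:
        return "would deadlock"

result = pool.submit(outer).result()
pool.shutdown(wait=False, cancel_futures=True)
print(result)
```

Without the timeout, `outer()` would wait forever: the question is essentially whether any work dispatched to the `conversation-run` pool can itself block on another task queued to that same pool.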
Summary
`EventService.run()` and the fire-and-forget run path in `send_message()` dispatch `conversation.run()` via `loop.run_in_executor(None, ...)`, which uses asyncio's default shared executor (capped at `min(32, cpu_count + 4)` threads). All 22+ `run_in_executor` calls in `EventService` share this single pool. Long-running agent step loops can exhaust it, starving short I/O operations (event search, status checks, pause, etc.) and silently queuing new conversation runs with no visibility.

Fix

Create a dedicated `ThreadPoolExecutor` for conversation execution, separate from the default pool used for short I/O operations:
- `config.py`: `max_concurrent_runs: int = 10` — configurable upper bound on simultaneous agent step threads
- `conversation_service.py`: creates `ThreadPoolExecutor(max_workers=max_concurrent_runs)` in `__aenter__()`; shuts it down in `__aexit__()`; passes it to each `EventService` via `_run_executor`; reads config via `get_instance()`
- `event_service.py`: dispatches `conversation.run()` to the dedicated executor

Isolation
- `conversation.run()` (long-running agent loop) → dedicated pool (`max_concurrent_runs` threads)
- `search_events`, `count_events`, `get_state`, `pause`, etc. (short I/O) → default executor

Backward compatibility
- When `_run_executor` is `None` (standalone `EventService` without a `ConversationService`), `run_in_executor(None, ...)` falls back to the default pool
- `max_concurrent_runs=10` is conservative; operators can tune it via config

Verification
- `test_event_services_share_dedicated_run_executor` — verifies executor creation, sharing, cleanup

Fixes #3143
This PR was created by an AI agent (OpenHands) on behalf of @csmith49.
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
- `eclipse-temurin:17-jdk`
- `nikolaik/python-nodejs:python3.13-nodejs22-slim`
- `golang:1.21-bookworm`

Pull (multi-arch manifest)

```shell
# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:681c844-python
```

Run

All tags pushed for this build

About Multi-Architecture Support
- Each variant tag (e.g. `681c844-python`) is a multi-arch manifest supporting both amd64 and arm64
- Architecture-specific tags (e.g. `681c844-python-amd64`) are also available if needed