Skip to content

perf: dedicated thread pool for conversation execution#3169

Open
csmith49 wants to merge 1 commit intomainfrom
fix/3143-dedicated-thread-pool
Open

perf: dedicated thread pool for conversation execution#3169
csmith49 wants to merge 1 commit intomainfrom
fix/3143-dedicated-thread-pool

Conversation

@csmith49
Copy link
Copy Markdown
Collaborator

@csmith49 csmith49 commented May 8, 2026

Summary

EventService.run() and the fire-and-forget run path in send_message() dispatch conversation.run() via loop.run_in_executor(None, ...), which uses asyncio's default shared executor (capped at min(32, cpu_count+4) threads). All 22+ run_in_executor calls in EventService share this single pool. Long-running agent step loops can exhaust it, starving short I/O operations (event search, status checks, pause, etc.) and silently queuing new conversation runs with no visibility.

Fix

Create a dedicated ThreadPoolExecutor for conversation execution, separate from the default pool used for short I/O operations:

Component Change
config.py Added max_concurrent_runs: int = 10 — configurable upper bound on simultaneous agent step threads
conversation_service.py Creates shared ThreadPoolExecutor(max_workers=max_concurrent_runs) in __aenter__(); shuts it down in __aexit__(); passes to each EventService via _run_executor; reads config via get_instance()
event_service.py Added `_run_executor: ThreadPoolExecutor

Isolation

Operation Executor
conversation.run() (long-running agent loop) Dedicated pool (max_concurrent_runs threads)
search_events, count_events, get_state, pause, etc. (short I/O) Default asyncio pool (unchanged)

Backward compatibility

  • When _run_executor is None (standalone EventService without ConversationService), run_in_executor(None, ...) falls back to the default pool
  • Default max_concurrent_runs=10 is conservative; operators can tune via config

Verification

  • All 872 agent-server tests pass (including 1 new test)
  • All pre-commit hooks pass (ruff, pyright, etc.)
  • Added test_event_services_share_dedicated_run_executor — verifies executor creation, sharing, cleanup

Fixes #3143

This PR was created by an AI agent (OpenHands) on behalf of @csmith49.


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant Architectures Base Image Docs / Tags
java amd64, arm64 eclipse-temurin:17-jdk Link
python amd64, arm64 nikolaik/python-nodejs:python3.13-nodejs22-slim Link
golang amd64, arm64 golang:1.21-bookworm Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:681c844-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-681c844-python \
  ghcr.io/openhands/agent-server:681c844-python

All tags pushed for this build

ghcr.io/openhands/agent-server:681c844-golang-amd64
ghcr.io/openhands/agent-server:681c8447adea622d0f0b44156f67be88af48ca9e-golang-amd64
ghcr.io/openhands/agent-server:fix-3143-dedicated-thread-pool-golang-amd64
ghcr.io/openhands/agent-server:681c844-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:681c844-golang-arm64
ghcr.io/openhands/agent-server:681c8447adea622d0f0b44156f67be88af48ca9e-golang-arm64
ghcr.io/openhands/agent-server:fix-3143-dedicated-thread-pool-golang-arm64
ghcr.io/openhands/agent-server:681c844-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:681c844-java-amd64
ghcr.io/openhands/agent-server:681c8447adea622d0f0b44156f67be88af48ca9e-java-amd64
ghcr.io/openhands/agent-server:fix-3143-dedicated-thread-pool-java-amd64
ghcr.io/openhands/agent-server:681c844-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:681c844-java-arm64
ghcr.io/openhands/agent-server:681c8447adea622d0f0b44156f67be88af48ca9e-java-arm64
ghcr.io/openhands/agent-server:fix-3143-dedicated-thread-pool-java-arm64
ghcr.io/openhands/agent-server:681c844-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:681c844-python-amd64
ghcr.io/openhands/agent-server:681c8447adea622d0f0b44156f67be88af48ca9e-python-amd64
ghcr.io/openhands/agent-server:fix-3143-dedicated-thread-pool-python-amd64
ghcr.io/openhands/agent-server:681c844-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:681c844-python-arm64
ghcr.io/openhands/agent-server:681c8447adea622d0f0b44156f67be88af48ca9e-python-arm64
ghcr.io/openhands/agent-server:fix-3143-dedicated-thread-pool-python-arm64
ghcr.io/openhands/agent-server:681c844-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:681c844-golang
ghcr.io/openhands/agent-server:681c8447adea622d0f0b44156f67be88af48ca9e-golang
ghcr.io/openhands/agent-server:fix-3143-dedicated-thread-pool-golang
ghcr.io/openhands/agent-server:681c844-golang_tag_1.21-bookworm
ghcr.io/openhands/agent-server:681c844-java
ghcr.io/openhands/agent-server:681c8447adea622d0f0b44156f67be88af48ca9e-java
ghcr.io/openhands/agent-server:fix-3143-dedicated-thread-pool-java
ghcr.io/openhands/agent-server:681c844-eclipse-temurin_tag_17-jdk
ghcr.io/openhands/agent-server:681c844-python
ghcr.io/openhands/agent-server:681c8447adea622d0f0b44156f67be88af48ca9e-python
ghcr.io/openhands/agent-server:fix-3143-dedicated-thread-pool-python
ghcr.io/openhands/agent-server:681c844-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim

About Multi-Architecture Support

  • Each variant tag (e.g., 681c844-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 681c844-python-amd64) are also available if needed

EventService.run() and the fire-and-forget run path in send_message()
dispatch conversation.run() via loop.run_in_executor(None, ...), which
uses asyncio's default executor (min(32, cpu_count+4) threads shared
across all async operations). Long-running agent step loops starve
short I/O operations and silently queue when the pool is exhausted.

Create a dedicated ThreadPoolExecutor for conversation execution:
  - Config gains max_concurrent_runs (default 10, configurable)
  - ConversationService creates a shared executor in __aenter__(),
    shuts it down in __aexit__(), passes config via get_instance()
  - Each EventService receives the shared executor via _run_executor;
    conversation.run() dispatches to it instead of the default pool
  - Short I/O operations (search_events, get_state, etc.) continue
    using the default executor, preventing starvation

When _run_executor is None (standalone EventService), run_in_executor
falls back to the default pool for backward compatibility.

Fixes #3143

Co-authored-by: openhands <openhands@all-hands.dev>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 8, 2026

Python API breakage checks — ✅ PASSED

Result:PASSED

Action log

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 8, 2026

REST API breakage checks (OpenAPI) — ✅ PASSED

Result:PASSED

Action log

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 8, 2026

Coverage

Coverage Report •
FileStmtsMissCoverMissing
openhands-agent-server/openhands/agent_server
   config.py69297%29, 42
   conversation_service.py51011078%143–144, 174, 177, 179, 186–192, 220, 227, 248, 347, 353, 358, 364, 372–373, 382–385, 394, 408–410, 417, 450–451, 489–490, 494, 523–527, 529–530, 533–534, 537–542, 639, 646–650, 653–654, 658–662, 665–666, 670–674, 677–678, 700–701, 705–706, 708–710, 712, 715, 723–727, 730, 737–742, 744–745, 769, 773, 775–776, 781–782, 788–789, 797, 812–813, 831, 859, 1139, 1142
   event_service.py43910176%77–78, 103, 107, 110–111, 115–116, 122–124, 133–137, 140–143, 163, 267, 284, 325, 335, 359–360, 364, 372, 375, 415–416, 432, 434, 438–440, 444, 453–454, 456, 460, 466, 468, 498–503, 529, 532, 583, 587, 733, 735–736, 740, 754–756, 758, 762–765, 769–772, 780–783, 801–802, 831–832, 834–841, 843–844, 853–854, 856–857, 864–865, 867–868, 888, 894, 900, 909–910
TOTAL261501150656% 

Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taste Rating: 🟢 Good taste - Clean isolation of long-running operations

VERDICT: ✅ Worth merging

KEY INSIGHT: Dedicated executor pool prevents conversation step threads from starving short I/O operations - clean resource isolation with proper backward compatibility.


[RISK ASSESSMENT]

  • [Overall PR] ⚠️ Risk Assessment: 🟢 LOW

Infrastructure change for thread pool isolation. Does not modify agent behavior, prompts, or tool execution logic. Conservative default (10 concurrent runs) with proper cleanup and backward compatibility. All 872 tests pass.

Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ QA Report: PASS

Verified the dedicated thread pool implementation works as designed. All functional tests pass.

Does this PR achieve its stated goal?

Yes. The PR successfully creates a dedicated thread pool for conversation execution with configurable size (default 10 workers), isolating long-running agent step loops from short I/O operations. Testing confirms: (1) the config field exists and works, (2) the dedicated executor is created with the correct thread count and shared across EventServices, (3) conversation.run() calls execute in the dedicated pool (verified via thread name prefix "conversation-run"), and (4) cleanup works properly on shutdown. Backward compatibility is maintained — when _run_executor is None, the code falls back to the default asyncio executor.

Phase Result
Environment Setup ✅ Dependencies installed successfully
CI Status ✅ All checks passing (pre-commit, SDK tests, tools tests, REST API, etc.)
Functional Verification ✅ All 6 functional tests passed
Functional Verification

Test 1: Config Field Verification

Verified the new max_concurrent_runs config field exists with correct default:

from openhands.agent_server.config import Config

# Default value
config = Config()
assert config.max_concurrent_runs == 10

# Custom value
config_custom = Config(max_concurrent_runs=5)
assert config_custom.max_concurrent_runs == 5

Result: ✅ Config field exists with default value 10, accepts custom values


Test 2-5: Executor Lifecycle Verification

Created a ConversationService with custom max_concurrent_runs=7 and verified:

Before __aenter__() (initialization):

service = ConversationService(
    conversations_dir=tmp_conversations,
    max_concurrent_runs=7
)
assert service._run_executor is None  # Not created yet

After __aenter__() (service started):

async with service:
    # Executor created
    assert service._run_executor is not None
    assert isinstance(service._run_executor, ThreadPoolExecutor)
    assert service._run_executor._max_workers == 7  # Correct size
    assert service._run_executor._thread_name_prefix == "conversation-run"
    
    # Created a conversation
    info, _ = await service.start_conversation(request)
    event_service = service._event_services[info.id]
    
    # EventService shares the same executor instance
    assert event_service._run_executor is service._run_executor

After __aexit__() (service stopped):

# Executor cleaned up
assert service._run_executor is None
assert executor_ref._shutdown == True  # Properly shut down

Result: ✅ Executor lifecycle managed correctly: created on aenter, shared across EventServices, cleaned up on aexit


Test 6: Executor Usage Verification

Verified that conversation.run() actually executes in the dedicated thread pool:

# Track which thread executes conversation.run()
def tracked_run():
    thread_name = threading.current_thread().name
    executed_threads.append(thread_name)

conversation.run = tracked_run

# Execute via the dedicated executor
loop = asyncio.get_running_loop()
await loop.run_in_executor(event_service._run_executor, conversation.run)

# Verify thread name indicates it came from our dedicated pool
thread_name = executed_threads[0]
assert "conversation-run" in thread_name
# Output: conversation-run_0

Result: ✅ conversation.run() executes in dedicated pool thread (name: "conversation-run_0")


Complete Test Output

============================================================
QA Test: Dedicated Thread Pool for Conversation Execution
============================================================

============================================================
Test 1: Config field verification
============================================================
✓ Config has max_concurrent_runs field: True
✓ Default value: 10
✓ Custom value (5): 5
✅ Config field test PASSED

============================================================
Test 2-5: Executor lifecycle verification
============================================================
✓ Created ConversationService with max_concurrent_runs=7
✓ Before __aenter__: _run_executor is None: True
✓ After __aenter__: _run_executor exists: True
✓ Executor thread pool size: 7
✓ Executor thread name prefix: conversation-run
✓ Started conversation: f7d4aa8e-018d-4a45-a8d7-d1c28217828e
✓ EventService has _run_executor: True
✓ EventService shares same executor: True
✓ After __aexit__: _run_executor is None: True
✓ Executor shutdown flag: True
✅ Executor lifecycle test PASSED

============================================================
Test 6: Executor usage verification
============================================================
✓ conversation.run() executed in thread: conversation-run_0
✓ Thread name: conversation-run_0
✅ Executor usage test PASSED

============================================================
✅ ALL TESTS PASSED
============================================================

PR Test Verification

Ran the PR's new test test_event_services_share_dedicated_run_executor:

uv run pytest tests/agent_server/test_conversation_service.py::test_event_services_share_dedicated_run_executor -v

Result: ✅ PASSED

Issues Found

None.

@VascoSch92
Copy link
Copy Markdown
Contributor

we have parallel tool calls inside the convo. Can it be that with this change we will have a deadlock?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

perf: default thread pool with no backpressure for conversation execution

3 participants