Skip to content

fix: cap per-conversation WebSocket subscriber count#3166

Open
csmith49 wants to merge 2 commits into
mainfrom
fix/3152-bounded-websocket-connections
Open

fix: cap per-conversation WebSocket subscriber count#3166
csmith49 wants to merge 2 commits into
mainfrom
fix/3152-bounded-websocket-connections

Conversation

@csmith49
Copy link
Copy Markdown
Collaborator

@csmith49 csmith49 commented May 8, 2026

Summary

PubSub accepted unlimited subscribers. A malicious or buggy client could open hundreds of WebSocket connections to a single conversation, exhausting server memory and multiplying fan-out cost for every published event.

Fix

Add an optional max_subscribers parameter to PubSub. When the limit is reached, subscribe() raises MaxSubscribersError.

Component Change
pub_sub.py Added max_subscribers: int | None field and MaxSubscribersError exception
event_service.py Default max_subscribers=50 for conversation event PubSub
bash_service.py Default max_subscribers=50 for bash event PubSub
sockets.py Both WS handlers catch MaxSubscribersError → close with WS 1013 (Try Again Later)

The limit of 50 allows ~47 WebSocket connections per conversation (after 2–3 internal subscribers: EventSubscriber, AutoTitleSubscriber, WebhookSubscribers). This is very generous for legitimate use while preventing resource exhaustion.

PubSub(max_subscribers=None) preserves the old unbounded behavior for backward compat.

Verification

  • All 875 agent-server tests pass (including 3 new max_subscribers tests)
  • All pre-commit hooks pass (ruff, pyright, etc.)
  • 5 files changed, +88/-9

Fixes #3152

This PR was created by an AI agent (OpenHands) on behalf of @csmith49.


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant Architectures Base Image Docs / Tags
java amd64, arm64 eclipse-temurin:17-jdk Link
python amd64, arm64 nikolaik/python-nodejs:python3.13-nodejs22-slim Link
golang amd64, arm64 golang:1.21-bookworm Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:63eee5b-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-63eee5b-python \
  ghcr.io/openhands/agent-server:63eee5b-python

All tags pushed for this build

ghcr.io/openhands/agent-server:63eee5b-golang-amd64
ghcr.io/openhands/agent-server:63eee5b83e9e36ec20b04aee18cb17964c2d4f41-golang-amd64
ghcr.io/openhands/agent-server:fix-3152-bounded-websocket-connections-golang-amd64
ghcr.io/openhands/agent-server:63eee5b-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:63eee5b-golang-arm64
ghcr.io/openhands/agent-server:63eee5b83e9e36ec20b04aee18cb17964c2d4f41-golang-arm64
ghcr.io/openhands/agent-server:fix-3152-bounded-websocket-connections-golang-arm64
ghcr.io/openhands/agent-server:63eee5b-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:63eee5b-java-amd64
ghcr.io/openhands/agent-server:63eee5b83e9e36ec20b04aee18cb17964c2d4f41-java-amd64
ghcr.io/openhands/agent-server:fix-3152-bounded-websocket-connections-java-amd64
ghcr.io/openhands/agent-server:63eee5b-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:63eee5b-java-arm64
ghcr.io/openhands/agent-server:63eee5b83e9e36ec20b04aee18cb17964c2d4f41-java-arm64
ghcr.io/openhands/agent-server:fix-3152-bounded-websocket-connections-java-arm64
ghcr.io/openhands/agent-server:63eee5b-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:63eee5b-python-amd64
ghcr.io/openhands/agent-server:63eee5b83e9e36ec20b04aee18cb17964c2d4f41-python-amd64
ghcr.io/openhands/agent-server:fix-3152-bounded-websocket-connections-python-amd64
ghcr.io/openhands/agent-server:63eee5b-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:63eee5b-python-arm64
ghcr.io/openhands/agent-server:63eee5b83e9e36ec20b04aee18cb17964c2d4f41-python-arm64
ghcr.io/openhands/agent-server:fix-3152-bounded-websocket-connections-python-arm64
ghcr.io/openhands/agent-server:63eee5b-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:63eee5b-golang
ghcr.io/openhands/agent-server:63eee5b83e9e36ec20b04aee18cb17964c2d4f41-golang
ghcr.io/openhands/agent-server:fix-3152-bounded-websocket-connections-golang
ghcr.io/openhands/agent-server:63eee5b-golang_tag_1.21-bookworm
ghcr.io/openhands/agent-server:63eee5b-java
ghcr.io/openhands/agent-server:63eee5b83e9e36ec20b04aee18cb17964c2d4f41-java
ghcr.io/openhands/agent-server:fix-3152-bounded-websocket-connections-java
ghcr.io/openhands/agent-server:63eee5b-eclipse-temurin_tag_17-jdk
ghcr.io/openhands/agent-server:63eee5b-python
ghcr.io/openhands/agent-server:63eee5b83e9e36ec20b04aee18cb17964c2d4f41-python
ghcr.io/openhands/agent-server:fix-3152-bounded-websocket-connections-python
ghcr.io/openhands/agent-server:63eee5b-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim

About Multi-Architecture Support

  • Each variant tag (e.g., 63eee5b-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 63eee5b-python-amd64) are also available if needed

PubSub accepted unlimited subscribers.  A malicious or buggy client
could open hundreds of WebSocket connections to a single conversation,
exhausting server memory and multiplying fan-out cost for every event.

Add an optional max_subscribers parameter to PubSub.  When the limit
is reached, subscribe() raises MaxSubscribersError.  Both EventService
and BashEventService default to 50 subscribers.  The WebSocket
handlers in sockets.py catch the error and close the connection with
WS status 1013 (Try Again Later).

Fixes #3152

Co-authored-by: openhands <openhands@all-hands.dev>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 8, 2026

Python API breakage checks — ✅ PASSED

Result:PASSED

Action log

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 8, 2026

REST API breakage checks (OpenAPI) — ✅ PASSED

Result:PASSED

Action log

Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Good taste — Clean, focused solution to a real security problem.

This PR addresses WebSocket resource exhaustion with a simple subscriber limit. The implementation is straightforward:

  • PubSub gains an optional max_subscribers parameter
  • Limit enforced at subscribe time, raises MaxSubscribersError when exceeded
  • WebSocket handlers catch the error and close gracefully (WS 1013)
  • Backward compatible: max_subscribers=None preserves unbounded behavior
  • Well-tested: 3 new unit tests cover limit enforcement, slot reuse, and unlimited mode

The limit of 50 is generous for legitimate use (~47 WebSocket connections after internal subscribers) while preventing abuse.

[RISK ASSESSMENT]

  • [Overall PR] ⚠️ Risk Assessment: 🟢 LOW

This is a server infrastructure change that prevents DoS attacks. It does not affect agent behavior, conversation logic, or benchmark performance. The implementation is simple, maintains backward compatibility, and includes proper error handling. Thread safety is adequate for the asyncio event loop context.

VERDICT:
Worth merging — Addresses a real security concern with minimal, well-tested code.

KEY INSIGHT:
Simple resource limits are effective DoS prevention — this adds robust protection with just ~80 lines of code and zero breaking changes.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 8, 2026

Coverage

Coverage Report •
FileStmtsMissCoverMissing
openhands-agent-server/openhands/agent_server
   bash_service.py1973084%72–74, 143–144, 146–147, 177–179, 192–195, 275, 280–281, 302–307, 334–335, 337, 344–345, 375–376
   event_service.py4389578%80–81, 106, 110, 113–114, 118–119, 126, 136–140, 143–146, 166, 270, 287, 328, 338, 362–363, 367, 375, 378, 443, 445, 449–451, 455, 464–465, 467, 471, 477, 479, 509–514, 540, 543, 594, 598, 744, 746–747, 751, 765–767, 769, 773–776, 780–783, 791–794, 842–843, 845–852, 854–855, 864–865, 867–868, 875–876, 878–879, 899, 905, 911, 920–921
   pub_sub.py40295%87–88
   sockets.py2174778%164, 167, 170, 247–249, 255–257, 260, 374–377, 382–383, 388–390, 396–398, 402, 404–405, 409–410, 413, 416, 418–419, 422–423, 451, 453, 472, 487–496, 506
TOTAL266901164356% 

Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ QA Report: PASS

Successfully verified that this PR implements per-conversation WebSocket subscriber limits as claimed. The PubSub class correctly enforces the limit, raises appropriate exceptions, and WebSocket handlers respond with the correct error codes.

Does this PR achieve its stated goal?

Yes. The PR successfully addresses issue #3152 by implementing bounded WebSocket connections per conversation. The core mechanism works correctly:

  1. PubSub enforces limits: Tested with limits of 2, 5, 10, and 50 — all correctly reject subscribers beyond the limit with MaxSubscribersError
  2. Slots are reusable: Unsubscribing frees capacity for new subscribers
  3. Backward compatible: max_subscribers=None preserves unbounded behavior for existing use cases
  4. Defaults applied: Both EventService and BashEventService default to 50 subscribers
  5. WebSocket handling: Both event and bash WebSocket handlers catch MaxSubscribersError and close with status 1013 (Try Again Later)

The limit of 50 allows ~47 WebSocket connections per conversation (after internal subscribers), which prevents the resource exhaustion attack described in #3152 while remaining generous for legitimate use.

Phase Result
Environment Setup ✅ Python 3.13.13, dependencies installed
CI Status ✅ All checks passing (agent-server-tests, sdk-tests, pre-commit, etc.)
Functional Verification ✅ Core mechanism works correctly
Functional Verification

Verification Approach

Since WebSocket integration testing requires complex conversation setup (the API returned 422 errors due to missing required fields), I verified the fix at two levels:

  1. Direct PubSub testing — Verified the core limiting mechanism
  2. Code inspection — Confirmed WebSocket handlers and service defaults

Test 1: PubSub max_subscribers Enforcement

Setup: Created a test script that directly instantiates PubSub with various limits.

Test with limit=10:

$ python test_pubsub_direct.py
Test 2: Max subscribers limit enforcement
============================================================
✓ Successfully added 10 subscribers (limit: 10)
✓ Correctly rejected subscriber beyond limit: Subscriber limit reached (10)

This confirms subscribe() raises MaxSubscribersError when the limit is reached.

Test with limit=None:

Test 1: Unlimited subscribers (max_subscribers=None)
============================================================
✓ Successfully added 100 subscribers with no limit

This confirms backward compatibility — None preserves unbounded behavior.

Test 2: Slot Reuse After Unsubscribe

Test scenario: Fill to capacity, unsubscribe one, verify a new subscriber can join.

Test 3: Slot freed after unsubscribe
============================================================
✓ Filled to capacity: 5 subscribers
✓ Correctly rejected subscriber at capacity
✓ Unsubscribed one subscriber (4 remaining)
✓ Successfully subscribed after freeing a slot
✓ Limit still enforced after slot refill

This confirms unsubscribing properly frees capacity.

Test 3: Event Dispatching Still Works

Test scenario: Add 3 subscribers to a limited PubSub, dispatch an event via __call__.

Test 4: Event dispatching with limits
============================================================
✓ Subscribed 3 subscribers
✓ All 3 subscribers received the dispatched message

This confirms the limit doesn't break event dispatching.

Test 4: Default Limits in Services

Verification method: Inspected source code of EventService and BashEventService.

Test 5: Default limit in EventService and BashEventService
============================================================
✓ EventService defines PubSub with max_subscribers=50
✓ BashEventService defines PubSub with max_subscribers=50

Code confirmed:

  • event_service.py:62-64: PubSub[Event](max_subscribers=50)
  • bash_service.py:32-34: PubSub[BashEventBase](max_subscribers=50)

Test 5: WebSocket Error Handling

Verification method: Code inspection of sockets.py.

Event WebSocket handler (lines 250-259):

try:
    subscriber_id = await event_service.subscribe_to_events(
        _WebSocketSubscriber(websocket)
    )
except MaxSubscribersError:
    logger.warning(f"Subscriber limit reached for conversation {conversation_id}")
    await websocket.close(
        code=1013, reason="Too many connections for this conversation"
    )
    return

Bash WebSocket handler (lines 369-376):

try:
    subscriber_id = await bash_event_service.subscribe_to_events(
        _BashWebSocketSubscriber(websocket)
    )
except MaxSubscribersError:
    logger.warning("Subscriber limit reached for bash events")
    await websocket.close(code=1013, reason="Too many bash event connections")
    return

Both handlers correctly:

  • Catch MaxSubscribersError
  • Log a warning
  • Close the WebSocket with status 1013 (Try Again Later)
  • Return without adding the subscriber

Test 6: Existing Test Suite

CI results:

  • agent-server-tests: ✅ pass (50s) — includes 3 new PubSub tests
  • sdk-tests: ✅ pass (10s)
  • tools-tests: ✅ pass (9s)
  • workspace-tests: ✅ pass (9s)
  • pre-commit: ✅ pass (1m16s)
  • coverage-report: ✅ pass (19s)

All 875+ existing tests pass, confirming no regressions.

Summary of Test Results

Test Status Evidence
PubSub limit enforcement Rejects subscriber #11 with limit=10
Backward compatibility (None) Accepts 100 subscribers with no limit
Slot reuse after unsubscribe New subscriber succeeds after freeing slot
Event dispatching with limits All subscribers receive events
EventService default=50 Source code inspection confirmed
BashEventService default=50 Source code inspection confirmed
WebSocket error handling Code inspection confirmed
No regressions All CI tests pass

Issues Found

None.


Assessment: This PR successfully implements the resource exhaustion fix described in #3152. The mechanism is sound, backward compatible, and properly integrated into the WebSocket handlers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: unbounded WebSocket connections per conversation

2 participants