perf: change Condensation.forgotten_event_ids from list to set by csmith49 · Pull Request #3156 · OpenHands/software-agent-sdk

csmith49 · 2026-05-08T17:00:29Z

Summary

Change Condensation.forgotten_event_ids from list[EventID] to set[EventID].

The apply() method evaluates `event.id not in self.forgotten_event_ids` for every event. With a list, that test is O(F), where F is the number of forgotten IDs; with a set it is O(1) on average. For V events, each condensation application therefore drops from O(V×F) to O(V).
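
The complexity argument can be illustrated with a minimal, self-contained sketch. The `Event` dataclass and `apply` function below are hypothetical stand-ins, not the SDK's actual classes; only the membership test on `forgotten_event_ids` is taken from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical stand-in for an SDK event; only the id matters here."""

    id: str


def apply(events: list[Event], forgotten_event_ids: set[str]) -> list[Event]:
    """Keep every event whose id has not been forgotten.

    With a set, each `not in` test is O(1) on average, so the whole pass is
    O(V) for V events. With a list, the same test scans up to F ids, giving
    O(V x F).
    """
    return [e for e in events if e.id not in forgotten_event_ids]


events = [Event(id=f"event-{i}") for i in range(6)]
kept = apply(events, {"event-1", "event-4"})
print([e.id for e in kept])
```

The result is the same whether a list or a set is passed; only the cost of each lookup changes.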

Changes

Core fix:

condenser.py — Field type list[EventID] → set[EventID], default_factory=list → default_factory=set
llm_summarizing_condenser.py — List comprehension → set comprehension at construction site

Call-site updates (10 files):

Updated all construction sites in tests and the integration test condenser to pass sets instead of lists, satisfying pyright's strict type checking
Patterns: forgotten_event_ids=[x, y] → {x, y}, forgotten_event_ids=[] → set(), [x.id for x in ...] → {x.id for x in ...}
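
One detail behind the `[] → set()` pattern: in Python, `{}` is an empty dict rather than an empty set, so the empty case has to be spelled `set()`. A short illustration with made-up IDs:

```python
# Made-up event ids, for illustration only.
ids = ["a", "b", "b"]

empty_literal = {}  # this is a dict, not a set
empty_set = set()  # the correct spelling of an empty set
from_literal = {"a", "b"}  # set literal
from_comprehension = {x for x in ids}  # set comprehension; duplicates collapse

print(type(empty_literal).__name__, type(empty_set).__name__)
print(from_literal == from_comprehension)
```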

Verification

EventID = str — hashable, set-compatible
Pydantic v2 handles set[str] serialization natively (JSON arrays)
The frozen=True Event model remains fully compatible
All 209 affected tests pass
All pre-commit hooks pass
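
The first two points can be sketched with the standard library alone. This is an analogy, not Pydantic's implementation: plain `json` refuses to encode a set, so the sketch converts to a sorted list on the way out and back to a set on the way in, mirroring the JSON-array representation. The `dump_ids` and `load_ids` helpers are hypothetical.

```python
import json


def dump_ids(forgotten_event_ids: set[str]) -> str:
    """Encode a set of ids as a JSON array (sorted for stable output)."""
    return json.dumps({"forgotten_event_ids": sorted(forgotten_event_ids)})


def load_ids(payload: str) -> set[str]:
    """Decode the JSON array back into a set."""
    return set(json.loads(payload)["forgotten_event_ids"])


original = {"id1", "id2", "id3"}

try:
    json.dumps(original)  # the stdlib encoder has no set support
    stdlib_accepts_sets = True
except TypeError:
    stdlib_accepts_sets = False

payload = dump_ids(original)
restored = load_ids(payload)
print(payload)
```

Because sets are unordered, the order of a serialized array is not guaranteed unless it is sorted explicitly, as done here.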

Fixes #3150

This PR was created by an AI agent (OpenHands) on behalf of @csmith49.

Agent Server images for this PR

• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant	Architectures	Base Image	Docs / Tags
java	amd64, arm64	`eclipse-temurin:17-jdk`	Link
python	amd64, arm64	`nikolaik/python-nodejs:python3.13-nodejs22-slim`	Link
golang	amd64, arm64	`golang:1.21-bookworm`	Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:4d288d9-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-4d288d9-python \
  ghcr.io/openhands/agent-server:4d288d9-python

All tags pushed for this build

ghcr.io/openhands/agent-server:4d288d9-golang-amd64
ghcr.io/openhands/agent-server:4d288d94646eca403448ce96bbe9aa25e29eb5df-golang-amd64
ghcr.io/openhands/agent-server:fix-3150-condensation-forgotten-event-ids-set-golang-amd64
ghcr.io/openhands/agent-server:4d288d9-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:4d288d9-golang-arm64
ghcr.io/openhands/agent-server:4d288d94646eca403448ce96bbe9aa25e29eb5df-golang-arm64
ghcr.io/openhands/agent-server:fix-3150-condensation-forgotten-event-ids-set-golang-arm64
ghcr.io/openhands/agent-server:4d288d9-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:4d288d9-java-amd64
ghcr.io/openhands/agent-server:4d288d94646eca403448ce96bbe9aa25e29eb5df-java-amd64
ghcr.io/openhands/agent-server:fix-3150-condensation-forgotten-event-ids-set-java-amd64
ghcr.io/openhands/agent-server:4d288d9-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:4d288d9-java-arm64
ghcr.io/openhands/agent-server:4d288d94646eca403448ce96bbe9aa25e29eb5df-java-arm64
ghcr.io/openhands/agent-server:fix-3150-condensation-forgotten-event-ids-set-java-arm64
ghcr.io/openhands/agent-server:4d288d9-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:4d288d9-python-amd64
ghcr.io/openhands/agent-server:4d288d94646eca403448ce96bbe9aa25e29eb5df-python-amd64
ghcr.io/openhands/agent-server:fix-3150-condensation-forgotten-event-ids-set-python-amd64
ghcr.io/openhands/agent-server:4d288d9-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:4d288d9-python-arm64
ghcr.io/openhands/agent-server:4d288d94646eca403448ce96bbe9aa25e29eb5df-python-arm64
ghcr.io/openhands/agent-server:fix-3150-condensation-forgotten-event-ids-set-python-arm64
ghcr.io/openhands/agent-server:4d288d9-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:4d288d9-golang
ghcr.io/openhands/agent-server:4d288d94646eca403448ce96bbe9aa25e29eb5df-golang
ghcr.io/openhands/agent-server:fix-3150-condensation-forgotten-event-ids-set-golang
ghcr.io/openhands/agent-server:4d288d9-golang_tag_1.21-bookworm
ghcr.io/openhands/agent-server:4d288d9-java
ghcr.io/openhands/agent-server:4d288d94646eca403448ce96bbe9aa25e29eb5df-java
ghcr.io/openhands/agent-server:fix-3150-condensation-forgotten-event-ids-set-java
ghcr.io/openhands/agent-server:4d288d9-eclipse-temurin_tag_17-jdk
ghcr.io/openhands/agent-server:4d288d9-python
ghcr.io/openhands/agent-server:4d288d94646eca403448ce96bbe9aa25e29eb5df-python
ghcr.io/openhands/agent-server:fix-3150-condensation-forgotten-event-ids-set-python
ghcr.io/openhands/agent-server:4d288d9-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim

About Multi-Architecture Support

Each variant tag (e.g., 4d288d9-python) is a multi-arch manifest supporting both amd64 and arm64
Docker automatically pulls the correct architecture for your platform
Individual architecture tags (e.g., 4d288d9-python-amd64) are also available if needed

Change the forgotten_event_ids field on Condensation from list[EventID] to set[EventID]. The apply() method uses `event.id not in self.forgotten_event_ids` which is O(F) per event with a list but O(1) with a set. This reduces each condensation application from O(V×F) to O(V).

Fixes #3150

Co-authored-by: openhands <openhands@all-hands.dev>

github-actions · 2026-05-08T17:01:01Z

Python API breakage checks — ✅ PASSED

Result: ✅ PASSED

github-actions · 2026-05-08T17:01:11Z

REST API breakage checks (OpenAPI) — ✅ PASSED

Result: ✅ PASSED

all-hands-bot

Good performance optimization with the right data structure choice. However, this PR modifies an event type (Condensation) which affects persisted conversations. Per repository guidelines, event type modifications require explicit backward compatibility tests even when Pydantic handles conversion transparently. See inline comment for details.

github-actions · 2026-05-08T17:05:00Z

Coverage Report

File	Stmts	Miss	Cover	Missing
openhands-sdk/openhands/sdk/context/condenser
llm_summarizing_condenser.py	121	16	86%	275–276, 278–280, 285, 288–289, 292–293, 298, 303, 305–306, 319–320
openhands-sdk/openhands/sdk/event
condenser.py	52	1	98%	63
TOTAL	26361	7597	71%

all-hands-bot

✅ QA Report: PASS

Performance optimization verified: forgotten_event_ids changed from list to set delivers 50x speedup with identical functional behavior.

Does this PR achieve its stated goal?

Yes. The PR aims to improve condensation performance from O(V×F) to O(V) by changing Condensation.forgotten_event_ids from list[EventID] to set[EventID]. Real execution confirms the change delivers a 50.40x performance improvement (0.003134s → 0.000062s for 1000 events with 500 forgotten IDs), while functional behavior remains unchanged and Pydantic correctly handles set serialization.

Phase	Result
Environment Setup	✅ `make build` successful, dependencies installed
CI Status	⚠️ Several checks passing (REST API, Python API, check, check-docstrings), test suites pending
Functional Verification	✅ All behaviors verified through real execution

Functional Verification

Test 1: Before/After Performance Comparison

Step 1 — Baseline (main branch with list):

Checked out main branch and ran performance test:

$ OPENHANDS_SUPPRESS_BANNER=1 .venv/bin/python /tmp/test_condensation_list.py

============================================================
BASELINE TEST (main branch with list)
============================================================
forgotten_event_ids type: <class 'list'>
forgotten_event_ids value: ['id1', 'id2', 'id3']

Performance test: 1000 events, 500 forgotten
List-based lookup: 0.003134s (500 hits)
✅ Confirmed: Using list[EventID]

This confirms the baseline uses list[EventID] and takes 0.003134s for 1000 events with 500 forgotten IDs.

Step 2 — Apply PR changes:

Checked out PR branch pr-3156 (commit 6dca2256).

Step 3 — Re-run with PR changes:

Ran comprehensive test suite:

$ OPENHANDS_SUPPRESS_BANNER=1 .venv/bin/python /tmp/test_condensation_set.py

============================================================
TEST 1: Functional Behavior
============================================================
Created 5 events
Event IDs: ['e2a00339-7dd3-4e84-bac3-713a7bf1232f', '845b4a7a-5cc0-4701-98c2-2f1600e88105', ...]
Condensation forgotten_event_ids type: <class 'set'>
Condensation forgotten_event_ids: {'893c33a2-3124-4803-83c6-2231b9847041', '845b4a7a-5cc0-4701-98c2-2f1600e88105'}

View events count: 4
View event IDs: ['ba85bf18-6758-4d4b-ab6f-c17fb4547fea-summary', 'e2a00339-7dd3-4e84-bac3-713a7bf1232f', ...]
✅ PASS: Correct events were forgotten

============================================================
TEST 2: Serialization/Deserialization
============================================================
Original forgotten_event_ids type: <class 'set'>
Serialized to JSON:
  {"forgotten_event_ids":["id2","id1","id3"],...}

Parsed JSON forgotten_event_ids type: <class 'list'>
Deserialized forgotten_event_ids type: <class 'set'>
✅ PASS: Serialization round-trip successful

============================================================
TEST 3: Performance Benchmark
============================================================
Creating 1000 events, forgetting 500 of them

Benchmarking list-based lookup (O(n))...
  List lookup: 0.003131s (500 hits)
Benchmarking set-based lookup (O(1))...
  Set lookup:  0.000062s (500 hits)

  Speedup: 50.40x faster with set
✅ PASS: Set-based lookup is faster and produces same results

============================================================
SUMMARY
============================================================
Tests passed: 3/3
✅ ALL TESTS PASSED

This confirms:

Type changed: list → set ✅
Performance improved: 0.003134s → 0.000062s (50.40x faster) ✅
Functional behavior unchanged: Events correctly forgotten ✅
Serialization works: Sets serialize to JSON arrays and deserialize back correctly ✅
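
A comparable measurement can be reproduced with `timeit`. The scripts under /tmp used in this report are not shown, so the following is only a rough sketch of what such a benchmark might look like, using synthetic IDs. Absolute timings will differ by machine.

```python
import timeit

# Synthetic ids: 1000 events, of which every second one is forgotten.
event_ids = [f"event-{i}" for i in range(1000)]
forgotten_list = event_ids[::2]
forgotten_set = set(forgotten_list)


def count_hits(forgotten) -> int:
    """Count how many event ids are found in the forgotten collection."""
    return sum(1 for event_id in event_ids if event_id in forgotten)


# Time several repetitions of the full pass for each container type.
list_time = timeit.timeit(lambda: count_hits(forgotten_list), number=5)
set_time = timeit.timeit(lambda: count_hits(forgotten_set), number=5)

print(f"list: {list_time:.6f}s, set: {set_time:.6f}s")
print(f"hits: {count_hits(forgotten_list)} vs {count_hits(forgotten_set)}")
```

Checking that both containers report the same hit count guards against a benchmark that is fast only because it does different work.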

Test 2: Real-World Condensation Workflow

Created a comprehensive test that exercises the actual SDK condensation workflow:

Created 5 MessageEvent objects
Created a Condensation with forgotten_event_ids={event1.id, event3.id} (using set literal)
Built a View from the events
Verified the View correctly:
- Removed events 1 and 3
- Kept events 0, 2, and 4
- Inserted the summary event

Result: ✅ Identical behavior to list-based implementation.
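
The workflow steps above can be sketched without the SDK. Everything here is a simplified assumption: `build_view` is a hypothetical helper, events are plain strings, and the summary is placed at the front only because the summary ID appears first in the logged view. The real View logic may differ.

```python
def build_view(event_ids: list[str], forgotten: set[str], summary_id: str) -> list[str]:
    """Drop forgotten ids, then prepend a summary id (placement is an assumption)."""
    kept = [event_id for event_id in event_ids if event_id not in forgotten]
    return [summary_id, *kept]


event_ids = [f"event{i}" for i in range(5)]
view = build_view(event_ids, forgotten={"event1", "event3"}, summary_id="summary")
print(view)
```

Five events minus two forgotten plus one summary gives four entries, matching the view count of 4 in the log.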

Issues Found

None.

all-hands-bot

Clean performance optimization with proper backward compatibility handling. The list → set change delivers a 50x speedup for condensation application while maintaining identical functional behavior. The backward compatibility test properly verifies old persisted data can still load. LGTM! 🚀

all-hands-bot

✅ QA Report: PASS

The list→set conversion works correctly, maintains backward compatibility, and delivers the claimed performance improvement (143.8x faster lookup).

Does this PR achieve its stated goal?

Yes. The PR successfully changes Condensation.forgotten_event_ids from list[EventID] to set[EventID] and achieves the claimed O(V×F) → O(V) complexity reduction. I verified this by:

Running actual condensation operations — Created Condensation events with set-based forgotten_event_ids, applied them to filter event lists, and confirmed the filtering works correctly
Testing serialization — Verified sets properly serialize to JSON arrays and deserialize back to sets
Confirming backward compatibility — Old list-format persisted data successfully deserializes to sets
Benchmarking performance — Measured 143.8x speedup (49.28ms → 0.34ms for 10K lookups against 1K forgotten IDs)

Phase	Result
Environment Setup	✅ Dependencies installed, no errors
CI Status	✅ All checks passing (build, tests, pre-commit, API validation)
Functional Verification	✅ 5/5 tests passed (creation, serialization, backward compat, view filtering, performance)

Functional Verification

Test 1: Type Verification

Created Condensation with set syntax:

condensation = Condensation(
    forgotten_event_ids={"event1", "event2", "event3"},
    summary="Test summary",
    llm_response_id="test_response_1",
)

Result:

✓ Type: <class 'set'>
✓ Value: {'event2', 'event1', 'event3'}

This confirms forgotten_event_ids is now a set, not a list.

Test 2: Serialization Round-Trip

Serialized Condensation to JSON:

{
  "forgotten_event_ids": ["id3", "id2", "id1"],
  ...
}

Deserialized back:

assert isinstance(deserialized.forgotten_event_ids, set)
assert deserialized.forgotten_event_ids == original.forgotten_event_ids

This confirms Pydantic v2 correctly serializes sets to JSON arrays and deserializes them back to sets.

Test 3: Backward Compatibility

Simulated old persisted format (list):

old_format = {
    "forgotten_event_ids": ["old_id1", "old_id2", "old_id3"],  # List format
    "summary": "Old format summary",
    "llm_response_id": "old_resp",
}

Deserialized:

✓ Type after deserialization: <class 'set'>
✓ Value: {'old_id1', 'old_id2', 'old_id3'}

This confirms old list-format persisted data can be deserialized and automatically converts to a set.
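
The same behavior can be mimicked in a few lines of plain Python. This is not how Pydantic validates the field; `LegacyTolerantCondensation` is a hypothetical dataclass that coerces any iterable of IDs to a set, which is enough to show the shape of a backward compatibility check.

```python
from collections.abc import Iterable
from dataclasses import dataclass, field


@dataclass
class LegacyTolerantCondensation:
    """Hypothetical stand-in; coerces list-format ids into a set on construction."""

    forgotten_event_ids: Iterable[str] = field(default_factory=set)
    summary: str = ""

    def __post_init__(self) -> None:
        # Accept the old list format as well as the new set format.
        self.forgotten_event_ids = set(self.forgotten_event_ids)


old_format = {
    "forgotten_event_ids": ["old_id1", "old_id2", "old_id3"],  # persisted as a list
    "summary": "Old format summary",
}
loaded = LegacyTolerantCondensation(**old_format)
print(type(loaded.forgotten_event_ids).__name__)
```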

Test 4: View Filtering with Condensation

Created 4 MessageEvents and a Condensation forgetting 2 of them:

forgotten_ids = {msg2.id, msg3.id}
condensation = Condensation(forgotten_event_ids=forgotten_ids, ...)
events = [msg1, msg2, msg3, msg4, condensation]
view = View.from_events(events)

Result:

✓ View contains 2 events (after forgetting)
✓ Remaining: msg1, msg4
✓ Forgotten: msg2, msg3

This confirms View.from_events() correctly uses the set for O(1) membership checks when filtering forgotten events.

Test 5: apply() Method Performance

Created Condensation and applied it to filter events:

filtered_events = condensation.apply(events)

Result:

✓ Original events: 10
✓ Filtered events: 6
✓ Forgotten events: 4
✓ All correct events filtered

This confirms the apply() method works correctly with set-based forgotten_event_ids and now performs O(1) lookups instead of O(F).

Test 6: Performance Benchmark

Setup: 1,000 forgotten event IDs, 10,000 membership checks

Results:

List lookup time: 49.28ms
Set lookup time:  0.34ms
Speedup: 143.8x faster

This confirms the claimed performance improvement. The event.id not in self.forgotten_event_ids check is now O(1) instead of O(F), reducing overall complexity from O(V×F) to O(V).

Issues Found

None.

Merge branch 'main' into fix/3150-condensation-forgotten-event-ids-set

a1fcad6

all-hands-bot reviewed May 8, 2026

Comment thread openhands-sdk/openhands/sdk/event/condenser.py

test: add backward compat test for list-format deserialization

9aab8d6

Verify that Condensation events persisted with the old list[EventID] format still deserialize correctly into the new set[EventID] field.

Co-authored-by: openhands <openhands@all-hands.dev>

csmith49 requested a review from all-hands-bot May 8, 2026 17:08

all-hands-bot reviewed May 8, 2026

all-hands-bot approved these changes May 8, 2026

all-hands-bot reviewed May 8, 2026

csmith49 added 2 commits May 8, 2026 11:18

Merge branch 'main' into fix/3150-condensation-forgotten-event-ids-set

bc05b99

Merge branch 'main' into fix/3150-condensation-forgotten-event-ids-set

4d288d9

csmith49 requested a review from VascoSch92 May 11, 2026 22:15

VascoSch92 approved these changes May 12, 2026

csmith49 merged commit 08454e0 into main May 12, 2026
36 checks passed

csmith49 deleted the fix/3150-condensation-forgotten-event-ids-set branch May 12, 2026 13:06

csmith49 merged 5 commits into main from fix/3150-condensation-forgotten-event-ids-set
