perf: batch ConversationState autosaves inside context manager#3165
perf: batch ConversationState autosaves inside context manager#3165csmith49 wants to merge 3 commits into
Conversation
ConversationState.__setattr__ triggered a full JSON serialization and
synchronous file write on every public-field mutation. During a single
agent step multiple fields are updated (execution_status, stats, etc.),
producing 2-4 redundant writes per step.
Add save-depth tracking to the existing context-manager protocol:
- __enter__ increments _save_depth; __setattr__ sets _dirty instead
of saving immediately while _save_depth > 0.
- __exit__ decrements _save_depth; when it reaches 0 and _dirty is
set, one batched save is flushed before releasing the lock.
- Outside any context-manager block, saves remain immediate for
backward compatibility.
State-change callbacks still fire eagerly on each mutation so
WebSocket clients see intermediate updates without waiting for the
block to exit.
Fixes #3145
Co-authored-by: openhands <openhands@all-hands.dev>
Python API breakage checks — ✅ PASSEDResult: ✅ PASSED |
REST API breakage checks (OpenAPI) — ✅ PASSEDResult: ✅ PASSED |
all-hands-bot
left a comment
There was a problem hiding this comment.
🟢 Good taste - Elegant I/O batching solution that eliminates redundant saves (2-4 per step → 1 per block) without breaking backward compatibility. The depth-counter pattern handles nesting correctly, callbacks still fire eagerly, and the lock is properly managed. Clean implementation with solid test coverage. LGTM! ✅
[RISK ASSESSMENT]
Pure I/O optimization with no agent behavior changes. Maintains backward compatibility (mutations outside context managers still save immediately), preserves eager callback firing, and has comprehensive test coverage proving both the optimization and backward compat work correctly.
all-hands-bot
left a comment
There was a problem hiding this comment.
✅ QA Report: PASS
The PR successfully reduces I/O operations by batching ConversationState autosaves, achieving 67% reduction in save operations (from 3→1 for typical agent steps).
Does this PR achieve its stated goal?
Yes. The PR delivers exactly what it promises: multiple field mutations inside with state: now produce a single batched save at exit instead of 2-4 individual saves. Backward compatibility is preserved (mutations outside context managers still save immediately), callbacks fire eagerly (WebSocket clients see intermediate updates), and nested contexts work correctly via depth tracking.
| Phase | Result |
|---|---|
| Environment Setup | ✅ make build completed successfully |
| CI Status | ✅ All functional checks passing (sdk-tests, cross-tests, windows-tests, agent-server-tests, tools-tests, workspace-tests, pre-commit) — only Docker builds pending |
| Functional Verification | ✅ Save batching verified with before/after comparison, nested contexts tested, callbacks confirmed eager |
Functional Verification
Test 1: Save Batching — Before/After Comparison
Step 1 — Establish baseline (main branch without fix):
Checked out main and ran test with 3 field mutations inside with state: block:
============================================================
TEST 1: Multiple mutations INSIDE 'with state:' block
============================================================
Setting execution_status...
→ Save #1 triggered
Setting max_iterations...
→ Save #2 triggered
Setting stuck_detection...
→ Save #3 triggered
Exiting context manager...
✓ Total saves inside 'with' block: 3
This confirms the problem: 3 separate saves (one per field mutation) inside the context manager on main branch.
Step 2 — Apply the PR's changes:
Checked out branch fix/3145-state-autosave-debounce.
Step 3 — Re-run with the fix in place:
Ran the same test on the PR branch:
============================================================
TEST 1: Multiple mutations INSIDE 'with state:' block
============================================================
Setting execution_status...
Setting max_iterations...
Setting stuck_detection...
Exiting context manager...
→ Save #1 triggered
✓ Total saves inside 'with' block: 1
This confirms the fix works: 1 batched save at context manager exit instead of 3 separate saves. 67% reduction in I/O operations.
Test 2: Backward Compatibility
Verified mutations outside with state: still save immediately:
============================================================
TEST 2: Single mutation OUTSIDE 'with state:' block
============================================================
Setting max_iterations...
→ Save #1 triggered
✓ Total saves outside 'with' block: 1
✅ Result: Immediate save behavior preserved for non-context-manager usage.
Test 3: Nested Context Managers
Tested depth tracking with nested with state: blocks:
============================================================
TEST: Nested context managers
============================================================
Entering outer context...
Setting execution_status in outer context...
Entering inner context...
Setting max_iterations in inner context...
Setting stuck_detection in inner context...
Exiting inner context...
After inner context, saves so far: 0
Setting max_iterations in outer context...
Exiting outer context...
→ Save #1 triggered
✓ Total saves with nested contexts: 1
✅ Result: Depth counter works correctly — all mutations batched until outermost context exits.
Test 4: Eager Callback Execution
Verified state change callbacks fire immediately (not batched):
============================================================
TEST: Callbacks fire eagerly (not batched)
============================================================
Mutations inside 'with state:' block:
Setting execution_status...
→ Callback #1 fired
Setting max_iterations...
→ Callback #2 fired
Setting stuck_detection...
→ Callback #3 fired
Exiting context manager...
→ Save #1 triggered
✓ Callbacks fired: 3 (expected: 3, one per mutation)
✓ Saves executed: 1 (expected: 1, batched at exit)
✅ Result: Callbacks fire eagerly (important for WebSocket clients to see intermediate updates) while saves are batched (reducing I/O).
Test 5: Existing Test Suite
Ran all state serialization tests:
uv run pytest tests/sdk/conversation/local/test_state_serialization.py -v✅ Result: All 31 tests pass, including the new test_context_manager_batches_saves.
Issues Found
None.
Verdict: This PR achieves its performance optimization goal without breaking existing functionality. The implementation is clean, well-tested, and preserves backward compatibility. Ready to merge.
Summary
ConversationState.__setattr__triggered a full JSON serialization + synchronous file write on every public-field mutation. During a single agent step, multiple fields are updated (execution_status,stats,max_iterations, etc.), producing 2–4 redundant writes per step — each serializing the entire state including large nested objects likeagent,stats, andworkspace.Fix
Extend the existing
with state:context-manager protocol (currently just a FIFO lock) to also batch saves:with state:_dirty = True; one save on__exit__with state:Implementation:
__enter__increments_save_depth__setattr__checks_save_depth > 0→ sets_dirtyinstead of saving__exit__decrements_save_depth; when it reaches 0 and_dirty, flush one save, then release lockI/O reduction per step
model_dump_json()+open()/write()per stepwithblockVerification
test_context_manager_batches_saves— proves 3 mutations → 1 save insidewith, and immediate save outsideFixes #3145
This PR was created by an AI agent (OpenHands) on behalf of @csmith49.
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
eclipse-temurin:17-jdknikolaik/python-nodejs:python3.13-nodejs22-slimgolang:1.21-bookwormPull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64 docker pull ghcr.io/openhands/agent-server:4a3d894-pythonRun
All tags pushed for this build
About Multi-Architecture Support
4a3d894-python) is a multi-arch manifest supporting both amd64 and arm644a3d894-python-amd64) are also available if needed