perf: bound InMemoryFileStore with LRU eviction#3163
Conversation
InMemoryFileStore used a plain dict that grew without bound. Replace the backing dict with MemoryLRUCache (the same LRU implementation already used by LocalFileStore) so entries are evicted when either the entry-count or memory-size limit is exceeded. Defaults match LocalFileStore: 500 entries, 20 MB. Both limits are configurable via keyword-only max_size / max_memory params. The files= dict constructor argument is preserved for backward compatibility. Fixes #3146 Co-authored-by: openhands <openhands@all-hands.dev>
Python API breakage checks — ✅ PASSEDResult: ✅ PASSED |
REST API breakage checks (OpenAPI) — ✅ PASSEDResult: ✅ PASSED |
all-hands-bot
left a comment
There was a problem hiding this comment.
🟡 Acceptable - Pragmatic fix for real memory leak
Worth merging - This PR elegantly solves a real production issue (monotonic memory growth) by reusing the battle-tested MemoryLRUCache component already used in LocalFileStore. The change is small, focused, and backward compatible.
I've left two suggestions for future improvements around test coverage and documentation, but neither blocks merging.
|
|
||
| class InMemoryFileStore(FileStore): | ||
| files: dict[str, str] | ||
| files: MemoryLRUCache |
There was a problem hiding this comment.
🟡 Suggestion: This type annotation change from dict[str, str] to MemoryLRUCache is technically a breaking change for type checkers.
Code that directly accesses store.files and expects a dict would now fail type checking:
store = InMemoryFileStore()
d: dict[str, str] = store.files # Type error!However, this is acceptable because:
MemoryLRUCacheextendsLRUCache(fromcachetools), which implements the fullMutableMappingprotocol - runtime behavior is compatible- Direct access to
.files(not prefixed with_) is uncommon - This fixes a real bug (memory leak)
No action needed, just documenting the trade-off.
| max_size: int = _DEFAULT_MAX_SIZE, | ||
| max_memory: int = _DEFAULT_MAX_MEMORY, | ||
| ) -> None: | ||
| self.files = MemoryLRUCache(max_memory=max_memory, max_size=max_size) |
There was a problem hiding this comment.
🟡 Suggestion: Consider adding tests that verify InMemoryFileStore LRU behavior.
The existing tests/sdk/io/test_filestore_cache.py has comprehensive tests for LocalFileStore + MemoryLRUCache (e.g., test_cache_lru_eviction, test_cache_memory_limit). Consider adding similar tests for InMemoryFileStore to verify:
- Eviction happens when
max_sizeis exceeded - Eviction happens when
max_memoryis exceeded - Custom limits work correctly via constructor params
- Backward compat: passing
files=dict works
The fact that all 494 existing tests pass proves backward compatibility, and MemoryLRUCache itself is well-tested. But explicit tests for the new behavior would make the fix more robust and serve as documentation.
Not blocking - can be added in a follow-up PR.
all-hands-bot
left a comment
There was a problem hiding this comment.
✅ QA Report: PASS
The PR successfully implements LRU eviction for InMemoryFileStore and fixes the unbounded memory growth issue.
Does this PR achieve its stated goal?
Yes. The PR set out to fix issue #3146 where InMemoryFileStore used a plain dict with no size limit, causing monotonic memory growth. I verified that:
- The baseline (main branch) retains all 1000 entries with no eviction
- With the fix, the store enforces a 500-entry limit and evicts the oldest 500 entries
- Both entry-count and memory-size limits work correctly
- The fix is backward compatible with the existing
files=dict argument
| Phase | Result |
|---|---|
| Environment Setup | ✅ Dependencies installed, PR branch checked out |
| CI Status | ✅ Most checks passing (some in progress) |
| Functional Verification | ✅ All 5 verification tests passed |
Functional Verification
Before/After Comparison: Unbounded Growth Fix
Step 1 — Reproduce the problem (main branch without fix):
Checked out main branch and ran verification script:
git checkout main
uv run python test_baseline_growth.pyOutput:
Backing store type: <class 'dict'>
Backing store type name: dict
❌ No maxsize attribute (plain dict - unbounded!)
❌ No max_memory attribute (plain dict - unbounded!)
Adding 1000 entries...
Entries in store: 1000
❌ PROBLEM CONFIRMED: All 1000 entries retained (unbounded growth!)
This is the bug that PR #3163 fixes.
Interpretation: The baseline uses a plain dict with no eviction mechanism. All 1000 entries are retained in memory, confirming the unbounded growth bug.
Step 2 — Apply the PR's changes:
Checked out PR branch:
git checkout pr-3163Step 3 — Re-run with the fix in place:
Ran the same verification script:
uv run python test_baseline_growth.pyOutput:
Backing store type: <class 'openhands.sdk.io.cache.MemoryLRUCache'>
Backing store type name: MemoryLRUCache
✅ Has maxsize attribute: 500
✅ Has max_memory attribute: 20971520 bytes (20.0 MB)
Adding 1000 entries (default limit: 500)...
Entries in store: 500
✅ FIX CONFIRMED: Only 500 entries retained (LRU eviction working!)
The unbounded growth bug is fixed.
Interpretation: With the fix, the backing store is now a MemoryLRUCache with maxsize=500 and max_memory=20MB. When 1000 entries are added, the LRU cache evicts the oldest 500 entries and retains only the newest 500, confirming the fix works.
Comprehensive Verification Tests
Ran comprehensive test suite covering all aspects of the change:
uv run python test_lru_eviction.pyAll 5 tests passed:
Test 1: Entry count eviction (max_size limit)
- Added 5 entries with
max_size=3 - Verified oldest 2 entries (file_0, file_1) were evicted
- Verified newest 3 entries (file_2, file_3, file_4) were retained
- ✅ PASS: LRU eviction working correctly
Test 2: Memory limit eviction (max_memory limit)
- Added 5 entries of ~500 bytes each with
max_memory=1024 - Result: 2 entries retained, using 1000/1024 bytes
- ✅ PASS: Memory-based eviction working correctly
Test 3: Configurable limits
- Custom
max_size=10: verified ≤10 entries retained - Custom
max_memory=2048: verified ≤2048 bytes used - ✅ PASS: Custom limits respected
Test 4: Backward compatibility (files= argument)
- Created store with
files={"file1.txt": "content1", "file2.txt": "content2"} - Verified both files loaded correctly
- ✅ PASS: Backward compatibility preserved
Test 5: Default limits (500 entries, 20MB)
- Verified
maxsize=500 - Verified
max_memory=20971520(20 MB) - ✅ PASS: Default limits match documentation
Full test output:
======================================================================
SUMMARY
======================================================================
✅ PASS: Entry count eviction
✅ PASS: Memory limit eviction
✅ PASS: Configurable limits
✅ PASS: Backward compatibility
✅ PASS: Default limits
Total: 5/5 tests passed
🎉 All tests passed! The LRU eviction is working correctly.
Issues Found
None.
| _DEFAULT_MAX_SIZE = 500 | ||
| _DEFAULT_MAX_MEMORY = 20 * 1024 * 1024 # 20 MB |
Summary
InMemoryFileStoreused a plaindict[str, str]as its backing store with no size limit. When used as the non-persistent fallback (nopersistence_dir), the dict grew monotonically — every event written stayed in memory forever.Fix
Replace the backing
dictwithMemoryLRUCache— the same dual-limit LRU cache already used internally byLocalFileStore. When either the entry-count or memory-size limit is exceeded, the least-recently-used entries are evicted.Defaults (matching
LocalFileStore):max_size=500entriesmax_memory=20MBBoth limits are configurable via keyword-only
max_size/max_memoryconstructor params. The existingfiles=dict argument is preserved for backward compatibility.Key design choice
Unlike
LocalFileStorewhere eviction just clears the cache (data persists on disk),InMemoryFileStoreeviction means data loss. This is acceptable because:InMemoryFileStoreis only used whenpersistence_diris not set — the data is already ephemeralVerification
tests/sdk/io/+tests/sdk/conversation/passopenhands-sdk/openhands/sdk/io/memory.py(+15, -4)Fixes #3146
This PR was created by an AI agent (OpenHands) on behalf of @csmith49.
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
eclipse-temurin:17-jdknikolaik/python-nodejs:python3.13-nodejs22-slimgolang:1.21-bookwormPull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64 docker pull ghcr.io/openhands/agent-server:a43e91e-pythonRun
All tags pushed for this build
About Multi-Architecture Support
a43e91e-python) is a multi-arch manifest supporting both amd64 and arm64a43e91e-python-amd64) are also available if needed