Skip to content

perf: bound InMemoryFileStore with LRU eviction#3163

Open
csmith49 wants to merge 1 commit intomainfrom
fix/3146-inmemory-filestore-bounded-cache
Open

perf: bound InMemoryFileStore with LRU eviction#3163
csmith49 wants to merge 1 commit intomainfrom
fix/3146-inmemory-filestore-bounded-cache

Conversation

@csmith49
Copy link
Copy Markdown
Collaborator

@csmith49 csmith49 commented May 8, 2026

Summary

InMemoryFileStore used a plain dict[str, str] as its backing store with no size limit. When used as the non-persistent fallback (no persistence_dir), the dict grew monotonically — every event written stayed in memory forever.

Fix

Replace the backing dict with MemoryLRUCache — the same dual-limit LRU cache already used internally by LocalFileStore. When either the entry-count or memory-size limit is exceeded, the least-recently-used entries are evicted.

Defaults (matching LocalFileStore):

  • max_size=500 entries
  • max_memory=20MB

Both limits are configurable via keyword-only max_size/max_memory constructor params. The existing files= dict argument is preserved for backward compatibility.

Key design choice

Unlike LocalFileStore where eviction just clears the cache (data persists on disk), InMemoryFileStore eviction means data loss. This is acceptable because:

  1. InMemoryFileStore is only used when persistence_dir is not set — the data is already ephemeral
  2. The 500-entry / 20MB defaults are generous for typical SDK usage
  3. Users who need more can pass larger limits

Verification

  • All 494 tests in tests/sdk/io/ + tests/sdk/conversation/ pass
  • All pre-commit hooks pass (ruff, pyright, etc.)
  • Single file changed: openhands-sdk/openhands/sdk/io/memory.py (+15, -4)

Fixes #3146

This PR was created by an AI agent (OpenHands) on behalf of @csmith49.


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant Architectures Base Image Docs / Tags
java amd64, arm64 eclipse-temurin:17-jdk Link
python amd64, arm64 nikolaik/python-nodejs:python3.13-nodejs22-slim Link
golang amd64, arm64 golang:1.21-bookworm Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:a43e91e-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-a43e91e-python \
  ghcr.io/openhands/agent-server:a43e91e-python

All tags pushed for this build

ghcr.io/openhands/agent-server:a43e91e-golang-amd64
ghcr.io/openhands/agent-server:a43e91eeb10e96a4bb704c454dfe2873dc615a16-golang-amd64
ghcr.io/openhands/agent-server:fix-3146-inmemory-filestore-bounded-cache-golang-amd64
ghcr.io/openhands/agent-server:a43e91e-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:a43e91e-golang-arm64
ghcr.io/openhands/agent-server:a43e91eeb10e96a4bb704c454dfe2873dc615a16-golang-arm64
ghcr.io/openhands/agent-server:fix-3146-inmemory-filestore-bounded-cache-golang-arm64
ghcr.io/openhands/agent-server:a43e91e-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:a43e91e-java-amd64
ghcr.io/openhands/agent-server:a43e91eeb10e96a4bb704c454dfe2873dc615a16-java-amd64
ghcr.io/openhands/agent-server:fix-3146-inmemory-filestore-bounded-cache-java-amd64
ghcr.io/openhands/agent-server:a43e91e-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:a43e91e-java-arm64
ghcr.io/openhands/agent-server:a43e91eeb10e96a4bb704c454dfe2873dc615a16-java-arm64
ghcr.io/openhands/agent-server:fix-3146-inmemory-filestore-bounded-cache-java-arm64
ghcr.io/openhands/agent-server:a43e91e-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:a43e91e-python-amd64
ghcr.io/openhands/agent-server:a43e91eeb10e96a4bb704c454dfe2873dc615a16-python-amd64
ghcr.io/openhands/agent-server:fix-3146-inmemory-filestore-bounded-cache-python-amd64
ghcr.io/openhands/agent-server:a43e91e-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:a43e91e-python-arm64
ghcr.io/openhands/agent-server:a43e91eeb10e96a4bb704c454dfe2873dc615a16-python-arm64
ghcr.io/openhands/agent-server:fix-3146-inmemory-filestore-bounded-cache-python-arm64
ghcr.io/openhands/agent-server:a43e91e-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:a43e91e-golang
ghcr.io/openhands/agent-server:a43e91eeb10e96a4bb704c454dfe2873dc615a16-golang
ghcr.io/openhands/agent-server:fix-3146-inmemory-filestore-bounded-cache-golang
ghcr.io/openhands/agent-server:a43e91e-golang_tag_1.21-bookworm
ghcr.io/openhands/agent-server:a43e91e-java
ghcr.io/openhands/agent-server:a43e91eeb10e96a4bb704c454dfe2873dc615a16-java
ghcr.io/openhands/agent-server:fix-3146-inmemory-filestore-bounded-cache-java
ghcr.io/openhands/agent-server:a43e91e-eclipse-temurin_tag_17-jdk
ghcr.io/openhands/agent-server:a43e91e-python
ghcr.io/openhands/agent-server:a43e91eeb10e96a4bb704c454dfe2873dc615a16-python
ghcr.io/openhands/agent-server:fix-3146-inmemory-filestore-bounded-cache-python
ghcr.io/openhands/agent-server:a43e91e-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim

About Multi-Architecture Support

  • Each variant tag (e.g., a43e91e-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., a43e91e-python-amd64) are also available if needed

InMemoryFileStore used a plain dict that grew without bound.
Replace the backing dict with MemoryLRUCache (the same LRU
implementation already used by LocalFileStore) so entries are
evicted when either the entry-count or memory-size limit is
exceeded.

Defaults match LocalFileStore: 500 entries, 20 MB.  Both limits
are configurable via keyword-only max_size / max_memory params.
The files= dict constructor argument is preserved for backward
compatibility.

Fixes #3146

Co-authored-by: openhands <openhands@all-hands.dev>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 8, 2026

Python API breakage checks — ✅ PASSED

Result:PASSED

Action log

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 8, 2026

REST API breakage checks (OpenAPI) — ✅ PASSED

Result:PASSED

Action log

Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Acceptable - Pragmatic fix for real memory leak

Worth merging - This PR elegantly solves a real production issue (monotonic memory growth) by reusing the battle-tested MemoryLRUCache component already used in LocalFileStore. The change is small, focused, and backward compatible.

I've left two suggestions for future improvements around test coverage and documentation, but neither blocks merging.


class InMemoryFileStore(FileStore):
files: dict[str, str]
files: MemoryLRUCache
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Suggestion: This type annotation change from dict[str, str] to MemoryLRUCache is technically a breaking change for type checkers.

Code that directly accesses store.files and expects a dict would now fail type checking:

store = InMemoryFileStore()
d: dict[str, str] = store.files  # Type error!

However, this is acceptable because:

  1. MemoryLRUCache extends LRUCache (from cachetools), which implements the full MutableMapping protocol - runtime behavior is compatible
  2. Direct access to .files (not prefixed with _) is uncommon
  3. This fixes a real bug (memory leak)

No action needed, just documenting the trade-off.

max_size: int = _DEFAULT_MAX_SIZE,
max_memory: int = _DEFAULT_MAX_MEMORY,
) -> None:
self.files = MemoryLRUCache(max_memory=max_memory, max_size=max_size)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Suggestion: Consider adding tests that verify InMemoryFileStore LRU behavior.

The existing tests/sdk/io/test_filestore_cache.py has comprehensive tests for LocalFileStore + MemoryLRUCache (e.g., test_cache_lru_eviction, test_cache_memory_limit). Consider adding similar tests for InMemoryFileStore to verify:

  1. Eviction happens when max_size is exceeded
  2. Eviction happens when max_memory is exceeded
  3. Custom limits work correctly via constructor params
  4. Backward compat: passing files= dict works

The fact that all 494 existing tests pass proves backward compatibility, and MemoryLRUCache itself is well-tested. But explicit tests for the new behavior would make the fix more robust and serve as documentation.

Not blocking - can be added in a follow-up PR.

Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ QA Report: PASS

The PR successfully implements LRU eviction for InMemoryFileStore and fixes the unbounded memory growth issue.

Does this PR achieve its stated goal?

Yes. The PR set out to fix issue #3146 where InMemoryFileStore used a plain dict with no size limit, causing monotonic memory growth. I verified that:

  1. The baseline (main branch) retains all 1000 entries with no eviction
  2. With the fix, the store enforces a 500-entry limit and evicts the oldest 500 entries
  3. Both entry-count and memory-size limits work correctly
  4. The fix is backward compatible with the existing files= dict argument
Phase Result
Environment Setup ✅ Dependencies installed, PR branch checked out
CI Status ✅ Most checks passing (some in progress)
Functional Verification ✅ All 5 verification tests passed
Functional Verification

Before/After Comparison: Unbounded Growth Fix

Step 1 — Reproduce the problem (main branch without fix):

Checked out main branch and ran verification script:

git checkout main
uv run python test_baseline_growth.py

Output:

Backing store type: <class 'dict'>
Backing store type name: dict
❌ No maxsize attribute (plain dict - unbounded!)
❌ No max_memory attribute (plain dict - unbounded!)

Adding 1000 entries...
Entries in store: 1000
❌ PROBLEM CONFIRMED: All 1000 entries retained (unbounded growth!)
   This is the bug that PR #3163 fixes.

Interpretation: The baseline uses a plain dict with no eviction mechanism. All 1000 entries are retained in memory, confirming the unbounded growth bug.

Step 2 — Apply the PR's changes:

Checked out PR branch:

git checkout pr-3163

Step 3 — Re-run with the fix in place:

Ran the same verification script:

uv run python test_baseline_growth.py

Output:

Backing store type: <class 'openhands.sdk.io.cache.MemoryLRUCache'>
Backing store type name: MemoryLRUCache
✅ Has maxsize attribute: 500
✅ Has max_memory attribute: 20971520 bytes (20.0 MB)

Adding 1000 entries (default limit: 500)...
Entries in store: 500
✅ FIX CONFIRMED: Only 500 entries retained (LRU eviction working!)
   The unbounded growth bug is fixed.

Interpretation: With the fix, the backing store is now a MemoryLRUCache with maxsize=500 and max_memory=20MB. When 1000 entries are added, the LRU cache evicts the oldest 500 entries and retains only the newest 500, confirming the fix works.


Comprehensive Verification Tests

Ran comprehensive test suite covering all aspects of the change:

uv run python test_lru_eviction.py

All 5 tests passed:

Test 1: Entry count eviction (max_size limit)

  • Added 5 entries with max_size=3
  • Verified oldest 2 entries (file_0, file_1) were evicted
  • Verified newest 3 entries (file_2, file_3, file_4) were retained
  • ✅ PASS: LRU eviction working correctly

Test 2: Memory limit eviction (max_memory limit)

  • Added 5 entries of ~500 bytes each with max_memory=1024
  • Result: 2 entries retained, using 1000/1024 bytes
  • ✅ PASS: Memory-based eviction working correctly

Test 3: Configurable limits

  • Custom max_size=10: verified ≤10 entries retained
  • Custom max_memory=2048: verified ≤2048 bytes used
  • ✅ PASS: Custom limits respected

Test 4: Backward compatibility (files= argument)

  • Created store with files={"file1.txt": "content1", "file2.txt": "content2"}
  • Verified both files loaded correctly
  • ✅ PASS: Backward compatibility preserved

Test 5: Default limits (500 entries, 20MB)

  • Verified maxsize=500
  • Verified max_memory=20971520 (20 MB)
  • ✅ PASS: Default limits match documentation

Full test output:

======================================================================
SUMMARY
======================================================================
✅ PASS: Entry count eviction
✅ PASS: Memory limit eviction
✅ PASS: Configurable limits
✅ PASS: Backward compatibility
✅ PASS: Default limits

Total: 5/5 tests passed

🎉 All tests passed! The LRU eviction is working correctly.

Issues Found

None.

Copy link
Copy Markdown
Contributor

@VascoSch92 VascoSch92 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a small nit

Comment on lines +14 to +15
_DEFAULT_MAX_SIZE = 500
_DEFAULT_MAX_MEMORY = 20 * 1024 * 1024 # 20 MB
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use Final from typing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

perf: InMemoryFileStore LRU cache is unbounded per conversation

3 participants