fix(kb): include provider parameters in cache key#1852
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughModified the cache key generation in ChangesHash Prefix Calculation for Cache Invalidation
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
Greptile SummaryThis PR fixes stale Annoy cache reuse by replacing the partial cache-key prefix (only two specific fields) with a full serialization of all embedding search provider parameters, so any parameter change — including Gemini's
|
| Filename | Overview |
|---|---|
| nemoguardrails/kb/kb.py | Replaces partial hash prefix with full JSON serialization of all provider parameters; import json added. Change is deterministic and handles non-serializable types via default=str. |
| tests/test_kb_cache.py | New test verifying all provider parameters appear in the hash input; leaves real .cache directory/file artifacts on the filesystem due to incomplete mocking of I/O in the save path. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[kb.build called] --> B{index_items empty?}
B -- yes --> Z[return]
B -- no --> C[Serialize all provider parameters with sorted keys]
C --> D[Concatenate prefix with all text items]
D --> E[compute hash value]
E --> F[Derive .ann and .esize cache file paths]
F --> G{provider is default AND both cache files exist?}
G -- yes --> H[Load AnnoyIndex from disk - Skip re-embedding]
G -- no --> I[Build index via embedding provider]
I --> J{provider is default?}
J -- yes --> K[Persist .ann and .esize to disk]
J -- no --> L[Done]
K --> L
H --> L
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 1
tests/test_kb_cache.py:46-52
**Test leaves real filesystem artifacts**
The test patches `compute_hash` but does not patch `os.makedirs`, `os.path.exists`, or `open`. When the provider name is `"default"` and the cache files don't exist, `build()` falls into the save path, which calls `os.makedirs(CACHE_FOLDER, exist_ok=True)` and `open(embedding_size_file, "w")` — creating a real `.cache/` directory and a `.cache/cache-key.esize` file on every test run. Consider using `pytest`'s `tmp_path` fixture to redirect `CACHE_FOLDER`, or patch `open`/`os.makedirs` to keep the test hermetic.
Reviews (3): Last reviewed commit: "test(kb): cover provider parameters in c..." | Re-trigger Greptile
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
a6be550 to
c69efe5
Compare
fda43e2 to
ed24084
Compare
Summary
embedding_engine/embedding_modelchange, such as Geminidimensionality.Fixes #1767
Verification
python3 -m compileall nemoguardrails/kb/kb.pyPR overlap check
1767; no existing open PR covers this issue.Summary by CodeRabbit