fix: make deduplicate_chunks and build_context handle SearchResultItem objects#1739
fix: make deduplicate_chunks and build_context handle SearchResultItem objects#1739MervinPraison wants to merge 5 commits into
Conversation
…m objects - Add _extract_value helper to handle both dict and SearchResultItem objects - Update type hints to accept Union[Dict[str, Any], Any] in context functions - Maintain full backward compatibility with existing dict-based usage - Fix AttributeError when SearchResultItem objects are passed to .get() method Fixes #1732 Co-authored-by: praisonai-triage-agent[bot] <praisonai-triage-agent[bot]@users.noreply.github.com>
Qodo reviews are paused for this user.Troubleshooting steps vary by plan Learn more → On a Teams plan? Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center? |
📝 WalkthroughWalkthroughThe RAG context module now accepts both dict and object-like search results. New helpers ChangesMixed-format search results support
Sequence DiagramsequenceDiagram
participant Client
participant Build as build_context
participant Dedup as deduplicate_chunks
participant Extract as _extract_value
Client->>Build: pass List[ResultItem]
Build->>Extract: extract text/memory for each item
Extract-->>Build: text value
Build->>Dedup: (optionally) pass extracted items for deduplication
Dedup->>Extract: extract text/metadata/source as needed
Dedup-->>Build: filtered/unique items (original objects)
Build-->>Client: return composed context and used_results
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@copilot Do a thorough review of this PR. Read ALL existing reviewer comments above from Qodo, Coderabbit, and Gemini first — incorporate their findings. Review areas:
|
There was a problem hiding this comment.
Code Review
This pull request updates the RAG context module to support both dictionary and object-based search results, such as SearchResultItem. Key changes include the introduction of a _extract_value helper function and updated type signatures across the deduplicate_chunks and build_context functions. Feedback focused on removing redundant isinstance checks for metadata extraction, as the variable is already validated as a dictionary earlier in the logic.
| if not isinstance(metadata, dict): | ||
| metadata = {} | ||
|
|
||
| source = metadata.get("source", "") if isinstance(metadata, dict) else "" |
There was a problem hiding this comment.
The conditional check if isinstance(metadata, dict) is redundant here because the logic on lines 83-85 already ensures that metadata is a dictionary. Simplifying this improves readability.
| source = metadata.get("source", "") if isinstance(metadata, dict) else "" | |
| source = metadata.get("source", "") |
| source = metadata.get("source", "") if isinstance(metadata, dict) else "" | ||
| filename = metadata.get("filename", "") if isinstance(metadata, dict) else "" |
There was a problem hiding this comment.
The conditional checks if isinstance(metadata, dict) are redundant here because the logic on lines 180-182 already ensures that metadata is a dictionary. Simplifying these lines improves code clarity.
| source = metadata.get("source", "") if isinstance(metadata, dict) else "" | |
| filename = metadata.get("filename", "") if isinstance(metadata, dict) else "" | |
| source = metadata.get("source", "") | |
| filename = metadata.get("filename", "") |
There was a problem hiding this comment.
Pull request overview
This PR addresses a runtime type mismatch in the RAG context builder by allowing deduplicate_chunks() and build_context() to consume both legacy dict-shaped retrieval results and object-shaped results (e.g., SearchResultItem) without raising AttributeError on .get() calls.
Changes:
- Added
_extract_value()helper to read fields from either dicts or objects viadict.get()/getattr(). - Updated
deduplicate_chunks()andbuild_context()to use_extract_value()and widened their type hints accordingly. - Updated
DefaultContextBuilder.build()docstring/type hints to match the more flexible input shape.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def deduplicate_chunks( | ||
| results: List[Dict[str, Any]], | ||
| results: List[Union[Dict[str, Any], Any]], | ||
| similarity_threshold: float = 0.9, | ||
| ) -> List[Dict[str, Any]]: | ||
| ) -> List[Union[Dict[str, Any], Any]]: |
| if not isinstance(metadata, dict): | ||
| metadata = {} | ||
|
|
||
| source = metadata.get("source", "") if isinstance(metadata, dict) else "" |
| source = metadata.get("source", "") if isinstance(metadata, dict) else "" | ||
| filename = metadata.get("filename", "") if isinstance(metadata, dict) else "" |
| def _extract_value(item: Union[Dict[str, Any], Any], key: str, default: Any = None) -> Any: | ||
| """ | ||
| Extract value from either dict or object (e.g., SearchResultItem). | ||
|
|
||
| Args: |
|
@claude You are the FINAL architecture reviewer. If the branch is under MervinPraison/PraisonAI (not a fork), you are able to make modifications to this branch and push directly. SCOPE: Focus ONLY on Python packages (praisonaiagents, praisonai). Do NOT modify praisonai-rust or praisonai-ts. Read ALL comments above from Gemini, Qodo, CodeRabbit, and Copilot carefully before responding. Phase 1: Review per AGENTS.md
Phase 2: FIX Valid Issues Phase 3: Final Verdict |
Greptile SummaryThis PR fixes a type mismatch where
Confidence Score: 5/5Safe to merge; the change is additive and backward-compatible, and all existing dict-based callers are unaffected. The two new helper functions correctly handle the common cases (absent key, None value, real top-level attribute), and the downstream fallback chain gracefully absorbs the one remaining edge case where _extract_metadata_value returns an empty string instead of falling back to the top-level attribute. No data loss or security concern is introduced. context.py — specifically the _extract_metadata_value guard logic around line 55–58 Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Search results\nList of dicts or SearchResultItem] --> B{deduplicate?}
B -- yes --> C[deduplicate_chunks]
B -- no --> E[build_context]
C --> D1[_extract_value: text]
C --> D2[_extract_value: metadata]
D2 --> D3[ensure metadata is dict]
D3 --> D5[_extract_metadata_value: source\nfallback to top-level attribute]
D1 --> D6[_chunk_hash: text + source]
D5 --> D6
D6 --> D7{seen hash?}
D7 -- yes --> D8[skip]
D7 -- no --> D9[keep result]
D9 --> E
E --> F1[_extract_value: text]
E --> F2[_extract_metadata_value: source and filename\nfallback to top-level]
F1 --> F3[source_label = filename or source or Source N]
F2 --> F3
F3 --> G[Token budget check]
G --> H[context string + used_results]
Reviews (3): Last reviewed commit: "test: add comprehensive SearchResultItem..." | Re-trigger Greptile |
|
Claude finished @MervinPraison's task —— View job Final Architecture Review ✅ APPROVEDPhase 1: Architecture Review per AGENTS.md
Phase 2: Fix Valid Issues
Phase 3: Final Verdict
🎯 Final Architectural Assessment✅ APPROVED - All Issues ResolvedSummary of Changes Made:
Reviewer Feedback Addressed:
Architecture Compliance:
Files Modified:
🚀 Recommendation: MERGEThis PR successfully fixes the reported AttributeError while maintaining architectural integrity and addressing all valid reviewer concerns. The solution is clean, well-tested, and follows PraisonAI SDK principles. |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
src/praisonai-agents/praisonaiagents/rag/context.py (2)
180-185: ⚡ Quick winRemove redundant
isinstance(metadata, dict)checks.Similar to
deduplicate_chunks, lines 180-183 normalizemetadatato an empty dict, making it guaranteed to be adict. The subsequent checks on lines 181-182 and 184-185 are redundant:
- Lines 181-182 check
isinstance(metadata, dict)after potentially setting it to{}.- Lines 184-185 check
isinstance(metadata, dict)even though lines 180-183 already ensure this.♻️ Simplify the metadata extraction logic
# CRITICAL: Handle metadata=None from mem0 - ensure always dict metadata = _extract_value(result, "metadata") or {} if not isinstance(metadata, dict): metadata = {} -source = metadata.get("source", "") if isinstance(metadata, dict) else "" -filename = metadata.get("filename", "") if isinstance(metadata, dict) else "" +source = metadata.get("source", "") +filename = metadata.get("filename", "") source_label = filename or source or f"Source {i + 1}"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/praisonai-agents/praisonaiagents/rag/context.py` around lines 180 - 185, The metadata normalization already ensures metadata is a dict (metadata = _extract_value(result, "metadata") or {}; if not isinstance(metadata, dict): metadata = {}), so remove the redundant isinstance(metadata, dict) guards when extracting source and filename; directly use metadata.get("source", "") and metadata.get("filename", "") in the same block (referencing the metadata variable and the _extract_value call), mirroring the simplification used in deduplicate_chunks.
83-87: ⚡ Quick winRemove redundant
isinstance(metadata, dict)checks.Lines 83-86 normalize
metadatato an empty dict if it'sNoneor not a dict. After this normalization,metadatais guaranteed to be adict. Therefore:
- Lines 84-85 redundantly check
isinstance(metadata, dict)immediately after setting it to{}on line 85.- Line 87 checks
isinstance(metadata, dict)even though lines 83-86 already ensure this.According to the context snippet from
knowledge/models.py,SearchResultItem.metadatais enforced to always be a dict (neverNone) in__post_init__. The defensive checks are only needed for legacy dict-format results.♻️ Simplify the metadata extraction logic
# CRITICAL: Handle metadata=None from mem0 - ensure always dict metadata = _extract_value(result, "metadata") or {} if not isinstance(metadata, dict): metadata = {} -source = metadata.get("source", "") if isinstance(metadata, dict) else "" +source = metadata.get("source", "")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/praisonai-agents/praisonaiagents/rag/context.py` around lines 83 - 87, The metadata normalization currently does redundant isinstance checks: after setting metadata = _extract_value(result, "metadata") or {} the value is guaranteed to be a dict, so remove the extra isinstance(metadata, dict) branches and simplify to: keep metadata = _extract_value(result, "metadata") or {} and then compute source = metadata.get("source", "") directly; update any similar redundant checks in the surrounding code (references: variable metadata, function _extract_value, and the source assignment) to rely on the normalized dict.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/praisonai-agents/praisonaiagents/rag/context.py`:
- Around line 56-58: Tighten the overly-broad type hint Union[Dict[str, Any],
Any] by introducing a protocol (e.g., SearchResultProtocol) that declares the
expected attributes/methods (text, metadata: Dict[str, Any], score: float) and
replace the parameter/return types to use List[SearchResultProtocol] in the
function signature (the parameters named results and similarity_threshold and
the function return). Define SearchResultProtocol in this module (or a shared
typing module) so plain dicts that match the protocol are accepted via
structural typing and update imports/annotations accordingly.
- Around line 22-38: The helper _extract_value currently swallows missing
fields; change it to fail fast by validating inputs and raising a clear
exception when a required field is absent (e.g., raise ValueError with a message
that includes which key is missing and contextual hints like agent name/tool
name/session id if available). Replace the loose Union[Dict[str, Any], Any]
signature by defining a SearchResultProtocol (typing.Protocol) that declares
expected attributes such as text, metadata, score, and accept that protocol type
in function signatures (use isinstance checks or hasattr where necessary), and
update callers to pass objects conforming to SearchResultProtocol; also add an
optional on_error hook parameter or callsite to allow interception of the raised
error before propagation. Ensure error messages include remediation advice and
the unique identifier of the offending item to aid debugging.
---
Nitpick comments:
In `@src/praisonai-agents/praisonaiagents/rag/context.py`:
- Around line 180-185: The metadata normalization already ensures metadata is a
dict (metadata = _extract_value(result, "metadata") or {}; if not
isinstance(metadata, dict): metadata = {}), so remove the redundant
isinstance(metadata, dict) guards when extracting source and filename; directly
use metadata.get("source", "") and metadata.get("filename", "") in the same
block (referencing the metadata variable and the _extract_value call), mirroring
the simplification used in deduplicate_chunks.
- Around line 83-87: The metadata normalization currently does redundant
isinstance checks: after setting metadata = _extract_value(result, "metadata")
or {} the value is guaranteed to be a dict, so remove the extra
isinstance(metadata, dict) branches and simplify to: keep metadata =
_extract_value(result, "metadata") or {} and then compute source =
metadata.get("source", "") directly; update any similar redundant checks in the
surrounding code (references: variable metadata, function _extract_value, and
the source assignment) to rely on the normalized dict.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c38fa401-119d-4512-89a9-cadeb4a116d0
📒 Files selected for processing (1)
src/praisonai-agents/praisonaiagents/rag/context.py
| def _extract_value(item: Union[Dict[str, Any], Any], key: str, default: Any = None) -> Any: | ||
| """ | ||
| Extract value from either dict or object (e.g., SearchResultItem). | ||
|
|
||
| Args: | ||
| item: Dictionary or object with attributes | ||
| key: Key/attribute name to extract | ||
| default: Default value if key not found | ||
|
|
||
| Returns: | ||
| Extracted value or default | ||
| """ | ||
| if isinstance(item, dict): | ||
| return item.get(key, default) | ||
| else: | ||
| return getattr(item, key, default) | ||
|
|
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift
Consider fail-fast error handling for invalid result items.
The _extract_value helper silently returns defaults when keys or attributes are missing. Per coding guidelines, core SDK should fail fast with clear error messages. If a result item lacks required fields like text or metadata, this may indicate a contract violation that should be surfaced rather than silently handled.
Additionally, the coding guidelines specify protocol-driven design for extension points. Instead of accepting Union[Dict[str, Any], Any] (which is effectively just Any), consider defining a SearchResultProtocol using typing.Protocol that specifies the expected interface (e.g., text, metadata, score attributes). This would provide better type safety and clearer contracts.
As per coding guidelines: "Core SDK (praisonaiagents) must use protocol-driven design with typing.Protocol for all extension points, not heavy implementations" and "Error handling: Fail fast with clear error messages; include remediation hints in exceptions; propagate context (agent name, tool name, session ID); provide hook points for error interception (on_error events)".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/praisonai-agents/praisonaiagents/rag/context.py` around lines 22 - 38,
The helper _extract_value currently swallows missing fields; change it to fail
fast by validating inputs and raising a clear exception when a required field is
absent (e.g., raise ValueError with a message that includes which key is missing
and contextual hints like agent name/tool name/session id if available). Replace
the loose Union[Dict[str, Any], Any] signature by defining a
SearchResultProtocol (typing.Protocol) that declares expected attributes such as
text, metadata, score, and accept that protocol type in function signatures (use
isinstance checks or hasattr where necessary), and update callers to pass
objects conforming to SearchResultProtocol; also add an optional on_error hook
parameter or callsite to allow interception of the raised error before
propagation. Ensure error messages include remediation advice and the unique
identifier of the offending item to aid debugging.
Agent-Logs-Url: https://github.com/MervinPraison/PraisonAI/sessions/fbaed72c-c0f6-4426-8cb4-09c3542f499f Co-authored-by: MervinPraison <454862+MervinPraison@users.noreply.github.com>
Agent-Logs-Url: https://github.com/MervinPraison/PraisonAI/sessions/fbaed72c-c0f6-4426-8cb4-09c3542f499f Co-authored-by: MervinPraison <454862+MervinPraison@users.noreply.github.com>
Agent-Logs-Url: https://github.com/MervinPraison/PraisonAI/sessions/fbaed72c-c0f6-4426-8cb4-09c3542f499f Co-authored-by: MervinPraison <454862+MervinPraison@users.noreply.github.com>
Completed a full review and incorporated all actionable bot feedback. Addressed in commits
Security/performance checks: no secrets, eval/exec, or heavy imports introduced; hot path remains lightweight. Reviewer findings incorporated:
Validation:
|
…t module - Add tests for _extract_value and _extract_metadata_value helpers - Add tests for SearchResultItem deduplication functionality - Add tests for mixed dict/SearchResultItem format handling - Add tests for source/filename fallback behavior from top-level attributes - Verify metadata preference over top-level attributes works correctly Ensures all reviewer feedback scenarios are properly tested. Co-authored-by: Mervin Praison <MervinPraison@users.noreply.github.com>
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/praisonai-agents/tests/unit/rag/test_context_normalization.py (1)
128-148: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick winAdd test coverage for mixed dict/object lists and consider integration test.
The new tests verify SearchResultItem object handling, but several scenarios remain untested:
- Mixed-format lists: The PR objectives state "Tested mixed-format scenarios," but no test verifies a list containing both dicts and SearchResultItem objects in the same call.
- SearchResultItem with metadata attribute: The implementation handles metadata via
_extract_value, but tests don't verify SearchResultItem objects that have ametadataattribute (dict or None).- Integration test gap: Per coding guidelines, "Real agentic tests are MANDATORY for every feature" with actual agent.start() calls. While these unit tests are appropriate for the utility functions, an integration test showing SearchResultItem objects flowing through an actual RAG pipeline (search → deduplicate → build_context → agent) would provide stronger validation of the fix for issue
#1732.🧪 Suggested additional test coverage
Add to the
TestSearchResultItemCompatibilityclass:+ def test_mixed_dict_and_object_formats(self): + """Test handling mixed dict and SearchResultItem in same list.""" + results = [ + {"text": "dict content", "metadata": {"source": "dict.txt"}}, + SearchResultItem(text="object content", source="object.pdf"), + ] + deduped = deduplicate_chunks(results) + assert len(deduped) == 2 + + context, used = build_context(results, include_source=True) + assert len(used) == 2 + assert "dict content" in context + assert "object content" in context + assert "dict.txt" in context + assert "object.pdf" in context + + def test_search_result_item_with_metadata_attribute(self): + """Test SearchResultItem with metadata dict attribute.""" + # Some SearchResultItem instances may have metadata attribute + item = SearchResultItem(text="content") + item.metadata = {"source": "meta.pdf"} + results = [item] + + context, used = build_context(results, include_source=True) + assert "meta.pdf" in context or "content" in contextAdditionally, consider adding an integration test in
tests/integration/rag/that verifies the full flow:# tests/integration/rag/test_rag_search_result_item.py def test_rag_pipeline_with_search_result_items(): """Integration test: RAG pipeline returns SearchResultItem and builds context.""" # This would test actual knowledge.search() → context builder flow # to ensure SearchResultItem objects work end-to-endAs per coding guidelines: "Real agentic tests are MANDATORY for every feature: Agent must call agent.start() with a real prompt, call the LLM, and produce actual text response—not just smoke tests of object construction."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/praisonai-agents/tests/unit/rag/test_context_normalization.py` around lines 128 - 148, The new unit tests exercise SearchResultItem objects but miss mixed-format lists, metadata handling, and an end-to-end integration test; add unit tests inside TestSearchResultItemCompatibility that call deduplicate_chunks and build_context with a single list mixing dict entries and SearchResultItem instances (e.g., [{"text":"x","source":"a"}, SearchResultItem(text="x", source="a")]) and add cases where SearchResultItem has a metadata attribute (both dict and None) to ensure _extract_value paths are exercised, and add an integration test that runs a real RAG pipeline invoking knowledge.search() → deduplicate_chunks → build_context → agent.start() to assert the agent produces actual text output.
🧹 Nitpick comments (1)
src/praisonai-agents/praisonaiagents/rag/context.py (1)
154-229: ⚡ Quick winConsider extracting repeated metadata normalization pattern.
The metadata normalization pattern appears identically in both
deduplicate_chunks(lines 104-106) andbuild_context(lines 201-203):metadata = _extract_value(result, "metadata") or {} if not isinstance(metadata, dict): metadata = {}Consider extracting this to a helper function to follow DRY principles:
def _extract_normalized_metadata(item: ResultItem) -> Dict[str, Any]: """Extract and normalize metadata to always return a dict.""" metadata = _extract_value(item, "metadata") or {} if not isinstance(metadata, dict): metadata = {} return metadataThen use:
metadata = _extract_normalized_metadata(result)This is a minor refactor that reduces duplication and makes the intent clearer.
♻️ Proposed refactor to eliminate duplication
Add the helper after
_extract_metadata_value:def _extract_metadata_value( item: ResultItem, metadata: Dict[str, Any], key: str, default: Any = "", ) -> Any: """ Extract metadata value with fallback to top-level item attribute. """ value = metadata.get(key) if value is None: return _extract_value(item, key, default) return value + + +def _extract_normalized_metadata(item: ResultItem) -> Dict[str, Any]: + """ + Extract and normalize metadata to always return a dict. + + Args: + item: Result item (dict or SearchResultItem) + + Returns: + Metadata dict (never None, never non-dict) + """ + metadata = _extract_value(item, "metadata") or {} + if not isinstance(metadata, dict): + metadata = {} + return metadataThen update
deduplicate_chunks:# Handle different result formats (dict or SearchResultItem) text = _extract_value(result, "text") or _extract_value(result, "memory", "") - # CRITICAL: Handle metadata=None from mem0 - ensure always dict - metadata = _extract_value(result, "metadata") or {} - if not isinstance(metadata, dict): - metadata = {} + metadata = _extract_normalized_metadata(result) source = _extract_metadata_value(result, metadata, "source", "")And update
build_context:# Build chunk text with optional source if include_source: - # CRITICAL: Handle metadata=None from mem0 - ensure always dict - metadata = _extract_value(result, "metadata") or {} - if not isinstance(metadata, dict): - metadata = {} + metadata = _extract_normalized_metadata(result) source = _extract_metadata_value(result, metadata, "source", "")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/praisonai-agents/praisonaiagents/rag/context.py` around lines 154 - 229, Duplicate metadata normalization occurs in deduplicate_chunks and build_context; add a small helper _extract_normalized_metadata(item: ResultItem) that calls _extract_value(item, "metadata") and ensures it returns a dict (fallback to {} if None or not a dict), then replace the duplicated lines in both deduplicate_chunks and build_context with metadata = _extract_normalized_metadata(result) and keep subsequent calls to _extract_metadata_value unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@src/praisonai-agents/tests/unit/rag/test_context_normalization.py`:
- Around line 128-148: The new unit tests exercise SearchResultItem objects but
miss mixed-format lists, metadata handling, and an end-to-end integration test;
add unit tests inside TestSearchResultItemCompatibility that call
deduplicate_chunks and build_context with a single list mixing dict entries and
SearchResultItem instances (e.g., [{"text":"x","source":"a"},
SearchResultItem(text="x", source="a")]) and add cases where SearchResultItem
has a metadata attribute (both dict and None) to ensure _extract_value paths are
exercised, and add an integration test that runs a real RAG pipeline invoking
knowledge.search() → deduplicate_chunks → build_context → agent.start() to
assert the agent produces actual text output.
---
Nitpick comments:
In `@src/praisonai-agents/praisonaiagents/rag/context.py`:
- Around line 154-229: Duplicate metadata normalization occurs in
deduplicate_chunks and build_context; add a small helper
_extract_normalized_metadata(item: ResultItem) that calls _extract_value(item,
"metadata") and ensures it returns a dict (fallback to {} if None or not a
dict), then replace the duplicated lines in both deduplicate_chunks and
build_context with metadata = _extract_normalized_metadata(result) and keep
subsequent calls to _extract_metadata_value unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 662d8e08-c661-4add-b871-40f8e9a48eb6
📒 Files selected for processing (2)
src/praisonai-agents/praisonaiagents/rag/context.pysrc/praisonai-agents/tests/unit/rag/test_context_normalization.py
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/praisonai-agents/tests/unit/rag/test_context.py (1)
228-411:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftAdd mandatory real agentic coverage for this feature.
Line 228 onward adds only unit/helper tests; none exercises
agent.start()with a real prompt and actual LLM text output. Please add at least one integration/e2e test for this feature path (while keeping these unit tests).As per coding guidelines,
Real agentic tests are MANDATORY for every feature: Agent must call agent.start() with a real prompt, call the LLM, and produce actual text response—not just smoke tests of object constructionandUse TDD (Test-Driven Development): Write failing test first, then implement feature, then verify test passes; structure tests into unit/, integration/, and e2e/ categories.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/praisonai-agents/tests/unit/rag/test_context.py` around lines 228 - 411, Add a real end-to-end test that exercises the agent runtime by calling agent.start() with a real prompt and asserting an actual LLM text response; specifically, create an integration/e2e test that instantiates the same Agent class used by this feature, calls Agent.start("Write a short summary of X") (or similar prompt), waits for the returned text/result, and asserts the response contains expected non-empty text (and/or a known phrase) to prove the LLM was invoked; keep the existing unit tests untouched, place the new test in an integration/e2e test module, and ensure it uses the real LLM configuration/environment (API key or test harness) so it fails when the feature is incomplete.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@src/praisonai-agents/tests/unit/rag/test_context.py`:
- Around line 228-411: Add a real end-to-end test that exercises the agent
runtime by calling agent.start() with a real prompt and asserting an actual LLM
text response; specifically, create an integration/e2e test that instantiates
the same Agent class used by this feature, calls Agent.start("Write a short
summary of X") (or similar prompt), waits for the returned text/result, and
asserts the response contains expected non-empty text (and/or a known phrase) to
prove the LLM was invoked; keep the existing unit tests untouched, place the new
test in an integration/e2e test module, and ensure it uses the real LLM
configuration/environment (API key or test harness) so it fails when the feature
is incomplete.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: fb2367fc-1640-4c49-8af9-336e6f4e7546
📒 Files selected for processing (1)
src/praisonai-agents/tests/unit/rag/test_context.py
Fixes #1732
This PR fixes the type mismatch issue where
deduplicate_chunksexpects dictionaries but receivesSearchResultItemobjects from the RAG pipeline.Changes
_extract_value()helper function to handle both dict and SearchResultItem objectsdeduplicate_chunks()andbuild_context()to use the new helperUnion[Dict[str, Any], Any]Testing
Generated with Claude Code
Summary by CodeRabbit