fix: make deduplicate_chunks and build_context handle SearchResultItem objects by MervinPraison · Pull Request #1739 · MervinPraison/PraisonAI

MervinPraison · 2026-05-25T08:16:06Z

This PR fixes the type mismatch issue where deduplicate_chunks expects dictionaries but receives SearchResultItem objects from the RAG pipeline.

Changes

Added _extract_value() helper function to handle both dict and SearchResultItem objects
Updated deduplicate_chunks() and build_context() to use the new helper
Updated type hints to accept Union[Dict[str, Any], Any]
Maintained full backward compatibility with existing dict-based usage

Testing

Verified SearchResultItem objects work correctly
Confirmed existing dictionary usage remains functional
Tested mixed format scenarios

Generated with Claude Code

Summary by CodeRabbit

Improvements
- RAG context processing now accepts both dict-style and object-like search results, normalizes metadata, extracts source/filename consistently, and generates stable chunk identifiers to improve context building and deduplication.
Tests
- Added unit tests confirming compatibility with object-like results, metadata precedence/fallbacks, deduplication behavior, and correct source inclusion in built contexts.

…m objects - Add _extract_value helper to handle both dict and SearchResultItem objects - Update type hints to accept Union[Dict[str, Any], Any] in context functions - Maintain full backward compatibility with existing dict-based usage - Fix AttributeError when SearchResultItem objects are passed to .get() method Fixes #1732 Co-authored-by: praisonai-triage-agent[bot] <praisonai-triage-agent[bot]@users.noreply.github.com>

qodo-code-review · 2026-05-25T08:16:11Z

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

coderabbitai · 2026-05-25T08:16:22Z

📝 Walkthrough

Walkthrough

The RAG context module now accepts both dict and object-like search results. New helpers _extract_value and _extract_metadata_value read fields uniformly. deduplicate_chunks, build_context, and DefaultContextBuilder.build were updated to handle mixed inputs and normalize metadata; tests added for SearchResultItem compatibility.

Changes

Mixed-format search results support

Layer / File(s)	Summary
Helper function and type support `src/praisonai-agents/praisonaiagents/rag/context.py`	`ResultItem` union type and `_extract_value`/`_extract_metadata_value` added to read from dicts or object-like items.
Deduplication with mixed-format support `src/praisonai-agents/praisonaiagents/rag/context.py`	`deduplicate_chunks` accepts `List[ResultItem]`, extracts `text`/`memory` via `_extract_value`, normalizes `metadata` to `{}` when missing, derives `source`, and computes stable hash-based deduplication.
Context building with mixed-format support `src/praisonai-agents/praisonaiagents/rag/context.py`	`build_context` signature and loop updated to handle mixed inputs; text/memory extracted via `_extract_value`; `include_source` reads `source`/`filename` using `_extract_metadata_value` after metadata normalization.
Builder class integration `src/praisonai-agents/praisonaiagents/rag/context.py`	`DefaultContextBuilder.build` accepts mixed `ResultItem` lists and its docstring documents support for dict and object-like results.
SearchResultItem compatibility tests `src/praisonai-agents/tests/unit/rag/test_context_normalization.py`, `src/praisonai-agents/tests/unit/rag/test_context.py`	Adds tests for `_extract_value`, `_extract_metadata_value`, deduplication with mixed formats, and `build_context` source/filename fallback behavior.

Sequence Diagram

sequenceDiagram
  participant Client
  participant Build as build_context
  participant Dedup as deduplicate_chunks
  participant Extract as _extract_value
  Client->>Build: pass List[ResultItem]
  Build->>Extract: extract text/memory for each item
  Extract-->>Build: text value
  Build->>Dedup: (optionally) pass extracted items for deduplication
  Dedup->>Extract: extract text/metadata/source as needed
  Dedup-->>Build: filtered/unique items (original objects)
  Build-->>Client: return composed context and used_results

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I hopped through dicts and objects, too,
Nibbling keys where getters grew,
I stitched the context, clean and neat,
Deduped the chunks with steady beat,
Now search results dance in two-by-two.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main change: making deduplicate_chunks and build_context handle SearchResultItem objects instead of just dicts.
Linked Issues check	✅ Passed	The PR successfully addresses issue `#1732` by adding _extract_value() and _extract_metadata_value() helpers to handle both dict and SearchResultItem inputs, updating function signatures to accept Union types, and adding comprehensive tests for both formats.
Out of Scope Changes check	✅ Passed	All changes are directly related to resolving the type mismatch issue in `#1732`; modifications to context.py, test files, and helper functions are within scope of fixing deduplicate_chunks and build_context compatibility.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch claude/issue-1732-20260523-1409

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

MervinPraison · 2026-05-25T08:16:30Z

@copilot Do a thorough review of this PR. Read ALL existing reviewer comments above from Qodo, Coderabbit, and Gemini first — incorporate their findings.

Review areas:

Bloat check: Are changes minimal and focused? Any unnecessary code or scope creep?
Security: Any hardcoded secrets, unsafe eval/exec, missing input validation?
Performance: Any module-level heavy imports? Hot-path regressions?
Tests: Are tests included? Do they cover the changes adequately?
Backward compat: Any public API changes without deprecation?
Code quality: DRY violations, naming conventions, error handling?
Address reviewer feedback: If Qodo, Coderabbit, or Gemini flagged valid issues, include them in your review
Suggest specific improvements with code examples where possible

gemini-code-assist

Code Review

This pull request updates the RAG context module to support both dictionary and object-based search results, such as SearchResultItem. Key changes include the introduction of a _extract_value helper function and updated type signatures across the deduplicate_chunks and build_context functions. Feedback focused on removing redundant isinstance checks for metadata extraction, as the variable is already validated as a dictionary earlier in the logic.

gemini-code-assist · 2026-05-25T08:16:57Z

+        if not isinstance(metadata, dict):
+            metadata = {}
+
+        source = metadata.get("source", "") if isinstance(metadata, dict) else ""


The conditional check if isinstance(metadata, dict) is redundant here because the logic on lines 83-85 already ensures that metadata is a dictionary. Simplifying this improves readability.

Suggested change

source = metadata.get("source", "") if isinstance(metadata, dict) else ""

source = metadata.get("source", "")

gemini-code-assist · 2026-05-25T08:16:58Z

+            source = metadata.get("source", "") if isinstance(metadata, dict) else ""
+            filename = metadata.get("filename", "") if isinstance(metadata, dict) else ""


The conditional checks if isinstance(metadata, dict) are redundant here because the logic on lines 180-182 already ensures that metadata is a dictionary. Simplifying these lines improves code clarity.

Suggested change

source = metadata.get("source", "") if isinstance(metadata, dict) else ""

filename = metadata.get("filename", "") if isinstance(metadata, dict) else ""

source = metadata.get("source", "")

filename = metadata.get("filename", "")

Copilot

Pull request overview

This PR addresses a runtime type mismatch in the RAG context builder by allowing deduplicate_chunks() and build_context() to consume both legacy dict-shaped retrieval results and object-shaped results (e.g., SearchResultItem) without raising AttributeError on .get() calls.

Changes:

Added _extract_value() helper to read fields from either dicts or objects via dict.get() / getattr().
Updated deduplicate_chunks() and build_context() to use _extract_value() and widened their type hints accordingly.
Updated DefaultContextBuilder.build() docstring/type hints to match the more flexible input shape.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

 def deduplicate_chunks(
-    results: List[Dict[str, Any]],
+    results: List[Union[Dict[str, Any], Any]],
    similarity_threshold: float = 0.9,
-) -> List[Dict[str, Any]]:
+) -> List[Union[Dict[str, Any], Any]]:


+        if not isinstance(metadata, dict):
+            metadata = {}
+
+        source = metadata.get("source", "") if isinstance(metadata, dict) else ""


+            source = metadata.get("source", "") if isinstance(metadata, dict) else ""
+            filename = metadata.get("filename", "") if isinstance(metadata, dict) else ""


+def _extract_value(item: Union[Dict[str, Any], Any], key: str, default: Any = None) -> Any:
+    """
+    Extract value from either dict or object (e.g., SearchResultItem).
+
+    Args:


MervinPraison · 2026-05-25T08:19:39Z

@claude You are the FINAL architecture reviewer. If the branch is under MervinPraison/PraisonAI (not a fork), you are able to make modifications to this branch and push directly. SCOPE: Focus ONLY on Python packages (praisonaiagents, praisonai). Do NOT modify praisonai-rust or praisonai-ts. Read ALL comments above from Gemini, Qodo, CodeRabbit, and Copilot carefully before responding.

Phase 1: Review per AGENTS.md

Protocol-driven: check heavy implementations vs core SDK
Backward compatible: ensure zero feature regressions
Performance: no hot-path regressions

Phase 2: FIX Valid Issues
4. For any VALID bugs or architectural flaws found by Gemini, CodeRabbit, Qodo, Copilot, or any other reviewer: implement the fix
5. Push all code fixes directly to THIS branch (do NOT create a new PR)
6. Comment a summary of exact files modified and what you skipped

Phase 3: Final Verdict
7. If all issues are resolved, approve the PR / close the Issue
8. If blocking issues remain, request changes / leave clear action items

greptile-apps · 2026-05-25T08:19:42Z

Greptile Summary

This PR fixes a type mismatch where deduplicate_chunks and build_context received SearchResultItem objects but only handled plain dicts, by introducing two helper functions (_extract_value, _extract_metadata_value) that dispatch between attribute access and dict-key access at each call site.

_extract_value wraps dict.get / getattr so both dicts and dataclass objects are supported transparently.
_extract_metadata_value adds a top-level attribute fallback when the metadata dict doesn't contain the requested key, correcting the previously broken source/filename label and dedup-hash logic for SearchResultItem inputs.
New tests cover helper functions in isolation and end-to-end compatibility with mixed dict / SearchResultItem input lists.

Confidence Score: 5/5

Safe to merge; the change is additive and backward-compatible, and all existing dict-based callers are unaffected.

The two new helper functions correctly handle the common cases (absent key, None value, real top-level attribute), and the downstream fallback chain gracefully absorbs the one remaining edge case where _extract_metadata_value returns an empty string instead of falling back to the top-level attribute. No data loss or security concern is introduced.

context.py — specifically the _extract_metadata_value guard logic around line 55–58

Important Files Changed

Filename	Overview
src/praisonai-agents/praisonaiagents/rag/context.py	Adds _extract_value and _extract_metadata_value helpers so deduplicate_chunks and build_context work with SearchResultItem objects; the fallback chain uses an `is None` guard that silently skips empty-string values in metadata
src/praisonai-agents/tests/unit/rag/test_context.py	Extends existing test suite with coverage for _extract_value, _extract_metadata_value, and SearchResultItem compatibility in both deduplicate_chunks and build_context
src/praisonai-agents/tests/unit/rag/test_context_normalization.py	Adds TestSearchResultItemCompatibility class with two focused regression tests for source-attribute fallback in deduplication and filename-attribute fallback in context building

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Search results\nList of dicts or SearchResultItem] --> B{deduplicate?}
    B -- yes --> C[deduplicate_chunks]
    B -- no --> E[build_context]
    C --> D1[_extract_value: text]
    C --> D2[_extract_value: metadata]
    D2 --> D3[ensure metadata is dict]
    D3 --> D5[_extract_metadata_value: source\nfallback to top-level attribute]
    D1 --> D6[_chunk_hash: text + source]
    D5 --> D6
    D6 --> D7{seen hash?}
    D7 -- yes --> D8[skip]
    D7 -- no --> D9[keep result]
    D9 --> E
    E --> F1[_extract_value: text]
    E --> F2[_extract_metadata_value: source and filename\nfallback to top-level]
    F1 --> F3[source_label = filename or source or Source N]
    F2 --> F3
    F3 --> G[Token budget check]
    G --> H[context string + used_results]

_{Reviews (3): Last reviewed commit: "test: add comprehensive SearchResultItem..." | Re-trigger Greptile}

praisonai-triage-agent · 2026-05-25T08:19:56Z

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (2)

src/praisonai-agents/praisonaiagents/rag/context.py (2)
180-185: ⚡ Quick win

Remove redundant isinstance(metadata, dict) checks.

Similar to deduplicate_chunks, lines 180-183 normalize metadata to an empty dict, making it guaranteed to be a dict. The subsequent checks on lines 181-182 and 184-185 are redundant:

Lines 181-182 check isinstance(metadata, dict) after potentially setting it to {}.

Lines 184-185 check isinstance(metadata, dict) even though lines 180-183 already ensure this.
♻️ Simplify the metadata extraction logic
 # CRITICAL: Handle metadata=None from mem0 - ensure always dict
 metadata = _extract_value(result, "metadata") or {}
 if not isinstance(metadata, dict):
     metadata = {}
 
-source = metadata.get("source", "") if isinstance(metadata, dict) else ""
-filename = metadata.get("filename", "") if isinstance(metadata, dict) else ""
+source = metadata.get("source", "")
+filename = metadata.get("filename", "")
 source_label = filename or source or f"Source {i + 1}"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/praisonai-agents/praisonaiagents/rag/context.py` around lines 180 - 185,
The metadata normalization already ensures metadata is a dict (metadata =
_extract_value(result, "metadata") or {}; if not isinstance(metadata, dict):
metadata = {}), so remove the redundant isinstance(metadata, dict) guards when
extracting source and filename; directly use metadata.get("source", "") and
metadata.get("filename", "") in the same block (referencing the metadata
variable and the _extract_value call), mirroring the simplification used in
deduplicate_chunks.
83-87: ⚡ Quick win

Remove redundant isinstance(metadata, dict) checks.

Lines 83-86 normalize metadata to an empty dict if it's None or not a dict. After this normalization, metadata is guaranteed to be a dict. Therefore:

Lines 84-85 redundantly check isinstance(metadata, dict) immediately after setting it to {} on line 85.

Line 87 checks isinstance(metadata, dict) even though lines 83-86 already ensure this.

According to the context snippet from knowledge/models.py, SearchResultItem.metadata is enforced to always be a dict (never None) in __post_init__. The defensive checks are only needed for legacy dict-format results.
♻️ Simplify the metadata extraction logic
 # CRITICAL: Handle metadata=None from mem0 - ensure always dict
 metadata = _extract_value(result, "metadata") or {}
 if not isinstance(metadata, dict):
     metadata = {}
 
-source = metadata.get("source", "") if isinstance(metadata, dict) else ""
+source = metadata.get("source", "")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/praisonai-agents/praisonaiagents/rag/context.py` around lines 83 - 87,
The metadata normalization currently does redundant isinstance checks: after
setting metadata = _extract_value(result, "metadata") or {} the value is
guaranteed to be a dict, so remove the extra isinstance(metadata, dict) branches
and simplify to: keep metadata = _extract_value(result, "metadata") or {} and
then compute source = metadata.get("source", "") directly; update any similar
redundant checks in the surrounding code (references: variable metadata,
function _extract_value, and the source assignment) to rely on the normalized
dict.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/praisonai-agents/praisonaiagents/rag/context.py`:
- Around line 56-58: Tighten the overly-broad type hint Union[Dict[str, Any],
Any] by introducing a protocol (e.g., SearchResultProtocol) that declares the
expected attributes/methods (text, metadata: Dict[str, Any], score: float) and
replace the parameter/return types to use List[SearchResultProtocol] in the
function signature (the parameters named results and similarity_threshold and
the function return). Define SearchResultProtocol in this module (or a shared
typing module) so plain dicts that match the protocol are accepted via
structural typing and update imports/annotations accordingly.
- Around line 22-38: The helper _extract_value currently swallows missing
fields; change it to fail fast by validating inputs and raising a clear
exception when a required field is absent (e.g., raise ValueError with a message
that includes which key is missing and contextual hints like agent name/tool
name/session id if available). Replace the loose Union[Dict[str, Any], Any]
signature by defining a SearchResultProtocol (typing.Protocol) that declares
expected attributes such as text, metadata, score, and accept that protocol type
in function signatures (use isinstance checks or hasattr where necessary), and
update callers to pass objects conforming to SearchResultProtocol; also add an
optional on_error hook parameter or callsite to allow interception of the raised
error before propagation. Ensure error messages include remediation advice and
the unique identifier of the offending item to aid debugging.

---

Nitpick comments:
In `@src/praisonai-agents/praisonaiagents/rag/context.py`:
- Around line 180-185: The metadata normalization already ensures metadata is a
dict (metadata = _extract_value(result, "metadata") or {}; if not
isinstance(metadata, dict): metadata = {}), so remove the redundant
isinstance(metadata, dict) guards when extracting source and filename; directly
use metadata.get("source", "") and metadata.get("filename", "") in the same
block (referencing the metadata variable and the _extract_value call), mirroring
the simplification used in deduplicate_chunks.
- Around line 83-87: The metadata normalization currently does redundant
isinstance checks: after setting metadata = _extract_value(result, "metadata")
or {} the value is guaranteed to be a dict, so remove the extra
isinstance(metadata, dict) branches and simplify to: keep metadata =
_extract_value(result, "metadata") or {} and then compute source =
metadata.get("source", "") directly; update any similar redundant checks in the
surrounding code (references: variable metadata, function _extract_value, and
the source assignment) to rely on the normalized dict.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c38fa401-119d-4512-89a9-cadeb4a116d0

📥 Commits

Reviewing files that changed from the base of the PR and between 8acf77c and 2f8558e.

📒 Files selected for processing (1)

src/praisonai-agents/praisonaiagents/rag/context.py

coderabbitai · 2026-05-25T08:20:31Z

+def _extract_value(item: Union[Dict[str, Any], Any], key: str, default: Any = None) -> Any:
+    """
+    Extract value from either dict or object (e.g., SearchResultItem).
+
+    Args:
+        item: Dictionary or object with attributes
+        key: Key/attribute name to extract
+        default: Default value if key not found
+
+    Returns:
+        Extracted value or default
+    """
+    if isinstance(item, dict):
+        return item.get(key, default)
+    else:
+        return getattr(item, key, default)
+


🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift

Consider fail-fast error handling for invalid result items.

The _extract_value helper silently returns defaults when keys or attributes are missing. Per coding guidelines, core SDK should fail fast with clear error messages. If a result item lacks required fields like text or metadata, this may indicate a contract violation that should be surfaced rather than silently handled.

Additionally, the coding guidelines specify protocol-driven design for extension points. Instead of accepting Union[Dict[str, Any], Any] (which is effectively just Any), consider defining a SearchResultProtocol using typing.Protocol that specifies the expected interface (e.g., text, metadata, score attributes). This would provide better type safety and clearer contracts.

As per coding guidelines: "Core SDK (praisonaiagents) must use protocol-driven design with typing.Protocol for all extension points, not heavy implementations" and "Error handling: Fail fast with clear error messages; include remediation hints in exceptions; propagate context (agent name, tool name, session ID); provide hook points for error interception (on_error events)".

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/praisonai-agents/praisonaiagents/rag/context.py` around lines 22 - 38, The helper _extract_value currently swallows missing fields; change it to fail fast by validating inputs and raising a clear exception when a required field is absent (e.g., raise ValueError with a message that includes which key is missing and contextual hints like agent name/tool name/session id if available). Replace the loose Union[Dict[str, Any], Any] signature by defining a SearchResultProtocol (typing.Protocol) that declares expected attributes such as text, metadata, score, and accept that protocol type in function signatures (use isinstance checks or hasattr where necessary), and update callers to pass objects conforming to SearchResultProtocol; also add an optional on_error hook parameter or callsite to allow interception of the raised error before propagation. Ensure error messages include remediation advice and the unique identifier of the offending item to aid debugging.

Agent-Logs-Url: https://github.com/MervinPraison/PraisonAI/sessions/fbaed72c-c0f6-4426-8cb4-09c3542f499f Co-authored-by: MervinPraison <454862+MervinPraison@users.noreply.github.com>

Copilot · 2026-05-25T08:23:04Z

@copilot Do a thorough review of this PR. Read ALL existing reviewer comments above from Qodo, Coderabbit, and Gemini first — incorporate their findings.

Review areas:...

Completed a full review and incorporated all actionable bot feedback.

Addressed in commits 4c3a910 and 0667726:

Reduced bloat and improved clarity by removing redundant metadata checks and centralizing fallback extraction.
Tightened typing (ResultItem alias with TYPE_CHECKING import) to avoid Union[..., Any] collapsing to Any.
Fixed compatibility gaps: now fallback to top-level source/filename attributes for SearchResultItem in both deduplication and context labels.
Added unit tests using real SearchResultItem objects to cover the new object-input path.

Security/performance checks: no secrets, eval/exec, or heavy imports introduced; hot path remains lightweight.
Backward compatibility: dict-based behavior preserved; no public API break.

Reviewer findings incorporated:

Qodo: paused (no actionable findings).
Coderabbit: no actionable issues posted on this PR.
Gemini: redundant-check feedback applied.

Validation:

Local tests: python -m pytest tests/unit/rag/test_context.py tests/unit/rag/test_context_normalization.py (33 passed).
Parallel validation: Code Review clean; CodeQL reported no alerts.

…t module - Add tests for _extract_value and _extract_metadata_value helpers - Add tests for SearchResultItem deduplication functionality - Add tests for mixed dict/SearchResultItem format handling - Add tests for source/filename fallback behavior from top-level attributes - Verify metadata preference over top-level attributes works correctly Ensures all reviewer feedback scenarios are properly tested. Co-authored-by: Mervin Praison <MervinPraison@users.noreply.github.com>

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/praisonai-agents/tests/unit/rag/test_context_normalization.py (1)
128-148: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Add test coverage for mixed dict/object lists and consider integration test.

The new tests verify SearchResultItem object handling, but several scenarios remain untested:

Mixed-format lists: The PR objectives state "Tested mixed-format scenarios," but no test verifies a list containing both dicts and SearchResultItem objects in the same call.

SearchResultItem with metadata attribute: The implementation handles metadata via _extract_value, but tests don't verify SearchResultItem objects that have a metadata attribute (dict or None).

Integration test gap: Per coding guidelines, "Real agentic tests are MANDATORY for every feature" with actual agent.start() calls. While these unit tests are appropriate for the utility functions, an integration test showing SearchResultItem objects flowing through an actual RAG pipeline (search → deduplicate → build_context → agent) would provide stronger validation of the fix for issue #1732.
🧪 Suggested additional test coverage

Add to the TestSearchResultItemCompatibility class:
+    def test_mixed_dict_and_object_formats(self):
+        """Test handling mixed dict and SearchResultItem in same list."""
+        results = [
+            {"text": "dict content", "metadata": {"source": "dict.txt"}},
+            SearchResultItem(text="object content", source="object.pdf"),
+        ]
+        deduped = deduplicate_chunks(results)
+        assert len(deduped) == 2
+        
+        context, used = build_context(results, include_source=True)
+        assert len(used) == 2
+        assert "dict content" in context
+        assert "object content" in context
+        assert "dict.txt" in context
+        assert "object.pdf" in context
+
+    def test_search_result_item_with_metadata_attribute(self):
+        """Test SearchResultItem with metadata dict attribute."""
+        # Some SearchResultItem instances may have metadata attribute
+        item = SearchResultItem(text="content")
+        item.metadata = {"source": "meta.pdf"}
+        results = [item]
+        
+        context, used = build_context(results, include_source=True)
+        assert "meta.pdf" in context or "content" in context
Additionally, consider adding an integration test in tests/integration/rag/ that verifies the full flow:
# tests/integration/rag/test_rag_search_result_item.py
def test_rag_pipeline_with_search_result_items():
    """Integration test: RAG pipeline returns SearchResultItem and builds context."""
    # This would test actual knowledge.search() → context builder flow
    # to ensure SearchResultItem objects work end-to-end
As per coding guidelines: "Real agentic tests are MANDATORY for every feature: Agent must call agent.start() with a real prompt, call the LLM, and produce actual text response—not just smoke tests of object construction."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/praisonai-agents/tests/unit/rag/test_context_normalization.py` around
lines 128 - 148, The new unit tests exercise SearchResultItem objects but miss
mixed-format lists, metadata handling, and an end-to-end integration test; add
unit tests inside TestSearchResultItemCompatibility that call deduplicate_chunks
and build_context with a single list mixing dict entries and SearchResultItem
instances (e.g., [{"text":"x","source":"a"}, SearchResultItem(text="x",
source="a")]) and add cases where SearchResultItem has a metadata attribute
(both dict and None) to ensure _extract_value paths are exercised, and add an
integration test that runs a real RAG pipeline invoking knowledge.search() →
deduplicate_chunks → build_context → agent.start() to assert the agent produces
actual text output.

🧹 Nitpick comments (1)

src/praisonai-agents/praisonaiagents/rag/context.py (1)

154-229: ⚡ Quick win

Consider extracting repeated metadata normalization pattern.

The metadata normalization pattern appears identically in both deduplicate_chunks (lines 104-106) and build_context (lines 201-203):

metadata = _extract_value(result, "metadata") or {}
if not isinstance(metadata, dict):
    metadata = {}

Consider extracting this to a helper function to follow DRY principles:

def _extract_normalized_metadata(item: ResultItem) -> Dict[str, Any]:
    """Extract and normalize metadata to always return a dict."""
    metadata = _extract_value(item, "metadata") or {}
    if not isinstance(metadata, dict):
        metadata = {}
    return metadata

Then use: metadata = _extract_normalized_metadata(result)

This is a minor refactor that reduces duplication and makes the intent clearer.

♻️ Proposed refactor to eliminate duplication

Add the helper after _extract_metadata_value:

 def _extract_metadata_value(
     item: ResultItem,
     metadata: Dict[str, Any],
     key: str,
     default: Any = "",
 ) -> Any:
     """
     Extract metadata value with fallback to top-level item attribute.
     """
     value = metadata.get(key)
     if value is None:
         return _extract_value(item, key, default)
     return value
+
+
+def _extract_normalized_metadata(item: ResultItem) -> Dict[str, Any]:
+    """
+    Extract and normalize metadata to always return a dict.
+    
+    Args:
+        item: Result item (dict or SearchResultItem)
+        
+    Returns:
+        Metadata dict (never None, never non-dict)
+    """
+    metadata = _extract_value(item, "metadata") or {}
+    if not isinstance(metadata, dict):
+        metadata = {}
+    return metadata

Then update deduplicate_chunks:

         # Handle different result formats (dict or SearchResultItem)
         text = _extract_value(result, "text") or _extract_value(result, "memory", "")
         
-        # CRITICAL: Handle metadata=None from mem0 - ensure always dict
-        metadata = _extract_value(result, "metadata") or {}
-        if not isinstance(metadata, dict):
-            metadata = {}
+        metadata = _extract_normalized_metadata(result)
         
         source = _extract_metadata_value(result, metadata, "source", "")

And update build_context:

         # Build chunk text with optional source
         if include_source:
-            # CRITICAL: Handle metadata=None from mem0 - ensure always dict
-            metadata = _extract_value(result, "metadata") or {}
-            if not isinstance(metadata, dict):
-                metadata = {}
+            metadata = _extract_normalized_metadata(result)
             
             source = _extract_metadata_value(result, metadata, "source", "")

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/praisonai-agents/praisonaiagents/rag/context.py` around lines 154 - 229,
Duplicate metadata normalization occurs in deduplicate_chunks and build_context;
add a small helper _extract_normalized_metadata(item: ResultItem) that calls
_extract_value(item, "metadata") and ensures it returns a dict (fallback to {}
if None or not a dict), then replace the duplicated lines in both
deduplicate_chunks and build_context with metadata =
_extract_normalized_metadata(result) and keep subsequent calls to
_extract_metadata_value unchanged.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@src/praisonai-agents/tests/unit/rag/test_context_normalization.py`:
- Around line 128-148: The new unit tests exercise SearchResultItem objects but
miss mixed-format lists, metadata handling, and an end-to-end integration test;
add unit tests inside TestSearchResultItemCompatibility that call
deduplicate_chunks and build_context with a single list mixing dict entries and
SearchResultItem instances (e.g., [{"text":"x","source":"a"},
SearchResultItem(text="x", source="a")]) and add cases where SearchResultItem
has a metadata attribute (both dict and None) to ensure _extract_value paths are
exercised, and add an integration test that runs a real RAG pipeline invoking
knowledge.search() → deduplicate_chunks → build_context → agent.start() to
assert the agent produces actual text output.

---

Nitpick comments:
In `@src/praisonai-agents/praisonaiagents/rag/context.py`:
- Around line 154-229: Duplicate metadata normalization occurs in
deduplicate_chunks and build_context; add a small helper
_extract_normalized_metadata(item: ResultItem) that calls _extract_value(item,
"metadata") and ensures it returns a dict (fallback to {} if None or not a
dict), then replace the duplicated lines in both deduplicate_chunks and
build_context with metadata = _extract_normalized_metadata(result) and keep
subsequent calls to _extract_metadata_value unchanged.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 662d8e08-c661-4add-b871-40f8e9a48eb6

📥 Commits

Reviewing files that changed from the base of the PR and between 2f8558e and 0667726.

📒 Files selected for processing (2)

src/praisonai-agents/praisonaiagents/rag/context.py
src/praisonai-agents/tests/unit/rag/test_context_normalization.py

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/praisonai-agents/tests/unit/rag/test_context.py (1)
228-411: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Add mandatory real agentic coverage for this feature.

Line 228 onward adds only unit/helper tests; none exercises agent.start() with a real prompt and actual LLM text output. Please add at least one integration/e2e test for this feature path (while keeping these unit tests).

As per coding guidelines, Real agentic tests are MANDATORY for every feature: Agent must call agent.start() with a real prompt, call the LLM, and produce actual text response—not just smoke tests of object construction and Use TDD (Test-Driven Development): Write failing test first, then implement feature, then verify test passes; structure tests into unit/, integration/, and e2e/ categories.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/praisonai-agents/tests/unit/rag/test_context.py` around lines 228 - 411,
Add a real end-to-end test that exercises the agent runtime by calling
agent.start() with a real prompt and asserting an actual LLM text response;
specifically, create an integration/e2e test that instantiates the same Agent
class used by this feature, calls Agent.start("Write a short summary of X") (or
similar prompt), waits for the returned text/result, and asserts the response
contains expected non-empty text (and/or a known phrase) to prove the LLM was
invoked; keep the existing unit tests untouched, place the new test in an
integration/e2e test module, and ensure it uses the real LLM
configuration/environment (API key or test harness) so it fails when the feature
is incomplete.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@src/praisonai-agents/tests/unit/rag/test_context.py`:
- Around line 228-411: Add a real end-to-end test that exercises the agent
runtime by calling agent.start() with a real prompt and asserting an actual LLM
text response; specifically, create an integration/e2e test that instantiates
the same Agent class used by this feature, calls Agent.start("Write a short
summary of X") (or similar prompt), waits for the returned text/result, and
asserts the response contains expected non-empty text (and/or a known phrase) to
prove the LLM was invoked; keep the existing unit tests untouched, place the new
test in an integration/e2e test module, and ensure it uses the real LLM
configuration/environment (API key or test harness) so it fails when the feature
is incomplete.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: fb2367fc-1640-4c49-8af9-336e6f4e7546

📥 Commits

Reviewing files that changed from the base of the PR and between 0667726 and 984808d.

📒 Files selected for processing (1)

src/praisonai-agents/tests/unit/rag/test_context.py

Copilot AI review requested due to automatic review settings May 25, 2026 08:16

Copilot started reviewing on behalf of MervinPraison May 25, 2026 08:16 View session

gemini-code-assist Bot reviewed May 25, 2026

View reviewed changes

Copilot started work on behalf of MervinPraison May 25, 2026 08:18 View session

Copilot AI reviewed May 25, 2026

View reviewed changes

greptile-apps Bot reviewed May 25, 2026

View reviewed changes

Comment thread src/praisonai-agents/praisonaiagents/rag/context.py Outdated

Comment thread src/praisonai-agents/praisonaiagents/rag/context.py Outdated

Comment thread src/praisonai-agents/praisonaiagents/rag/context.py

Comment thread src/praisonai-agents/praisonaiagents/rag/context.py Outdated

coderabbitai Bot reviewed May 25, 2026

View reviewed changes

Copilot AI and others added 3 commits May 25, 2026 08:21

fix: improve SearchResultItem handling in rag context builder

4c3a910

Agent-Logs-Url: https://github.com/MervinPraison/PraisonAI/sessions/fbaed72c-c0f6-4426-8cb4-09c3542f499f Co-authored-by: MervinPraison <454862+MervinPraison@users.noreply.github.com>

refactor: tighten source fallback semantics in rag context

5906289

Agent-Logs-Url: https://github.com/MervinPraison/PraisonAI/sessions/fbaed72c-c0f6-4426-8cb4-09c3542f499f Co-authored-by: MervinPraison <454862+MervinPraison@users.noreply.github.com>

refactor: dedupe metadata fallback extraction in rag context

0667726

Agent-Logs-Url: https://github.com/MervinPraison/PraisonAI/sessions/fbaed72c-c0f6-4426-8cb4-09c3542f499f Co-authored-by: MervinPraison <454862+MervinPraison@users.noreply.github.com>

Copilot finished work on behalf of MervinPraison May 25, 2026 08:23

coderabbitai Bot reviewed May 25, 2026

View reviewed changes

	source = metadata.get("source", "") if isinstance(metadata, dict) else ""
	source = metadata.get("source", "")

		source = metadata.get("source", "") if isinstance(metadata, dict) else ""
		filename = metadata.get("filename", "") if isinstance(metadata, dict) else ""

Uh oh!

Conversation

MervinPraison commented May 25, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changes

Testing

Summary by CodeRabbit

Uh oh!

qodo-code-review Bot commented May 25, 2026

Qodo reviews are paused for this user.

Uh oh!

coderabbitai Bot commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Poem

Uh oh!

MervinPraison commented May 25, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 25, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 25, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

MervinPraison commented May 25, 2026

Uh oh!

greptile-apps Bot commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Flowchart

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

praisonai-triage-agent Bot commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Final Architecture Review ✅ APPROVED

🎯 Final Architectural Assessment

✅ APPROVED - All Issues Resolved

🚀 Recommendation: MERGE

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 25, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI commented May 25, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

MervinPraison commented May 25, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 25, 2026 •

edited

Loading

greptile-apps Bot commented May 25, 2026 •

edited

Loading

praisonai-triage-agent Bot commented May 25, 2026 •

edited

Loading