Skip to content

Migrate graph backend to LadybugDB#4

Open
pi-dal wants to merge 2 commits into
nowledge-co:mainfrom
pi-dal:ladybug-migration
Open

Migrate graph backend to LadybugDB#4
pi-dal wants to merge 2 commits into
nowledge-co:mainfrom
pi-dal:ladybug-migration

Conversation

@pi-dal
Copy link
Copy Markdown

@pi-dal pi-dal commented Jun 1, 2026

Why this change is needed

KuzuDB has been archived, which makes it a risky long-term backend for OpenKL's local-first knowledge graph. OpenKL depends on the embedded graph engine not only for basic Cypher storage, but also for vector index creation and vector search. In practice, the old Kuzu package also keeps a brittle dependency on the Kuzu extension distribution path for INSTALL VECTOR / LOAD VECTOR, so fresh installs can fail before users ever get to the useful OpenKL workflows.

LadybugDB is the active successor to KuzuDB and keeps the API shape OpenKL already relies on: embedded database construction, connection execution, Cypher support, and native vector index procedures. Moving now keeps the project aligned with the maintained upstream while preserving the current OpenKL mental model: files remain canonical, and the graph stays a derived retrieval/index layer.

The migration is intentionally conservative. We do not silently open an existing ~/.ok/kuzu directory with the new backend, because binary/file-format compatibility is not a safe assumption. Instead, new graph state is created under ~/.ok/ladybug, while legacy Kuzu graph data is left untouched and reported as a legacy derived index. That gives users a safer path forward and avoids destructive or ambiguous in-place behavior : )

What changed

Runtime dependency and lockfile

  • Replaced the runtime graph dependency from kuzu>=0.8.0 to ladybug>=0.17.0.
  • Refreshed uv.lock so fresh installs resolve ladybug 0.17.0 instead of the archived Kuzu package.
  • Added pytest and mypy to the active dev dependency group so the new migration smoke tests and scoped type checks are installable through the repository's current uv workflow.

Graph backend initialization

  • Updated openkl.db to import ladybug and expose it through a local graphdb alias.
  • Changed the default graph database path from ~/.ok/kuzu to ~/.ok/ladybug.
  • Added a LEGACY_KUZU_DB_PATH check. If OpenKL sees an old ~/.ok/kuzu directory and no new Ladybug store exists yet, it logs a warning instead of silently reusing the old path.
  • Kept the existing schema and vector extension initialization path intact, because the current OpenKL data model and vector search flow still map cleanly onto LadybugDB.

CLI and user-facing naming

  • Updated ok doctor so it checks for ladybug and reports LadybugDB available.
  • Renamed runtime docstrings and helper wording away from Kuzu where the code now uses LadybugDB.
  • Kept explicit Kuzu references only where they describe legacy migration behavior.

Graph and vector-search compatibility cleanup

  • Updated graph/vector modules to use LadybugDB naming.
  • Renamed _process_kuzu_result to _process_graph_result so result handling is backend-neutral.
  • Added a small typed boundary around the migrated db, graph, and vector_search modules. This keeps Ladybug's dynamic query-result typing from leaking as broadly through the migration path.
  • Preserved the current CREATE_VECTOR_INDEX, QUERY_VECTOR_INDEX, and SHOW_INDEXES flow, which was verified against LadybugDB in smoke tests.

Migration documentation

  • Updated the README architecture section to say OpenKL uses LadybugDB.
  • Added a short README migration note explaining the ~/.ok/kuzu to ~/.ok/ladybug behavior.
  • Updated the design RFC path and engine reference from KuzuDB to LadybugDB.

Tests

  • Added tests/test_db_backend.py to verify Ladybug-backed schema initialization and the legacy Kuzu-path warning behavior.
  • Added tests/test_vector_search.py to verify a full memory vector-index round trip through LadybugDB.

Data compatibility behavior

This PR intentionally treats the graph database as a derived index, consistent with the existing design docs. Existing user-authored content and store files remain the durable source of truth. Existing ~/.ok/kuzu graph state is not deleted, modified, or opened by LadybugDB automatically.

The safe behavior after this PR is:

  • New OpenKL graph state goes to ~/.ok/ladybug.
  • Existing ~/.ok/kuzu state is left in place as a legacy derived index.
  • Users can keep that old directory as a backup until a future explicit graph rebuild/migration command exists.

Verification

Fresh checks run before opening this PR:

uv run pytest -q
uv run ruff check . && uv run ruff format --check .
uv run mypy openkl/db.py openkl/graph.py openkl/vector_search.py

Results:

  • pytest: 3 passed
  • ruff check: all checks passed
  • ruff format --check: 14 files already formatted
  • scoped mypy on migrated backend modules: no issues found

I also ran an isolated HOME CLI smoke test so the verification would not touch a real user ~/.ok directory. The smoke covered:

  • ok doctor, confirming LadybugDB is available and the database initializes
  • ok mem add
  • ok mem search --json
  • ok graph cypher over the MemoryNote -> HasTopic -> Topic relationship
  • ok graph list-indexes, confirming both memory_vec_idx and chunk_vec_idx are created through LadybugDB

Representative output from the final smoke:

doctor=2:✓ LadybugDB available
6:✓ Database initialized
search=m-20260601-4ce84d6e memory
graph=migration
indexes=3:  Index: chunk_vec_idx
9:  Index: memory_vec_idx

Known follow-up

The full repository-wide uv run mypy openkl still reports pre-existing typing debt outside the migrated backend path, mostly in CLI, parsers, citations, memory, and store modules. This PR keeps the migration focused and verifies the changed backend-critical modules with scoped mypy instead of turning the graph backend migration into a full type-hardening project.

Summary by CodeRabbit

  • Documentation

    • Updated references throughout to reflect LadybugDB as the embedded graph database
    • Added migration guidance for users with existing data
  • Refactor

    • Migrated database backend to LadybugDB with updated storage location (~/.ok/ladybug)
    • Enhanced legacy data detection and migration support
  • Tests

    • Added database initialization tests with schema verification
    • Added vector search functionality tests
  • Chores

    • Updated project dependencies (LadybugDB ≥0.17.0)

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 1, 2026

Review Change Stack

Warning

Review limit reached

@pi-dal, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 9 minutes and 36 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a4bf2b64-b861-4509-b47a-261d3d2254c3

📥 Commits

Reviewing files that changed from the base of the PR and between da193ac and 00b499f.

📒 Files selected for processing (6)
  • openkl/db.py
  • openkl/memory.py
  • openkl/vector_search.py
  • tests/test_db_backend.py
  • tests/test_memory_manager.py
  • tests/test_vector_search.py
📝 Walkthrough

Walkthrough

This PR migrates OpenKL's graph database backend from the archived Kuzu DB to LadybugDB. The database layer, query processing, vector operations, and type annotations are updated throughout the codebase. Legacy Kuzu data detection and migration guidance are introduced, and tests verify the new initialization and vector search workflows.

Changes

Kuzu to LadybugDB Migration

Layer / File(s) Summary
Dependency declaration and CLI updates
pyproject.toml, openkl/cli.py
Pyproject.toml replaces kuzu>=0.8.0 with ladybug>=0.17.0 and adds mypy>=1.0 and pytest>=7.0 to dev dependencies. CLI doctor command switches dependency check from kuzu to ladybug.
Database backend rewrite and initialization
openkl/db.py
Core database module imports LadybugDB as graphdb, updates DB_PATH to ~/.ok/ladybug, adds LEGACY_KUZU_DB_PATH constant for legacy detection, and reimplements init_db(), get_connection(), and close_connection() for LadybugDB with warning logs when legacy data exists.
Database initialization and legacy detection tests
tests/test_db_backend.py
Two new tests verify that init_db() creates the core MemoryNote schema and properly closes connections. A second test confirms legacy Kuzu database detection emits appropriate warnings and switches the database to the new Ladybug path.
Graph query result processing
openkl/graph.py
Result processor renamed from _process_kuzu_result to _process_graph_result with cast-based row handling compatible with LadybugDB. GraphManager.__init__, run_cypher, and print_results gain explicit return type annotations. get_entity_stats parses node labels from casted nested row values.
Vector search type safety and testing
openkl/vector_search.py, tests/test_vector_search.py
Vector search functions adopt cast-based row unpacking and force distance to Python float before similarity computation. search_memory_vectors and search_chunk_vectors accept query_vector: Any for LadybugDB compatibility. Vector operations gain explicit -> None return types. New integration test verifies vector index creation and round-trip search.
Memory update escaping and type annotation cleanup
openkl/memory.py, openkl/distill.py
MemoryManager.update() precomputes escaped_text by replacing single quotes before Cypher interpolation. DistillationManager.create_memory_from_distillation() removes Optional import and modernizes tags and topics parameters to list[str] | None union syntax.
Architecture and user documentation updates
README.md, rfcs/0000-openkl-design.md
README feature list and architecture section switch from Kùzu DB to LadybugDB, with new migration guidance for users with legacy data in ~/.ok/kuzu. RFC updates embedded database directory reference from kuzu/ to ladybug/ and engine description.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 From Kuzu's archived pages we migrate,
To LadybugDB, fresh and first-rate!
With vectors cast and queries refined,
Legacy data handled with care in mind,
A database dance, both graceful and bright! 🪲✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Migrate graph backend to LadybugDB' directly and clearly summarizes the primary change across the entire changeset: replacing KuzuDB with LadybugDB throughout the codebase, including database layer, CLI, configuration, documentation, and tests.
Docstring Coverage ✅ Passed Docstring coverage is 82.61% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@pi-dal pi-dal force-pushed the ladybug-migration branch from bcdcfda to a596346 Compare June 1, 2026 02:14
KuzuDB has been archived, and its extension server dependency makes OpenKL's graph and vector-search path brittle for fresh installs. LadybugDB is the active successor and preserves the embedded Cypher/vector-index API shape that OpenKL relies on, so this migrates the runtime backend while keeping the current CLI and graph schema behavior intact.

Replace the runtime dependency and lockfile entry from kuzu to ladybug, update database initialization to create new graph stores under ~/.ok/ladybug, and add a legacy ~/.ok/kuzu warning instead of attempting implicit in-place reuse of old derived graph state. This keeps old Kuzu data untouched and makes the migration explicit for users.

Update the doctor command and graph/vector-search modules for LadybugDB naming, tighten the type boundary around the migrated db/graph/vector modules, and keep the existing CREATE_VECTOR_INDEX / QUERY_VECTOR_INDEX path covered. The migration also includes smoke tests for schema creation, vector search, and legacy-path warning behavior.

Refresh README and RFC references so the documented architecture points to LadybugDB and the new ~/.ok/ladybug location.

Verification run before commit:

- uv run pytest -q

- uv run ruff check . && uv run ruff format --check .

- uv run mypy openkl/db.py openkl/graph.py openkl/vector_search.py

- isolated HOME CLI smoke covering ok doctor, mem add/search, graph cypher, and graph list-indexes
@pi-dal pi-dal force-pushed the ladybug-migration branch from a596346 to da193ac Compare June 1, 2026 02:15
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
openkl/memory.py (1)

185-196: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use query parameters instead of interpolating escaped text.

At Line 186-187, manual escaping still leaves the update path vulnerable to Cypher injection and edge-case breakage. Build the SET clause with parameters and pass values via conn.execute(..., params).

Suggested fix
         # Build update query
         updates = []
+        params: dict[str, Any] = {"memory_id": memory_id}
         if text is not None:
-            escaped_text = text.replace("'", "\\'")
-            updates.append(f"m.text = '{escaped_text}'")
+            updates.append("m.text = $text")
+            params["text"] = text
         if tags is not None:
-            tags_str = "[" + ", ".join([f"'{tag}'" for tag in tags]) + "]"
-            updates.append(f"m.tags = {tags_str}")
+            updates.append("m.tags = $tags")
+            params["tags"] = tags
 
         if updates:
-            update_query = (
-                f"MATCH (m:MemoryNote {{id: '{memory_id}'}}) SET {', '.join(updates)}"
-            )
-            conn.execute(update_query)
+            update_query = f"MATCH (m:MemoryNote {{id: $memory_id}}) SET {', '.join(updates)}"
+            conn.execute(update_query, params)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@openkl/memory.py` around lines 185 - 196, The current update builds a Cypher
string by interpolating escaped values (variables: memory_id, escaped_text,
tags, updates, update_query) and calls conn.execute, which is vulnerable;
instead construct the SET clause using parameter placeholders (e.g., m.text =
$text_param, m.tags = $tags_param, m.id = $id_param) and collect a params dict
with keys like text_param, tags_param, id_param (or similar), convert tags to a
list value rather than stringifying, then call conn.execute(update_query,
params=params) so all values are bound as parameters rather than interpolated.
openkl/vector_search.py (1)

145-167: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate and sanitize query_vector before Cypher interpolation.

query_vector: Any combined with vector_str = str(query_vector) allows malformed or injected Cypher when non-numeric/untrusted input reaches these paths. Normalize to a numeric list and serialize from validated floats only (same for k bounds).

Suggested fix
+def _normalize_query_vector(query_vector: Any) -> list[float]:
+    if hasattr(query_vector, "tolist"):
+        query_vector = query_vector.tolist()
+    if not isinstance(query_vector, (list, tuple)):
+        raise TypeError("query_vector must be a list/tuple of numbers")
+    try:
+        return [float(v) for v in query_vector]
+    except (TypeError, ValueError) as exc:
+        raise ValueError("query_vector must contain only numeric values") from exc
+
 def search_memory_vectors(
     query_vector: Any, k: int = 5, verbose: bool = False
 ) -> list[dict[str, Any]]:
@@
-    if hasattr(query_vector, "tolist"):
-        query_vector = query_vector.tolist()
+    query_vector = _normalize_query_vector(query_vector)
+    if k <= 0:
+        raise ValueError("k must be > 0")
@@
-    vector_str = str(query_vector)
+    vector_str = "[" + ", ".join(f"{v:.17g}" for v in query_vector) + "]"
@@
 def search_chunk_vectors(
     query_vector: Any, k: int = 5, verbose: bool = False
 ) -> list[dict[str, Any]]:
@@
-    if hasattr(query_vector, "tolist"):
-        query_vector = query_vector.tolist()
+    query_vector = _normalize_query_vector(query_vector)
+    if k <= 0:
+        raise ValueError("k must be > 0")
@@
-    vector_str = str(query_vector)
+    vector_str = "[" + ", ".join(f"{v:.17g}" for v in query_vector) + "]"

Also applies to: 196-219

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@openkl/vector_search.py` around lines 145 - 167, Normalize and sanitize
query_vector and k before interpolating into the Cypher CALL: ensure
query_vector is a sequence (list/tuple/np.ndarray), convert all elements to
floats (reject/raise if any element cannot be cast to float), and serialize the
validated numeric list (e.g., via a safe serializer rather than raw str())
before embedding in the CALL QUERY_VECTOR_INDEX invocation; also validate k is
an integer within acceptable bounds (e.g., >=1 and a sensible upper limit) and
coerce/raise on invalid input. Update the logic around query_vector, vector_str
and the CALL QUERY_VECTOR_INDEX construction in the function that currently uses
query_vector and k (references: query_vector, k, get_connection,
_ensure_vector_extension_loaded, _ensure_vector_indexes_exist, CALL
QUERY_VECTOR_INDEX) and apply the same validation/serialization fixes to the
other occurrence noted (the block around lines 196-219).
openkl/db.py (1)

60-65: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Vector extension failure allows silent downstream breakage.

If the vector extension fails to install/load, the code logs a warning but continues. Downstream vector operations (CREATE_VECTOR_INDEX, QUERY_VECTOR_INDEX) will then fail unexpectedly. Consider either:

  1. Re-raising the exception if vector search is required functionality, or
  2. Setting a module-level flag to gracefully degrade vector features.
Option 1: Fail fast if vector is required
     try:
         conn.execute("INSTALL VECTOR;")
         conn.execute("LOAD VECTOR;")
         logger.info("Vector extension installed and loaded")
     except Exception as e:
-        logger.warning(f"Failed to install vector extension: {e}")
+        logger.error(f"Failed to install vector extension: {e}")
+        raise RuntimeError("Vector extension is required for OpenKL") from e
Option 2: Track capability for graceful degradation
+_vector_available: bool = False
+
 def init_db(db_path: Path | None = None) -> graphdb.Connection:
     ...
+    global _vector_available
     # Install and load vector extension
     try:
         conn.execute("INSTALL VECTOR;")
         conn.execute("LOAD VECTOR;")
         logger.info("Vector extension installed and loaded")
+        _vector_available = True
     except Exception as e:
         logger.warning(f"Failed to install vector extension: {e}")
+        _vector_available = False
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@openkl/db.py` around lines 60 - 65, The current try/except around
conn.execute("INSTALL VECTOR;") / conn.execute("LOAD VECTOR;") swallows failures
which breaks downstream vector operations; update this by either (A) failing
fast: re-raise the caught exception inside the except block (or raise a new
RuntimeError with context) so callers like CREATE_VECTOR_INDEX and
QUERY_VECTOR_INDEX cannot run without the extension, or (B) implement graceful
degradation: introduce a module-level boolean (e.g. VECTOR_AVAILABLE = False)
and set it to True on successful install/load and to False in the except path
while logging the error; then ensure vector-using functions
(CREATE_VECTOR_INDEX, QUERY_VECTOR_INDEX) check VECTOR_AVAILABLE and return a
clear error or no-op when False. Use the existing logger for detailed error
logging and reference conn.execute and the vector-related functions when making
the change.
🧹 Nitpick comments (3)
tests/test_db_backend.py (1)

10-11: ⚡ Quick win

Use next(iter(...)) for single-element extraction.

Per static analysis (RUF015): list(result)[0] materializes the entire result set just to get the first row. Use next(iter(result)) for efficiency and clarity.

Suggested fix
-        result = conn.execute("MATCH (m:MemoryNote) RETURN count(m)")
-        assert list(result)[0][0] == 0
+        result = conn.execute("MATCH (m:MemoryNote) RETURN count(m)")
+        row = next(iter(result))
+        assert row[0] == 0
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_db_backend.py` around lines 10 - 11, The test currently
materializes the whole query result with list(result)[0][0]; change this to use
an iterator to get only the first row: call next(iter(result)) to obtain the
first row (e.g., row = next(iter(result))) and assert row[0] == 0; update the
assertion in tests/test_db_backend.py where conn.execute(...) is assigned to
result to use next(iter(result)) instead of list(result)[0][0].
README.md (1)

124-127: ⚡ Quick win

Consider adding a link to the detailed migration plan.

The migration note provides clear guidance for users with legacy data. For users who need more context or step-by-step instructions, consider linking to docs/plans/2026-05-31-ladybug-migration.md or creating a dedicated migration guide.

📚 Suggested enhancement
 ### LadybugDB Migration
 
-OpenKL now uses LadybugDB instead of the archived KuzuDB package. Existing `~/.ok/kuzu` graph data is treated as a legacy derived index; keep it as a backup and rebuild into `~/.ok/ladybug` before relying on old graph state.
+OpenKL now uses LadybugDB instead of the archived KuzuDB package. Existing `~/.ok/kuzu` graph data is treated as a legacy derived index; keep it as a backup and rebuild into `~/.ok/ladybug` before relying on old graph state.
+
+For detailed migration guidance, see [docs/plans/2026-05-31-ladybug-migration.md](docs/plans/2026-05-31-ladybug-migration.md).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@README.md` around lines 124 - 127, The README's "### LadybugDB Migration"
section lacks a link to the detailed migration plan; update that paragraph to
include a clear hyperlink to the migration document
(docs/plans/2026-05-31-ladybug-migration.md) or add a sentence pointing users to
the dedicated migration guide, referencing the "### LadybugDB Migration" heading
so reviewers can locate and verify the added link quickly.
docs/plans/2026-05-31-ladybug-migration.md (1)

382-392: ⚡ Quick win

Consider resolving open questions before final merge.

The open questions about package management (uv vs PDM), graph rebuild commands, and Python version support are reasonable for a planning document, but they introduce uncertainty for future maintenance. Consider documenting the final decisions made during implementation either in this plan or in a follow-up section.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/plans/2026-05-31-ladybug-migration.md` around lines 382 - 392, The plan
currently leaves three unresolved items under "Open Questions Before Execution"
(package manager choice: "uv" vs PDM, whether to include "ok graph rebuild" in
the first migration, and the supported Python versions showing conflicting
metadata), so update the document to record the final decisions and rationale:
explicitly state whether the repo will retain "uv" or migrate to PDM, confirm
whether "ok graph rebuild" will be part of the initial migration or left as a
manual follow-up, and reconcile the Python version constraints (adjust
classifiers or metadata to match the chosen supported versions); include these
decisions either inline in the "Open Questions Before Execution" section or add
a "Decisions/Follow-up" subsection and reference "Rollback Plan" and "ok graph
rebuild" where relevant so future maintainers have a clear, authoritative action
list.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@openkl/db.py`:
- Line 8: The code imports ladybug as graphdb and currently calls
graphdb.Database(str(db_path)), graphdb.Connection(db) and conn.execute(...);
update the vector-extension and schema error handling so failures are not
silently ignored: when executing the vector extension SQL (the INSTALL/LOAD
VECTOR block using conn.execute), detect if the extension failed and either
raise an exception or set a clear flag (e.g., vector_available) so any
vector-dependent functionality immediately fails or is prevented; for schema
initialization (the block that currently checks error strings for "already
exists"), switch to using SQL DDL with IF NOT EXISTS where possible or catch the
specific LadybugDB exception type instead of substring matching, re-raise
unexpected errors, and ensure you reference graphdb.Database,
graphdb.Connection, conn.execute and the vector init / schema init code paths
when making these changes.

In `@tests/test_vector_search.py`:
- Around line 21-22: Add an explicit non-empty assertion for the retrieval
results before indexing into them: in the test around the call to
search_memory_vectors([0.1] * 384, k=1) (variable results), assert that results
is not empty (e.g., assert len(results) > 0 or assert results) before the
existing assert results[0]["id"] == "m-test" so failures show a clear assertion
message instead of an IndexError.

---

Outside diff comments:
In `@openkl/db.py`:
- Around line 60-65: The current try/except around conn.execute("INSTALL
VECTOR;") / conn.execute("LOAD VECTOR;") swallows failures which breaks
downstream vector operations; update this by either (A) failing fast: re-raise
the caught exception inside the except block (or raise a new RuntimeError with
context) so callers like CREATE_VECTOR_INDEX and QUERY_VECTOR_INDEX cannot run
without the extension, or (B) implement graceful degradation: introduce a
module-level boolean (e.g. VECTOR_AVAILABLE = False) and set it to True on
successful install/load and to False in the except path while logging the error;
then ensure vector-using functions (CREATE_VECTOR_INDEX, QUERY_VECTOR_INDEX)
check VECTOR_AVAILABLE and return a clear error or no-op when False. Use the
existing logger for detailed error logging and reference conn.execute and the
vector-related functions when making the change.

In `@openkl/memory.py`:
- Around line 185-196: The current update builds a Cypher string by
interpolating escaped values (variables: memory_id, escaped_text, tags, updates,
update_query) and calls conn.execute, which is vulnerable; instead construct the
SET clause using parameter placeholders (e.g., m.text = $text_param, m.tags =
$tags_param, m.id = $id_param) and collect a params dict with keys like
text_param, tags_param, id_param (or similar), convert tags to a list value
rather than stringifying, then call conn.execute(update_query, params=params) so
all values are bound as parameters rather than interpolated.

In `@openkl/vector_search.py`:
- Around line 145-167: Normalize and sanitize query_vector and k before
interpolating into the Cypher CALL: ensure query_vector is a sequence
(list/tuple/np.ndarray), convert all elements to floats (reject/raise if any
element cannot be cast to float), and serialize the validated numeric list
(e.g., via a safe serializer rather than raw str()) before embedding in the CALL
QUERY_VECTOR_INDEX invocation; also validate k is an integer within acceptable
bounds (e.g., >=1 and a sensible upper limit) and coerce/raise on invalid input.
Update the logic around query_vector, vector_str and the CALL QUERY_VECTOR_INDEX
construction in the function that currently uses query_vector and k (references:
query_vector, k, get_connection, _ensure_vector_extension_loaded,
_ensure_vector_indexes_exist, CALL QUERY_VECTOR_INDEX) and apply the same
validation/serialization fixes to the other occurrence noted (the block around
lines 196-219).

---

Nitpick comments:
In `@docs/plans/2026-05-31-ladybug-migration.md`:
- Around line 382-392: The plan currently leaves three unresolved items under
"Open Questions Before Execution" (package manager choice: "uv" vs PDM, whether
to include "ok graph rebuild" in the first migration, and the supported Python
versions showing conflicting metadata), so update the document to record the
final decisions and rationale: explicitly state whether the repo will retain
"uv" or migrate to PDM, confirm whether "ok graph rebuild" will be part of the
initial migration or left as a manual follow-up, and reconcile the Python
version constraints (adjust classifiers or metadata to match the chosen
supported versions); include these decisions either inline in the "Open
Questions Before Execution" section or add a "Decisions/Follow-up" subsection
and reference "Rollback Plan" and "ok graph rebuild" where relevant so future
maintainers have a clear, authoritative action list.

In `@README.md`:
- Around line 124-127: The README's "### LadybugDB Migration" section lacks a
link to the detailed migration plan; update that paragraph to include a clear
hyperlink to the migration document (docs/plans/2026-05-31-ladybug-migration.md)
or add a sentence pointing users to the dedicated migration guide, referencing
the "### LadybugDB Migration" heading so reviewers can locate and verify the
added link quickly.

In `@tests/test_db_backend.py`:
- Around line 10-11: The test currently materializes the whole query result with
list(result)[0][0]; change this to use an iterator to get only the first row:
call next(iter(result)) to obtain the first row (e.g., row = next(iter(result)))
and assert row[0] == 0; update the assertion in tests/test_db_backend.py where
conn.execute(...) is assigned to result to use next(iter(result)) instead of
list(result)[0][0].
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c21c04ed-c8bb-4cb2-aa31-5523c0786ccd

📥 Commits

Reviewing files that changed from the base of the PR and between a4ce8a0 and bcdcfda.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (12)
  • README.md
  • docs/plans/2026-05-31-ladybug-migration.md
  • openkl/cli.py
  • openkl/db.py
  • openkl/distill.py
  • openkl/graph.py
  • openkl/memory.py
  • openkl/vector_search.py
  • pyproject.toml
  • rfcs/0000-openkl-design.md
  • tests/test_db_backend.py
  • tests/test_vector_search.py

Comment thread openkl/db.py
Comment thread tests/test_vector_search.py
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
README.md (2)

126-126: Consider hyphenating the compound modifier.

The phrase "legacy derived index" functions as a compound modifier and would be clearer as "legacy-derived index" per standard English hyphenation rules. As per coding guidelines, the static analysis tool flagged this as a potential clarity improvement.

✏️ Optional style fix
-OpenKL now uses LadybugDB instead of the archived KuzuDB package. Existing `~/.ok/kuzu` graph data is treated as a legacy derived index; keep it as a backup and rebuild into `~/.ok/ladybug` before relying on old graph state.
+OpenKL now uses LadybugDB instead of the archived KuzuDB package. Existing `~/.ok/kuzu` graph data is treated as a legacy-derived index; keep it as a backup and rebuild into `~/.ok/ladybug` before relying on old graph state.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@README.md` at line 126, Update the phrase "legacy derived index" to the
hyphenated compound modifier "legacy-derived index" in the README sentence that
mentions OpenKL using LadybugDB and the existing ~/.ok/kuzu graph data so the
wording reads "...treated as a legacy-derived index; keep it as a backup and
rebuild into ~/.ok/ladybug..."—locate the occurrence of "legacy derived index"
and replace it with "legacy-derived index".

124-126: ⚡ Quick win

Clarify the migration guidance with actionable steps.

The current guidance tells users to "rebuild into ~/.ok/ladybug" but doesn't explain how or mention that no automatic migration exists. Users with existing Kuzu data will be confused about what action to take.

Consider adding explicit guidance such as: "No automatic migration is provided; the graph will be rebuilt from your canonical data (files and memory) when you use OpenKL commands. Alternatively, manually re-run your ingestion and memory commands to repopulate the graph."

📝 Suggested improvement
 ### LadybugDB Migration
 
-OpenKL now uses LadybugDB instead of the archived KuzuDB package. Existing `~/.ok/kuzu` graph data is treated as a legacy derived index; keep it as a backup and rebuild into `~/.ok/ladybug` before relying on old graph state.
+OpenKL now uses LadybugDB instead of the archived KuzuDB package. Existing `~/.ok/kuzu` graph data is treated as a legacy derived index; keep it as a backup. The graph will be automatically rebuilt in `~/.ok/ladybug` as you use OpenKL commands. No manual migration is required, but you can also manually re-ingest documents and re-add memory entries to repopulate the graph immediately.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@README.md` around lines 124 - 126, Update the README migration section to
explicitly state there is no automatic migration and provide clear, actionable
steps: mention that existing data in ~/.ok/kuzu is only a legacy backup, explain
that OpenKL will rebuild the LadybugDB graph from canonical sources when you run
OpenKL commands, and add concrete options—either let OpenKL rebuild
automatically by running your usual OpenKL commands or manually re-run your
ingestion and memory commands to repopulate ~/.ok/ladybug; reference the legacy
path (~/.ok/kuzu), the new path (~/.ok/ladybug), and the terms "ingestion" and
"memory" commands so users know what to run.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@README.md`:
- Line 126: Update the phrase "legacy derived index" to the hyphenated compound
modifier "legacy-derived index" in the README sentence that mentions OpenKL
using LadybugDB and the existing ~/.ok/kuzu graph data so the wording reads
"...treated as a legacy-derived index; keep it as a backup and rebuild into
~/.ok/ladybug..."—locate the occurrence of "legacy derived index" and replace it
with "legacy-derived index".
- Around line 124-126: Update the README migration section to explicitly state
there is no automatic migration and provide clear, actionable steps: mention
that existing data in ~/.ok/kuzu is only a legacy backup, explain that OpenKL
will rebuild the LadybugDB graph from canonical sources when you run OpenKL
commands, and add concrete options—either let OpenKL rebuild automatically by
running your usual OpenKL commands or manually re-run your ingestion and memory
commands to repopulate ~/.ok/ladybug; reference the legacy path (~/.ok/kuzu),
the new path (~/.ok/ladybug), and the terms "ingestion" and "memory" commands so
users know what to run.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 16b7b159-5e20-444f-8bab-e0651b762edb

📥 Commits

Reviewing files that changed from the base of the PR and between bcdcfda and da193ac.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (11)
  • README.md
  • openkl/cli.py
  • openkl/db.py
  • openkl/distill.py
  • openkl/graph.py
  • openkl/memory.py
  • openkl/vector_search.py
  • pyproject.toml
  • rfcs/0000-openkl-design.md
  • tests/test_db_backend.py
  • tests/test_vector_search.py
✅ Files skipped from review due to trivial changes (2)
  • rfcs/0000-openkl-design.md
  • openkl/distill.py
🚧 Files skipped from review as they are similar to previous changes (6)
  • pyproject.toml
  • tests/test_vector_search.py
  • openkl/cli.py
  • openkl/memory.py
  • openkl/vector_search.py
  • openkl/db.py

Address the actionable CodeRabbit review feedback on the LadybugDB migration.\n\nFail fast when the required LadybugDB VECTOR extension cannot be installed or loaded so OpenKL does not continue with a partially functional graph backend. Make schema creation idempotent with IF NOT EXISTS instead of relying on substring checks against exception messages.\n\nParameterize the MemoryManager.update existence check and SET clause so memory text and tags are passed through LadybugDB parameters instead of interpolated Cypher fragments. Validate vector-search inputs before touching the database, constrain k to a bounded positive integer, and serialize only finite numeric vector values into QUERY_VECTOR_INDEX calls.\n\nTighten tests around these behaviors: vector-extension failure now raises, vector search rejects invalid vectors and k before DB access, memory updates handle quoted text and tags through parameters, and the vector-search round trip asserts a non-empty result before indexing.\n\nVerification:\n- uv run pytest -q\n- uv run ruff check . && uv run ruff format --check .\n- uv run mypy openkl/db.py openkl/graph.py openkl/vector_search.py\n- HOME="/var/folders/vf/8zspxjpd00l_hmmb95ygld940000gn/T/tmp.fIREvsujDR" uv run ok doctor
@pi-dal
Copy link
Copy Markdown
Author

pi-dal commented Jun 1, 2026

Reviewed the remaining CodeRabbit items against the current PR diff and addressed the actionable ones in 00b499f.

What changed:

  • openkl/db.py: VECTOR extension setup now fails fast instead of logging and continuing, and schema DDL now uses IF NOT EXISTS rather than matching already exists exception text.
  • openkl/memory.py: MemoryManager.update() now binds memory_id, text, and tags as query parameters instead of interpolating update values into Cypher.
  • openkl/vector_search.py: query vectors and k are validated before database access; only finite numeric vectors are serialized into QUERY_VECTOR_INDEX, and k is constrained to a bounded positive integer.
  • Tests: added coverage for vector-extension failure, invalid vector-search inputs before DB access, parameterized memory updates with quoted text/tags, the next(iter(...)) schema smoke style, and the explicit non-empty assertion before indexing vector-search results.

The README/docs-plan suggestions are intentionally not applied. The migration plan document was removed from this PR per maintainer direction, so adding a README link to a non-committed docs/plans/... file would leave a broken reference. :)

Verification after the fix:

  • uv run pytest -q
  • uv run ruff check . && uv run ruff format --check .
  • uv run mypy openkl/db.py openkl/graph.py openkl/vector_search.py
  • isolated HOME smoke with uv run ok doctor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant