Skip to content

Conversation

@ketanjain3
Copy link

@ketanjain3 ketanjain3 commented Dec 2, 2025

  • Add optional connect_args parameter to init method
  • Support DATABASE_CONNECT_ARGS environment variable for JSON-based configuration
  • Enable custom connection parameters for all database engines (SQLite and PostgreSQL)
  • Maintain backward compatibility with existing code
  • Add proper error handling and validation for environment variable parsing

Description

The intent of this PR is to make the database initialization more flexible and configurable. In order to do this, the system will support a new DATABASE_CONNECT_ARGS environment variable that takes JSON-based configuration,. This enhancement will allow custom connection parameters to be passed to any supported database engine, including SQLite and PostgreSQL,. To guarantee that the environment variable is parsed securely and consistently, appropriate error handling and validation will also be added.

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Code refactoring
  • Performance improvement
  • Other (please specify):

Screenshots/Videos (if applicable)

Pre-submission Checklist

  • I have tested my changes thoroughly before submitting this PR
  • This PR contains minimal changes necessary to address the issue/feature
  • My code follows the project's coding standards and style guidelines
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if applicable)
  • All new and existing tests pass
  • I have searched existing PRs to ensure this change hasn't been submitted already
  • I have linked any relevant issues in the description
  • My commits have clear and descriptive messages

DCO Affirmation

I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.

Summary by CodeRabbit

  • New Features

    • Advanced database connection configuration through the optional DATABASE_CONNECT_ARGS environment variable, supporting custom settings such as SSL certificates and timeout configurations.
    • Custom connection arguments can now be passed to relational database adapters.
  • Tests

    • Comprehensive unit test suite for database connection argument parsing and validation.

✏️ Tip: You can customize this high-level summary in your review settings.

- Add optional connect_args parameter to __init__ method
- Support DATABASE_CONNECT_ARGS environment variable for JSON-based configuration
- Enable custom connection parameters for all database engines (SQLite and PostgreSQL)
- Maintain backward compatibility with existing code
- Add proper error handling and validation for environment variable parsing

Signed-off-by: ketanjain7981 <[email protected]>
@pull-checklist
Copy link

pull-checklist bot commented Dec 2, 2025

Please make sure all the checkboxes are checked:

  • I have tested these changes locally.
  • I have reviewed the code changes.
  • I have added end-to-end and unit tests (if applicable).
  • I have updated the documentation and README.md file (if necessary).
  • I have removed unnecessary code and debug statements.
  • PR title is clear and follows the convention.
  • I have tagged reviewers or team members for feedback.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 2, 2025

Walkthrough

Introduces optional DATABASE_CONNECT_ARGS support for configurable database connection parameters. Adds environment variable to template, extends RelationalConfig with JSON parsing capability, updates SQLAlchemyAdapter to accept and use connect_args, propagates parameter through create_relational_engine, and includes comprehensive unit tests.

Changes

Cohort / File(s) Change Summary
Configuration & Environment
\.env\.template, cognee/infrastructure/databases/relational/config.py
Adds DATABASE_CONNECT_ARGS optional variable to environment template with PostgreSQL SSL and SQLite timeout examples. Introduces database_connect_args field to RelationalConfig, parses JSON-formatted string to dict, and includes parsed value in to_dict output.
Database Adapter
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py
Adds optional connect_args parameter to init method. Merges provided connect_args with static timeout settings for SQLite, or passes directly to create_async_engine for non-SQLite connections. Updates constructor docstring.
Engine Creation
cognee/infrastructure/databases/relational/create_relational_engine.py
Adds database_connect_args optional parameter and propagates it through SQLAlchemyAdapter instantiation via connect_args argument. Updates documentation strings.
Unit Tests
cognee/tests/unit/infrastructure/databases/relational/test_RelationalConfig.py
Adds comprehensive test suite covering JSON dict parsing, empty string handling, missing variable fallback, invalid JSON error handling, non-dict JSON conversion, to_dict output validation, integer value parsing, and mixed type support.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

  • JSON parsing logic in RelationalConfig: Verify error handling and fallback behavior for invalid JSON in fill_derived method
  • Connect args merging in SQLAlchemyAdapter: Confirm proper merging of user-provided connect_args with built-in timeout settings for SQLite, and correct parameter passing for other database providers
  • Parameter propagation chain: Ensure connect_args flows correctly from environment → RelationalConfig → create_relational_engine → SQLAlchemyAdapter
  • Test coverage: Verify all test cases properly validate parsing behavior, type conversions, and edge cases

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main change: adding connect_args support to SqlAlchemyAdapter, which is the core feature across all modified files.
Description check ✅ Passed The description covers the main objectives, includes the required template sections (Description, Type of Change, Pre-submission Checklist, DCO Affirmation), and provides clear context about the feature.
Docstring Coverage ✅ Passed Docstring coverage is 91.67% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @ketanjain3, thank you for submitting a PR! We will respond as soon as possible.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (1)

68-73: User-provided timeout is always overridden.

The current spread order {**(connect_args or {}), **{"timeout": 30}} means the hardcoded timeout=30 always overwrites any user-specified timeout. This contradicts the .env.template example showing DATABASE_CONNECT_ARGS='{"timeout": 60}'.

Reverse the merge order so user-provided values take precedence over the default:

         if "sqlite" in connection_string:
             self.engine = create_async_engine(
                 connection_string,
                 poolclass=NullPool,
-                connect_args={**(connect_args or {}), **{"timeout": 30}},
+                connect_args={"timeout": 30, **(connect_args or {})},
             )
🧹 Nitpick comments (2)
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (2)

37-50: Consider security implications of logging env_connect_args.

Line 47 logs the parsed env_connect_args value, which could potentially contain sensitive connection parameters (e.g., SSL certificates, passwords in some edge cases). Consider logging a generic message without the actual values.

Additionally, the current merge order (connect_args.update(env_connect_args)) means environment variables override constructor arguments. Verify this is the intended precedence—typically, explicit constructor arguments take precedence over environment defaults.

                 if isinstance(env_connect_args, dict):
                     if connect_args is None:
                         connect_args = {}
-                    connect_args.update(env_connect_args)
+                    # Merge env args first, then constructor args override
+                    merged = {**env_connect_args, **(connect_args or {})}
+                    connect_args = merged
                 else:
                     logger.warning(
-                        f"DATABASE_CONNECT_ARGS is not a valid JSON dictionary: {env_connect_args}"
+                        "DATABASE_CONNECT_ARGS is not a valid JSON dictionary"
                     )

33-35: Add docstring for __init__ method.

Per coding guidelines, the new connect_args parameter should be documented. Consider adding a docstring:

     def __init__(self, connection_string: str, connect_args: Optional[dict] = None):
+        """
+        Initialize the SQLAlchemy adapter with the given connection string.
+
+        Parameters:
+        -----------
+            - connection_string (str): The database connection URI.
+            - connect_args (Optional[dict]): Additional connection arguments passed
+              to the SQLAlchemy engine. Can also be set via DATABASE_CONNECT_ARGS
+              environment variable (JSON format). Environment values are merged
+              with constructor values.
+        """
         self.db_path: str = None
         self.db_uri: str = connection_string
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2d45db9 and f9b16e5.

📒 Files selected for processing (2)
  • .env.template (1 hunks)
  • cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Use 4-space indentation in Python code
Use snake_case for Python module and function names
Use PascalCase for Python class names
Use ruff format before committing Python code
Use ruff check for import hygiene and style enforcement with line-length 100 configured in pyproject.toml
Prefer explicit, structured error handling in Python code

Files:

  • cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py

⚙️ CodeRabbit configuration file

**/*.py: When reviewing Python code for this project:

  1. Prioritize portability over clarity, especially when dealing with cross-Python compatibility. However, with the priority in mind, do still consider improvements to clarity when relevant.
  2. As a general guideline, consider the code style advocated in the PEP 8 standard (excluding the use of spaces for indentation) and evaluate suggested changes for code style compliance.
  3. As a style convention, consider the code style advocated in CEP-8 and evaluate suggested changes for code style compliance.
  4. As a general guideline, try to provide any relevant, official, and supporting documentation links to any tool's suggestions in review comments. This guideline is important for posterity.
  5. As a general rule, undocumented function definitions and class definitions in the project's Python code are assumed incomplete. Please consider suggesting a short summary of the code for any of these incomplete definitions as docstrings when reviewing.

Files:

  • cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py
cognee/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use shared logging utilities from cognee.shared.logging_utils in Python code

Files:

  • cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py
cognee/{modules,infrastructure,tasks}/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Co-locate feature-specific helpers under their respective package (modules/, infrastructure/, or tasks/)

Files:

  • cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py
🧬 Code graph analysis (1)
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (1)
cognee/shared/logging_utils.py (1)
  • warning (206-206)
🔇 Additional comments (4)
.env.template (1)

94-102: Well-documented environment variable addition.

The documentation clearly explains the purpose and provides practical examples for both PostgreSQL (SSL configuration) and SQLite (timeout). The placement within the relational database settings section is appropriate.

cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (3)

6-6: LGTM!

Standard library import for JSON parsing is correctly added.


33-33: Backward-compatible signature change.

The optional parameter with None default maintains compatibility with existing code.


74-83: LGTM!

The non-SQLite engine creation correctly passes through the merged connect_args with an empty dict fallback.

- Add comprehensive docstring for __init__ method to meet 80% coverage requirement
- Fix security issue: remove sensitive data from log messages
- Fix merge precedence: programmatic args now correctly override env vars
- Fix SQLite timeout order: user-specified timeout now overrides default 30s
- Clarify precedence in docstring documentation

Signed-off-by: ketanjain7981 <[email protected]>
@ketanjain3 ketanjain3 force-pushed the feature/sqlalchemy-custom-connect-args branch from 7d6ae6a to c892265 Compare December 2, 2025 15:32
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (2)

33-51: Excellent docstring and signature design.

The constructor signature maintains backward compatibility, and the docstring is comprehensive, clearly documenting parameters, environment variables, and examples. This follows the coding guidelines for documenting function definitions.

Consider adding runtime validation for the programmatic connect_args parameter to ensure it's a dictionary if provided:

         self.db_path: str = None
         self.db_uri: str = connection_string
 
+        if connect_args is not None and not isinstance(connect_args, dict):
+            raise TypeError("connect_args must be a dictionary")
+
         env_connect_args = os.getenv("DATABASE_CONNECT_ARGS")

This would provide clearer error messages if incorrect types are passed, though SQLAlchemy would also catch this during engine creation.


55-66: Good environment variable parsing with appropriate error handling.

The logic correctly:

  • Parses the DATABASE_CONNECT_ARGS environment variable as JSON
  • Validates the parsed value is a dictionary
  • Merges environment-based args with programmatic args (programmatic takes precedence)
  • Uses non-fatal warnings to maintain backward compatibility

For improved debugging, consider making the warning messages more informative:

                 if isinstance(parsed_env_args, dict):
                     # Merge: env vars as base, programmatic args override
                     merged_args = {**parsed_env_args, **(connect_args or {})}
                     connect_args = merged_args
                 else:
-                    logger.warning("DATABASE_CONNECT_ARGS is not a valid JSON dictionary, ignoring")
+                    logger.warning(
+                        f"DATABASE_CONNECT_ARGS is not a valid JSON dictionary (got {type(parsed_env_args).__name__}), ignoring"
+                    )
             except json.JSONDecodeError:
-                logger.warning("Failed to parse DATABASE_CONNECT_ARGS as JSON, ignoring")
+                logger.warning(f"Failed to parse DATABASE_CONNECT_ARGS as JSON: {str(e)}, ignoring")
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f9b16e5 and c892265.

📒 Files selected for processing (1)
  • cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Use 4-space indentation in Python code
Use snake_case for Python module and function names
Use PascalCase for Python class names
Use ruff format before committing Python code
Use ruff check for import hygiene and style enforcement with line-length 100 configured in pyproject.toml
Prefer explicit, structured error handling in Python code

Files:

  • cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py

⚙️ CodeRabbit configuration file

**/*.py: When reviewing Python code for this project:

  1. Prioritize portability over clarity, especially when dealing with cross-Python compatibility. However, with the priority in mind, do still consider improvements to clarity when relevant.
  2. As a general guideline, consider the code style advocated in the PEP 8 standard (excluding the use of spaces for indentation) and evaluate suggested changes for code style compliance.
  3. As a style convention, consider the code style advocated in CEP-8 and evaluate suggested changes for code style compliance.
  4. As a general guideline, try to provide any relevant, official, and supporting documentation links to any tool's suggestions in review comments. This guideline is important for posterity.
  5. As a general rule, undocumented function definitions and class definitions in the project's Python code are assumed incomplete. Please consider suggesting a short summary of the code for any of these incomplete definitions as docstrings when reviewing.

Files:

  • cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py
cognee/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use shared logging utilities from cognee.shared.logging_utils in Python code

Files:

  • cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py
cognee/{modules,infrastructure,tasks}/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Co-locate feature-specific helpers under their respective package (modules/, infrastructure/, or tasks/)

Files:

  • cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py
🧬 Code graph analysis (1)
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (1)
cognee/shared/logging_utils.py (1)
  • warning (206-206)
🔇 Additional comments (3)
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (3)

6-6: LGTM! Safe JSON import.

The json import is necessary for parsing the DATABASE_CONNECT_ARGS environment variable and is correctly placed with other standard library imports.


84-89: LGTM! Correct SQLite connect_args handling.

The merge strategy {**{"timeout": 30}, **(connect_args or {})} correctly:

  • Provides a sensible default timeout of 30 seconds for SQLite
  • Allows user-provided connect_args to override the default
  • Handles None with the or {} fallback

91-99: LGTM! Correct non-SQLite connect_args handling.

The code correctly passes connect_args or {} to create_async_engine, ensuring an empty dictionary is provided when connect_args is None.

- Remove unused connect_args parameter from __init__
- Programmatic parameter was dead code (never called by users)
- Users call get_relational_engine() which doesn't expose connect_args
- Keep DATABASE_CONNECT_ARGS env var support (actually used in production)
- Simplify implementation and reduce complexity
- Update docstring to reflect env-var-only approach
- Add production examples to docstring

Signed-off-by: ketanjain7981 <[email protected]>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (2)

33-57: Constructor docstring/env docs look good, but double‑check alignment with PR intent.

The expanded __init__ docstring clearly explains DATABASE_CONNECT_ARGS, examples, and intended usage, and it matches the actual behavior of this class (env‑driven configuration only).

However, the PR description mentions adding an optional connect_args parameter to SQLAlchemyAdapter.__init__, but the current signature still only accepts connection_string: str. If you still intend to support programmatic connect_args (in addition to the env var), consider either:

  • Updating the signature (e.g., def __init__(self, connection_string: str, connect_args: dict | None = None)) and documenting the precedence between env vs code, or
  • Adjusting the PR description to reflect that only env‑based configuration is supported.

61-73: Env‑driven DATABASE_CONNECT_ARGS parsing is robust; consider logging a bit more detail.

The JSON parsing and type check (must be a dict) are good defensive choices: invalid JSON or non‑dict values are safely ignored with warnings instead of breaking startup.

If you want slightly better diagnosability, you could:

  • Include the exception message in the JSON parse failure warning, and/or
  • Use exc_info=True so operators can see stack traces when needed.

For example:

-            except json.JSONDecodeError:
-                logger.warning("Failed to parse DATABASE_CONNECT_ARGS as JSON, ignoring")
+            except json.JSONDecodeError as exc:
+                logger.warning(
+                    "Failed to parse DATABASE_CONNECT_ARGS as JSON, ignoring: %s", exc
+                )

This keeps behavior identical while making misconfiguration easier to debug.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c892265 and 3f53534.

📒 Files selected for processing (1)
  • cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Use 4-space indentation in Python code
Use snake_case for Python module and function names
Use PascalCase for Python class names
Use ruff format before committing Python code
Use ruff check for import hygiene and style enforcement with line-length 100 configured in pyproject.toml
Prefer explicit, structured error handling in Python code

Files:

  • cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py

⚙️ CodeRabbit configuration file

**/*.py: When reviewing Python code for this project:

  1. Prioritize portability over clarity, especially when dealing with cross-Python compatibility. However, with the priority in mind, do still consider improvements to clarity when relevant.
  2. As a general guideline, consider the code style advocated in the PEP 8 standard (excluding the use of spaces for indentation) and evaluate suggested changes for code style compliance.
  3. As a style convention, consider the code style advocated in CEP-8 and evaluate suggested changes for code style compliance.
  4. As a general guideline, try to provide any relevant, official, and supporting documentation links to any tool's suggestions in review comments. This guideline is important for posterity.
  5. As a general rule, undocumented function definitions and class definitions in the project's Python code are assumed incomplete. Please consider suggesting a short summary of the code for any of these incomplete definitions as docstrings when reviewing.

Files:

  • cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py
cognee/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use shared logging utilities from cognee.shared.logging_utils in Python code

Files:

  • cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py
cognee/{modules,infrastructure,tasks}/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Co-locate feature-specific helpers under their respective package (modules/, infrastructure/, or tasks/)

Files:

  • cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py
🧬 Code graph analysis (1)
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (1)
cognee/shared/logging_utils.py (1)
  • warning (206-206)
🔇 Additional comments (2)
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (2)

6-6: json import is appropriate and scoped.

The added json import cleanly supports the new DATABASE_CONNECT_ARGS parsing with no unused-import concerns in this file.


90-105: connect_args plumbing into create_async_engine looks correct; clarify precedence if you later add a parameter.

For SQLite, the merge

connect_args={**{"timeout": 30}, **(connect_args or {})}

correctly ensures:

  • A sane default timeout=30, and
  • Any value from DATABASE_CONNECT_ARGS overrides that default on conflict.

For non‑SQLite, passing connect_args=connect_args or {} is also consistent with SQLAlchemy’s create_async_engine expectations.

If you later introduce a programmatic connect_args parameter on __init__, it would be worth explicitly defining precedence (e.g., “code wins over env”) and adjusting the merge order accordingly, e.g.:

effective_connect_args = {**(env_connect_args or {}), **(init_connect_args or {})}

So callers are never surprised by env vars silently overriding explicit arguments.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
cognee/tests/unit/infrastructure/databases/relational/sqlalchemy/test_SqlAlchemyAdapter.py (3)

11-12: Make os.getenv mocking more targeted and future‑proof

Each test globally patches os.getenv with a single return_value, so every env lookup inside SqlAlchemyAdapter.__init__ sees the same value. If the adapter ever reads additional env vars (e.g., for S3, timeouts, or other settings), these tests could become brittle or mask bugs.

Consider instead:

  • Using patch.dict("os.environ", {"DATABASE_CONNECT_ARGS": "<json>"}, clear=False) to only affect the variable under test, or
  • Giving os.getenv a side_effect that returns the JSON only when key == "DATABASE_CONNECT_ARGS" and otherwise delegates to the real os.getenv.

Example sketch for one test:

from os import getenv as real_getenv

@patch(
    "cognee.infrastructure.databases.relational.sqlalchemy.SqlAlchemyAdapter.create_async_engine"
)
@patch("os.getenv")
def test_sqlite_with_env_var_timeout(self, mock_getenv, mock_create_engine):
    mock_getenv.side_effect = (
        lambda key, default=None: '{"timeout": 60}'
        if key == "DATABASE_CONNECT_ARGS"
        else real_getenv(key, default)
    )
    ...

This keeps the setup tightly focused on DATABASE_CONNECT_ARGS and resilient to new env‑driven behavior in the adapter. As per coding guidelines, this improves clarity and portability of env‑dependent tests.

Also applies to: 24-25, 37-38, 50-51, 63-64, 76-77


38-46: Optionally extend tests for non‑dict JSON and explicit connect_args precedence

The adapter’s description mentions:

  • Ignoring DATABASE_CONNECT_ARGS when the parsed JSON is not a dict, and
  • Merging env connect_args with explicit connect_args passed to SQLAlchemyAdapter, with programmatic values taking precedence.

You already cover invalid JSON and basic merging for SQLite and env‑only behavior for PostgreSQL. Adding small tests for:

  • A valid but non‑dict JSON (e.g., "[]" or "\"foo\""), and
  • A call that passes connect_args={...} while DATABASE_CONNECT_ARGS is set,

would lock in those extra branches and the precedence guarantee.

Also applies to: 51-58, 64-84


7-7: Add a short docstring for TestSqlAlchemyAdapter

Project guidelines treat undocumented classes as incomplete. A one‑line class‑level docstring (e.g., summarizing that it verifies env‑driven connect_args handling for SQLAlchemyAdapter) would bring this test module fully in line with the stated conventions. As per coding guidelines, adding concise docstrings is preferred.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3f53534 and 4f3a1bc.

📒 Files selected for processing (1)
  • cognee/tests/unit/infrastructure/databases/relational/sqlalchemy/test_SqlAlchemyAdapter.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Use 4-space indentation in Python code
Use snake_case for Python module and function names
Use PascalCase for Python class names
Use ruff format before committing Python code
Use ruff check for import hygiene and style enforcement with line-length 100 configured in pyproject.toml
Prefer explicit, structured error handling in Python code

Files:

  • cognee/tests/unit/infrastructure/databases/relational/sqlalchemy/test_SqlAlchemyAdapter.py

⚙️ CodeRabbit configuration file

**/*.py: When reviewing Python code for this project:

  1. Prioritize portability over clarity, especially when dealing with cross-Python compatibility. However, with the priority in mind, do still consider improvements to clarity when relevant.
  2. As a general guideline, consider the code style advocated in the PEP 8 standard (excluding the use of spaces for indentation) and evaluate suggested changes for code style compliance.
  3. As a style convention, consider the code style advocated in CEP-8 and evaluate suggested changes for code style compliance.
  4. As a general guideline, try to provide any relevant, official, and supporting documentation links to any tool's suggestions in review comments. This guideline is important for posterity.
  5. As a general rule, undocumented function definitions and class definitions in the project's Python code are assumed incomplete. Please consider suggesting a short summary of the code for any of these incomplete definitions as docstrings when reviewing.

Files:

  • cognee/tests/unit/infrastructure/databases/relational/sqlalchemy/test_SqlAlchemyAdapter.py
cognee/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use shared logging utilities from cognee.shared.logging_utils in Python code

Files:

  • cognee/tests/unit/infrastructure/databases/relational/sqlalchemy/test_SqlAlchemyAdapter.py
cognee/tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

cognee/tests/**/*.py: Place Python tests under cognee/tests/ organized by type (unit, integration, cli_tests)
Name Python test files test_*.py and use pytest.mark.asyncio for async tests

Files:

  • cognee/tests/unit/infrastructure/databases/relational/sqlalchemy/test_SqlAlchemyAdapter.py
🔇 Additional comments (1)
cognee/tests/unit/infrastructure/databases/relational/sqlalchemy/test_SqlAlchemyAdapter.py (1)

7-84: Good coverage of env‑driven connect_args behavior

These tests exercise the key branches for SQLite and PostgreSQL: default timeout, env‑overridden timeout, merged extra args, invalid JSON fallback, and presence/absence of env‑provided connect_args. Structure and assertions against create_async_engine kwargs look consistent with the adapter’s described behavior.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (2)

33-33: Use Optional[dict] for parameters with None default.

Per PEP 484, parameters with a default value of None should use Optional[T] rather than T = None.

Apply this diff:

-    def __init__(self, connection_string: str, connect_args: dict = None):
+    def __init__(self, connection_string: str, connect_args: Optional[dict] = None):

63-74: Enhance error logging with exception details and context.

The current warnings silently ignore parsing errors without providing diagnostic information. Including the exception details and the actual environment variable value would aid troubleshooting.

Apply this diff to improve error logging:

         env_connect_args = os.getenv("DATABASE_CONNECT_ARGS")
         if env_connect_args:
             try:
                 parsed_args = json.loads(env_connect_args)
                 if isinstance(parsed_args, dict):
                     env_connect_args_dict = parsed_args
                 else:
-                    logger.warning("DATABASE_CONNECT_ARGS is not a valid JSON dictionary, ignoring")
+                    logger.warning(
+                        f"DATABASE_CONNECT_ARGS is not a valid JSON dictionary (got {type(parsed_args).__name__}), ignoring"
+                    )
             except json.JSONDecodeError:
-                logger.warning("Failed to parse DATABASE_CONNECT_ARGS as JSON, ignoring")
+                logger.warning(
+                    f"Failed to parse DATABASE_CONNECT_ARGS as JSON, ignoring",
+                    exc_info=True
+                )

This follows the coding guideline: "Prefer explicit, structured error handling in Python code."

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a7da9c7 and f26b490.

📒 Files selected for processing (2)
  • cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (4 hunks)
  • cognee/tests/unit/infrastructure/databases/relational/sqlalchemy/test_SqlAlchemyAdapter.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • cognee/tests/unit/infrastructure/databases/relational/sqlalchemy/test_SqlAlchemyAdapter.py
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Use 4-space indentation in Python code
Use snake_case for Python module and function names
Use PascalCase for Python class names
Use ruff format before committing Python code
Use ruff check for import hygiene and style enforcement with line-length 100 configured in pyproject.toml
Prefer explicit, structured error handling in Python code

Files:

  • cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py

⚙️ CodeRabbit configuration file

**/*.py: When reviewing Python code for this project:

  1. Prioritize portability over clarity, especially when dealing with cross-Python compatibility. However, with the priority in mind, do still consider improvements to clarity when relevant.
  2. As a general guideline, consider the code style advocated in the PEP 8 standard (excluding the use of spaces for indentation) and evaluate suggested changes for code style compliance.
  3. As a style convention, consider the code style advocated in CEP-8 and evaluate suggested changes for code style compliance.
  4. As a general guideline, try to provide any relevant, official, and supporting documentation links to any tool's suggestions in review comments. This guideline is important for posterity.
  5. As a general rule, undocumented function definitions and class definitions in the project's Python code are assumed incomplete. Please consider suggesting a short summary of the code for any of these incomplete definitions as docstrings when reviewing.

Files:

  • cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py
cognee/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use shared logging utilities from cognee.shared.logging_utils in Python code

Files:

  • cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py
cognee/{modules,infrastructure,tasks}/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Co-locate feature-specific helpers under their respective package (modules/, infrastructure/, or tasks/)

Files:

  • cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py
🧬 Code graph analysis (1)
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (1)
cognee/shared/logging_utils.py (1)
  • warning (206-206)
🔇 Additional comments (4)
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (4)

6-6: LGTM!

The json import is appropriately added to support parsing the DATABASE_CONNECT_ARGS environment variable.


34-59: LGTM!

The docstring is comprehensive and well-structured, with clear parameter descriptions, environment variable documentation, and practical examples for both PostgreSQL and SQLite configurations.


77-77: LGTM!

The merge logic correctly implements the documented precedence, where explicit constructor arguments override environment variable settings.


95-110: LGTM!

The connect_args application is correctly implemented:

  • SQLite receives a sensible default timeout of 30 seconds that can be overridden via final_connect_args
  • Non-SQLite databases use final_connect_args directly
  • The implementation maintains backward compatibility while enabling customization

@ketanjain3 ketanjain3 requested a review from siillee December 9, 2025 04:48
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (1)
cognee/infrastructure/databases/relational/config.py (1)

43-44: Consider logging parse failures for observability.

Silent fallback to {} on JSONDecodeError or non-dict JSON might hide configuration issues. Users may be unaware their DATABASE_CONNECT_ARGS is malformed.

As per coding guidelines, use shared logging utilities from cognee.shared.logging_utils. Apply this diff:

+from cognee.shared.logging_utils import get_logger
+
+logger = get_logger()
+
 # In fill_derived method:
         try:
             parsed_args = json.loads(self.database_connect_args)
             if isinstance(parsed_args, dict):
                 self.database_connect_args = parsed_args
             else:
+                logger.warning(
+                    f"DATABASE_CONNECT_ARGS must be a JSON object, got {type(parsed_args).__name__}. Using empty dict."
+                )
                 self.database_connect_args = {}
         except json.JSONDecodeError:
+            logger.warning(
+                f"Failed to parse DATABASE_CONNECT_ARGS as JSON: {self.database_connect_args}. Using empty dict."
+            )
             self.database_connect_args = {}
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f26b490 and e1d313a.

📒 Files selected for processing (4)
  • cognee/infrastructure/databases/relational/config.py (5 hunks)
  • cognee/infrastructure/databases/relational/create_relational_engine.py (3 hunks)
  • cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (3 hunks)
  • cognee/tests/unit/infrastructure/databases/relational/test_RelationalConfig.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Use 4-space indentation in Python code
Use snake_case for Python module and function names
Use PascalCase for Python class names
Use ruff format before committing Python code
Use ruff check for import hygiene and style enforcement with line-length 100 configured in pyproject.toml
Prefer explicit, structured error handling in Python code

Files:

  • cognee/infrastructure/databases/relational/config.py
  • cognee/infrastructure/databases/relational/create_relational_engine.py
  • cognee/tests/unit/infrastructure/databases/relational/test_RelationalConfig.py
  • cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py

⚙️ CodeRabbit configuration file

**/*.py: When reviewing Python code for this project:

  1. Prioritize portability over clarity, especially when dealing with cross-Python compatibility. However, with the priority in mind, do still consider improvements to clarity when relevant.
  2. As a general guideline, consider the code style advocated in the PEP 8 standard (excluding the use of spaces for indentation) and evaluate suggested changes for code style compliance.
  3. As a style convention, consider the code style advocated in CEP-8 and evaluate suggested changes for code style compliance.
  4. As a general guideline, try to provide any relevant, official, and supporting documentation links to any tool's suggestions in review comments. This guideline is important for posterity.
  5. As a general rule, undocumented function definitions and class definitions in the project's Python code are assumed incomplete. Please consider suggesting a short summary of the code for any of these incomplete definitions as docstrings when reviewing.

Files:

  • cognee/infrastructure/databases/relational/config.py
  • cognee/infrastructure/databases/relational/create_relational_engine.py
  • cognee/tests/unit/infrastructure/databases/relational/test_RelationalConfig.py
  • cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py
cognee/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use shared logging utilities from cognee.shared.logging_utils in Python code

Files:

  • cognee/infrastructure/databases/relational/config.py
  • cognee/infrastructure/databases/relational/create_relational_engine.py
  • cognee/tests/unit/infrastructure/databases/relational/test_RelationalConfig.py
  • cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py
cognee/{modules,infrastructure,tasks}/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Co-locate feature-specific helpers under their respective package (modules/, infrastructure/, or tasks/)

Files:

  • cognee/infrastructure/databases/relational/config.py
  • cognee/infrastructure/databases/relational/create_relational_engine.py
  • cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py
cognee/tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

cognee/tests/**/*.py: Place Python tests under cognee/tests/ organized by type (unit, integration, cli_tests)
Name Python test files test_*.py and use pytest.mark.asyncio for async tests

Files:

  • cognee/tests/unit/infrastructure/databases/relational/test_RelationalConfig.py
🧬 Code graph analysis (2)
cognee/infrastructure/databases/relational/create_relational_engine.py (1)
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (1)
  • SQLAlchemyAdapter (26-672)
cognee/tests/unit/infrastructure/databases/relational/test_RelationalConfig.py (1)
cognee/infrastructure/databases/relational/config.py (3)
  • RelationalConfig (11-68)
  • to_dict (48-68)
  • to_dict (116-133)
🔇 Additional comments (2)
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (1)

54-88: LGTM! Correct handling of connect_args with proper precedence.

The implementation correctly:

  • Falls back to an empty dict when connect_args is None
  • For SQLite, merges user-provided args with default timeout, with user values taking precedence (line 77: {**{"timeout": 30}, **final_connect_args})
  • For non-SQLite databases, passes the args directly to the engine

The spread operator ordering ensures user-provided connection arguments override defaults, which is the correct behavior.

cognee/tests/unit/infrastructure/databases/relational/test_RelationalConfig.py (1)

1-69: Excellent test coverage for edge cases!

The test suite comprehensively covers:

  • Valid JSON dict parsing
  • Empty string handling
  • Missing environment variable
  • Invalid/malformed JSON
  • Non-dict JSON (arrays)
  • Inclusion in to_dict() output
  • Integer and mixed-type values

This provides strong validation of the parsing logic and ensures backward compatibility.

Note: The test at lines 17-21 expects empty strings to remain as "" rather than being converted to None or {}. If you address the inconsistent empty string handling suggested in config.py (lines 36-44), you'll need to update this test accordingly.

db_username: Union[str, None] = None # "cognee"
db_password: Union[str, None] = None # "cognee"
db_provider: str = "sqlite"
database_connect_args: Union[str, None] = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Fix type annotation to reflect runtime types.

The field database_connect_args is declared as Union[str, None], but after the fill_derived validator runs (lines 36-44), it can also be a dict. This creates a type inconsistency.

Apply this diff to fix the type annotation:

-    database_connect_args: Union[str, None] = None
+    database_connect_args: Union[dict, str, None] = None
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
database_connect_args: Union[str, None] = None
database_connect_args: Union[dict, str, None] = None
🤖 Prompt for AI Agents
In cognee/infrastructure/databases/relational/config.py around line 23 (and
refer to the fill_derived validator at lines 36-44), the field
database_connect_args is annotated as Union[str, None] but at runtime the
validator can set it to a dict; update the type annotation to reflect
Optional[Union[str, Dict[str, Any]]] (or Optional[Union[str, dict]]) and add the
necessary typing imports (Dict and Any or use dict) so static types match
runtime values; ensure any references or mypy checks are adjusted accordingly.

Comment on lines +36 to +44
if self.database_connect_args and isinstance(self.database_connect_args, str):
try:
parsed_args = json.loads(self.database_connect_args)
if isinstance(parsed_args, dict):
self.database_connect_args = parsed_args
else:
self.database_connect_args = {}
except json.JSONDecodeError:
self.database_connect_args = {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Inconsistent handling of empty strings.

The parsing logic skips empty strings (line 36 checks if self.database_connect_args), leaving them as empty strings rather than converting them to None or {}. This is inconsistent with the intent to provide a dict for connection args and could cause issues downstream.

Apply this diff to handle empty strings consistently:

 # Parse database_connect_args if provided as JSON string
-if self.database_connect_args and isinstance(self.database_connect_args, str):
+if self.database_connect_args is not None and isinstance(self.database_connect_args, str):
+    if not self.database_connect_args.strip():
+        self.database_connect_args = None
+    else:
-    try:
-        parsed_args = json.loads(self.database_connect_args)
-        if isinstance(parsed_args, dict):
-            self.database_connect_args = parsed_args
-        else:
-            self.database_connect_args = {}
-    except json.JSONDecodeError:
-        self.database_connect_args = {}
+        try:
+            parsed_args = json.loads(self.database_connect_args)
+            if isinstance(parsed_args, dict):
+                self.database_connect_args = parsed_args
+            else:
+                self.database_connect_args = {}
+        except json.JSONDecodeError:
+            self.database_connect_args = {}
🤖 Prompt for AI Agents
In cognee/infrastructure/databases/relational/config.py around lines 36 to 44,
the code skips empty strings because it checks `if self.database_connect_args`
before parsing, leaving "" unchanged; update the logic to first check if
`isinstance(self.database_connect_args, str)`, then strip the string and if it
is empty set `self.database_connect_args = {}`; otherwise attempt `json.loads`
and on success assign the dict (or `{}` if parsed value is not a dict), catching
`json.JSONDecodeError` and setting `{}` on error.

db_username: str,
db_password: str,
db_provider: str,
database_connect_args: dict = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Use Optional[dict] for clarity and correctness.

The type annotation dict = None is technically valid but less explicit than Optional[dict] = None or dict | None = None. Per PEP 484 conventions, using Optional or the union syntax is preferred for optional parameters.

Apply this diff:

+from typing import Optional
+
 @lru_cache
 def create_relational_engine(
     db_path: str,
     db_name: str,
     db_host: str,
     db_port: str,
     db_username: str,
     db_password: str,
     db_provider: str,
-    database_connect_args: dict = None,
+    database_connect_args: Optional[dict] = None,
 ):
🤖 Prompt for AI Agents
In cognee/infrastructure/databases/relational/create_relational_engine.py around
line 14, the parameter annotation currently uses "dict = None"; change it to an
explicit optional type such as "Optional[dict] = None" (or "dict | None = None"
for Python 3.10+) and add the corresponding import from typing (from typing
import Optional) if not already present, to make the optional nature explicit
and follow PEP 484 conventions.

"""

def __init__(self, connection_string: str):
def __init__(self, connection_string: str, connect_args: dict = None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Use Optional[dict] for type annotation consistency.

Similar to create_relational_engine.py, the type annotation dict = None should use Optional[dict] = None for clarity and PEP 484 compliance.

Apply this diff:

+from typing import Optional
+
-    def __init__(self, connection_string: str, connect_args: dict = None):
+    def __init__(self, connection_string: str, connect_args: Optional[dict] = None):

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py
around line 32, the __init__ parameter connect_args is annotated as dict = None;
change the annotation to Optional[dict] = None for PEP 484 consistency and match
create_relational_engine.py. Add the necessary import from typing (Optional) at
top of the file if missing, then update the signature to use Optional[dict] =
None.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants