-
Notifications
You must be signed in to change notification settings - Fork 914
feat(database): add connect_args support to SqlAlchemyAdapter #1861
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
feat(database): add connect_args support to SqlAlchemyAdapter #1861
Conversation
- Add optional connect_args parameter to __init__ method - Support DATABASE_CONNECT_ARGS environment variable for JSON-based configuration - Enable custom connection parameters for all database engines (SQLite and PostgreSQL) - Maintain backward compatibility with existing code - Add proper error handling and validation for environment variable parsing Signed-off-by: ketanjain7981 <[email protected]>
Please make sure all the checkboxes are checked:
|
WalkthroughIntroduces optional DATABASE_CONNECT_ARGS support for configurable database connection parameters. Adds environment variable to template, extends RelationalConfig with JSON parsing capability, updates SQLAlchemyAdapter to accept and use connect_args, propagates parameter through create_relational_engine, and includes comprehensive unit tests. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @ketanjain3, thank you for submitting a PR! We will respond as soon as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (1)
68-73: User-providedtimeoutis always overridden.The current spread order
{**(connect_args or {}), **{"timeout": 30}}means the hardcodedtimeout=30always overwrites any user-specified timeout. This contradicts the.env.templateexample showingDATABASE_CONNECT_ARGS='{"timeout": 60}'.Reverse the merge order so user-provided values take precedence over the default:
if "sqlite" in connection_string: self.engine = create_async_engine( connection_string, poolclass=NullPool, - connect_args={**(connect_args or {}), **{"timeout": 30}}, + connect_args={"timeout": 30, **(connect_args or {})}, )
🧹 Nitpick comments (2)
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (2)
37-50: Consider security implications of loggingenv_connect_args.Line 47 logs the parsed
env_connect_argsvalue, which could potentially contain sensitive connection parameters (e.g., SSL certificates, passwords in some edge cases). Consider logging a generic message without the actual values.Additionally, the current merge order (
connect_args.update(env_connect_args)) means environment variables override constructor arguments. Verify this is the intended precedence—typically, explicit constructor arguments take precedence over environment defaults.if isinstance(env_connect_args, dict): if connect_args is None: connect_args = {} - connect_args.update(env_connect_args) + # Merge env args first, then constructor args override + merged = {**env_connect_args, **(connect_args or {})} + connect_args = merged else: logger.warning( - f"DATABASE_CONNECT_ARGS is not a valid JSON dictionary: {env_connect_args}" + "DATABASE_CONNECT_ARGS is not a valid JSON dictionary" )
33-35: Add docstring for__init__method.Per coding guidelines, the new
connect_argsparameter should be documented. Consider adding a docstring:def __init__(self, connection_string: str, connect_args: Optional[dict] = None): + """ + Initialize the SQLAlchemy adapter with the given connection string. + + Parameters: + ----------- + - connection_string (str): The database connection URI. + - connect_args (Optional[dict]): Additional connection arguments passed + to the SQLAlchemy engine. Can also be set via DATABASE_CONNECT_ARGS + environment variable (JSON format). Environment values are merged + with constructor values. + """ self.db_path: str = None self.db_uri: str = connection_string
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
.env.template(1 hunks)cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py(4 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Use 4-space indentation in Python code
Use snake_case for Python module and function names
Use PascalCase for Python class names
Use ruff format before committing Python code
Use ruff check for import hygiene and style enforcement with line-length 100 configured in pyproject.toml
Prefer explicit, structured error handling in Python code
Files:
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py
⚙️ CodeRabbit configuration file
**/*.py: When reviewing Python code for this project:
- Prioritize portability over clarity, especially when dealing with cross-Python compatibility. However, with the priority in mind, do still consider improvements to clarity when relevant.
- As a general guideline, consider the code style advocated in the PEP 8 standard (excluding the use of spaces for indentation) and evaluate suggested changes for code style compliance.
- As a style convention, consider the code style advocated in CEP-8 and evaluate suggested changes for code style compliance.
- As a general guideline, try to provide any relevant, official, and supporting documentation links to any tool's suggestions in review comments. This guideline is important for posterity.
- As a general rule, undocumented function definitions and class definitions in the project's Python code are assumed incomplete. Please consider suggesting a short summary of the code for any of these incomplete definitions as docstrings when reviewing.
Files:
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py
cognee/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use shared logging utilities from cognee.shared.logging_utils in Python code
Files:
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py
cognee/{modules,infrastructure,tasks}/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Co-locate feature-specific helpers under their respective package (modules/, infrastructure/, or tasks/)
Files:
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py
🧬 Code graph analysis (1)
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (1)
cognee/shared/logging_utils.py (1)
warning(206-206)
🔇 Additional comments (4)
.env.template (1)
94-102: Well-documented environment variable addition.The documentation clearly explains the purpose and provides practical examples for both PostgreSQL (SSL configuration) and SQLite (timeout). The placement within the relational database settings section is appropriate.
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (3)
6-6: LGTM!Standard library import for JSON parsing is correctly added.
33-33: Backward-compatible signature change.The optional parameter with
Nonedefault maintains compatibility with existing code.
74-83: LGTM!The non-SQLite engine creation correctly passes through the merged
connect_argswith an empty dict fallback.
- Add comprehensive docstring for __init__ method to meet 80% coverage requirement - Fix security issue: remove sensitive data from log messages - Fix merge precedence: programmatic args now correctly override env vars - Fix SQLite timeout order: user-specified timeout now overrides default 30s - Clarify precedence in docstring documentation Signed-off-by: ketanjain7981 <[email protected]>
7d6ae6a to
c892265
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (2)
33-51: Excellent docstring and signature design.The constructor signature maintains backward compatibility, and the docstring is comprehensive, clearly documenting parameters, environment variables, and examples. This follows the coding guidelines for documenting function definitions.
Consider adding runtime validation for the programmatic
connect_argsparameter to ensure it's a dictionary if provided:self.db_path: str = None self.db_uri: str = connection_string + if connect_args is not None and not isinstance(connect_args, dict): + raise TypeError("connect_args must be a dictionary") + env_connect_args = os.getenv("DATABASE_CONNECT_ARGS")This would provide clearer error messages if incorrect types are passed, though SQLAlchemy would also catch this during engine creation.
55-66: Good environment variable parsing with appropriate error handling.The logic correctly:
- Parses the
DATABASE_CONNECT_ARGSenvironment variable as JSON- Validates the parsed value is a dictionary
- Merges environment-based args with programmatic args (programmatic takes precedence)
- Uses non-fatal warnings to maintain backward compatibility
For improved debugging, consider making the warning messages more informative:
if isinstance(parsed_env_args, dict): # Merge: env vars as base, programmatic args override merged_args = {**parsed_env_args, **(connect_args or {})} connect_args = merged_args else: - logger.warning("DATABASE_CONNECT_ARGS is not a valid JSON dictionary, ignoring") + logger.warning( + f"DATABASE_CONNECT_ARGS is not a valid JSON dictionary (got {type(parsed_env_args).__name__}), ignoring" + ) except json.JSONDecodeError: - logger.warning("Failed to parse DATABASE_CONNECT_ARGS as JSON, ignoring") + logger.warning(f"Failed to parse DATABASE_CONNECT_ARGS as JSON: {str(e)}, ignoring")
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py(4 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Use 4-space indentation in Python code
Use snake_case for Python module and function names
Use PascalCase for Python class names
Use ruff format before committing Python code
Use ruff check for import hygiene and style enforcement with line-length 100 configured in pyproject.toml
Prefer explicit, structured error handling in Python code
Files:
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py
⚙️ CodeRabbit configuration file
**/*.py: When reviewing Python code for this project:
- Prioritize portability over clarity, especially when dealing with cross-Python compatibility. However, with the priority in mind, do still consider improvements to clarity when relevant.
- As a general guideline, consider the code style advocated in the PEP 8 standard (excluding the use of spaces for indentation) and evaluate suggested changes for code style compliance.
- As a style convention, consider the code style advocated in CEP-8 and evaluate suggested changes for code style compliance.
- As a general guideline, try to provide any relevant, official, and supporting documentation links to any tool's suggestions in review comments. This guideline is important for posterity.
- As a general rule, undocumented function definitions and class definitions in the project's Python code are assumed incomplete. Please consider suggesting a short summary of the code for any of these incomplete definitions as docstrings when reviewing.
Files:
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py
cognee/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use shared logging utilities from cognee.shared.logging_utils in Python code
Files:
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py
cognee/{modules,infrastructure,tasks}/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Co-locate feature-specific helpers under their respective package (modules/, infrastructure/, or tasks/)
Files:
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py
🧬 Code graph analysis (1)
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (1)
cognee/shared/logging_utils.py (1)
warning(206-206)
🔇 Additional comments (3)
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (3)
6-6: LGTM! Safe JSON import.The
jsonimport is necessary for parsing theDATABASE_CONNECT_ARGSenvironment variable and is correctly placed with other standard library imports.
84-89: LGTM! Correct SQLite connect_args handling.The merge strategy
{**{"timeout": 30}, **(connect_args or {})}correctly:
- Provides a sensible default timeout of 30 seconds for SQLite
- Allows user-provided
connect_argsto override the default- Handles None with the
or {}fallback
91-99: LGTM! Correct non-SQLite connect_args handling.The code correctly passes
connect_args or {}tocreate_async_engine, ensuring an empty dictionary is provided whenconnect_argsis None.
- Remove unused connect_args parameter from __init__ - Programmatic parameter was dead code (never called by users) - Users call get_relational_engine() which doesn't expose connect_args - Keep DATABASE_CONNECT_ARGS env var support (actually used in production) - Simplify implementation and reduce complexity - Update docstring to reflect env-var-only approach - Add production examples to docstring Signed-off-by: ketanjain7981 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (2)
33-57: Constructor docstring/env docs look good, but double‑check alignment with PR intent.The expanded
__init__docstring clearly explainsDATABASE_CONNECT_ARGS, examples, and intended usage, and it matches the actual behavior of this class (env‑driven configuration only).However, the PR description mentions adding an optional
connect_argsparameter toSQLAlchemyAdapter.__init__, but the current signature still only acceptsconnection_string: str. If you still intend to support programmaticconnect_args(in addition to the env var), consider either:
- Updating the signature (e.g.,
def __init__(self, connection_string: str, connect_args: dict | None = None)) and documenting the precedence between env vs code, or- Adjusting the PR description to reflect that only env‑based configuration is supported.
61-73: Env‑drivenDATABASE_CONNECT_ARGSparsing is robust; consider logging a bit more detail.The JSON parsing and type check (must be a
dict) are good defensive choices: invalid JSON or non‑dict values are safely ignored with warnings instead of breaking startup.If you want slightly better diagnosability, you could:
- Include the exception message in the JSON parse failure warning, and/or
- Use
exc_info=Trueso operators can see stack traces when needed.For example:
- except json.JSONDecodeError: - logger.warning("Failed to parse DATABASE_CONNECT_ARGS as JSON, ignoring") + except json.JSONDecodeError as exc: + logger.warning( + "Failed to parse DATABASE_CONNECT_ARGS as JSON, ignoring: %s", exc + )This keeps behavior identical while making misconfiguration easier to debug.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py(4 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Use 4-space indentation in Python code
Use snake_case for Python module and function names
Use PascalCase for Python class names
Use ruff format before committing Python code
Use ruff check for import hygiene and style enforcement with line-length 100 configured in pyproject.toml
Prefer explicit, structured error handling in Python code
Files:
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py
⚙️ CodeRabbit configuration file
**/*.py: When reviewing Python code for this project:
- Prioritize portability over clarity, especially when dealing with cross-Python compatibility. However, with the priority in mind, do still consider improvements to clarity when relevant.
- As a general guideline, consider the code style advocated in the PEP 8 standard (excluding the use of spaces for indentation) and evaluate suggested changes for code style compliance.
- As a style convention, consider the code style advocated in CEP-8 and evaluate suggested changes for code style compliance.
- As a general guideline, try to provide any relevant, official, and supporting documentation links to any tool's suggestions in review comments. This guideline is important for posterity.
- As a general rule, undocumented function definitions and class definitions in the project's Python code are assumed incomplete. Please consider suggesting a short summary of the code for any of these incomplete definitions as docstrings when reviewing.
Files:
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py
cognee/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use shared logging utilities from cognee.shared.logging_utils in Python code
Files:
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py
cognee/{modules,infrastructure,tasks}/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Co-locate feature-specific helpers under their respective package (modules/, infrastructure/, or tasks/)
Files:
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py
🧬 Code graph analysis (1)
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (1)
cognee/shared/logging_utils.py (1)
warning(206-206)
🔇 Additional comments (2)
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (2)
6-6:jsonimport is appropriate and scoped.The added
jsonimport cleanly supports the newDATABASE_CONNECT_ARGSparsing with no unused-import concerns in this file.
90-105:connect_argsplumbing intocreate_async_enginelooks correct; clarify precedence if you later add a parameter.For SQLite, the merge
connect_args={**{"timeout": 30}, **(connect_args or {})}correctly ensures:
- A sane default
timeout=30, and- Any value from
DATABASE_CONNECT_ARGSoverrides that default on conflict.For non‑SQLite, passing
connect_args=connect_args or {}is also consistent with SQLAlchemy’screate_async_engineexpectations.If you later introduce a programmatic
connect_argsparameter on__init__, it would be worth explicitly defining precedence (e.g., “code wins over env”) and adjusting the merge order accordingly, e.g.:effective_connect_args = {**(env_connect_args or {}), **(init_connect_args or {})}So callers are never surprised by env vars silently overriding explicit arguments.
Signed-off-by: ketanjain7981 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
cognee/tests/unit/infrastructure/databases/relational/sqlalchemy/test_SqlAlchemyAdapter.py (3)
11-12: Makeos.getenvmocking more targeted and future‑proofEach test globally patches
os.getenvwith a singlereturn_value, so every env lookup insideSqlAlchemyAdapter.__init__sees the same value. If the adapter ever reads additional env vars (e.g., for S3, timeouts, or other settings), these tests could become brittle or mask bugs.Consider instead:
- Using
patch.dict("os.environ", {"DATABASE_CONNECT_ARGS": "<json>"}, clear=False)to only affect the variable under test, or- Giving
os.getenvaside_effectthat returns the JSON only whenkey == "DATABASE_CONNECT_ARGS"and otherwise delegates to the realos.getenv.Example sketch for one test:
from os import getenv as real_getenv @patch( "cognee.infrastructure.databases.relational.sqlalchemy.SqlAlchemyAdapter.create_async_engine" ) @patch("os.getenv") def test_sqlite_with_env_var_timeout(self, mock_getenv, mock_create_engine): mock_getenv.side_effect = ( lambda key, default=None: '{"timeout": 60}' if key == "DATABASE_CONNECT_ARGS" else real_getenv(key, default) ) ...This keeps the setup tightly focused on
DATABASE_CONNECT_ARGSand resilient to new env‑driven behavior in the adapter. As per coding guidelines, this improves clarity and portability of env‑dependent tests.Also applies to: 24-25, 37-38, 50-51, 63-64, 76-77
38-46: Optionally extend tests for non‑dict JSON and explicitconnect_argsprecedenceThe adapter’s description mentions:
- Ignoring
DATABASE_CONNECT_ARGSwhen the parsed JSON is not a dict, and- Merging env connect_args with explicit
connect_argspassed toSQLAlchemyAdapter, with programmatic values taking precedence.You already cover invalid JSON and basic merging for SQLite and env‑only behavior for PostgreSQL. Adding small tests for:
- A valid but non‑dict JSON (e.g.,
"[]"or"\"foo\""), and- A call that passes
connect_args={...}whileDATABASE_CONNECT_ARGSis set,would lock in those extra branches and the precedence guarantee.
Also applies to: 51-58, 64-84
7-7: Add a short docstring forTestSqlAlchemyAdapterProject guidelines treat undocumented classes as incomplete. A one‑line class‑level docstring (e.g., summarizing that it verifies env‑driven
connect_argshandling forSQLAlchemyAdapter) would bring this test module fully in line with the stated conventions. As per coding guidelines, adding concise docstrings is preferred.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
cognee/tests/unit/infrastructure/databases/relational/sqlalchemy/test_SqlAlchemyAdapter.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Use 4-space indentation in Python code
Use snake_case for Python module and function names
Use PascalCase for Python class names
Use ruff format before committing Python code
Use ruff check for import hygiene and style enforcement with line-length 100 configured in pyproject.toml
Prefer explicit, structured error handling in Python code
Files:
cognee/tests/unit/infrastructure/databases/relational/sqlalchemy/test_SqlAlchemyAdapter.py
⚙️ CodeRabbit configuration file
**/*.py: When reviewing Python code for this project:
- Prioritize portability over clarity, especially when dealing with cross-Python compatibility. However, with the priority in mind, do still consider improvements to clarity when relevant.
- As a general guideline, consider the code style advocated in the PEP 8 standard (excluding the use of spaces for indentation) and evaluate suggested changes for code style compliance.
- As a style convention, consider the code style advocated in CEP-8 and evaluate suggested changes for code style compliance.
- As a general guideline, try to provide any relevant, official, and supporting documentation links to any tool's suggestions in review comments. This guideline is important for posterity.
- As a general rule, undocumented function definitions and class definitions in the project's Python code are assumed incomplete. Please consider suggesting a short summary of the code for any of these incomplete definitions as docstrings when reviewing.
Files:
cognee/tests/unit/infrastructure/databases/relational/sqlalchemy/test_SqlAlchemyAdapter.py
cognee/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use shared logging utilities from cognee.shared.logging_utils in Python code
Files:
cognee/tests/unit/infrastructure/databases/relational/sqlalchemy/test_SqlAlchemyAdapter.py
cognee/tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
cognee/tests/**/*.py: Place Python tests under cognee/tests/ organized by type (unit, integration, cli_tests)
Name Python test files test_*.py and use pytest.mark.asyncio for async tests
Files:
cognee/tests/unit/infrastructure/databases/relational/sqlalchemy/test_SqlAlchemyAdapter.py
🔇 Additional comments (1)
cognee/tests/unit/infrastructure/databases/relational/sqlalchemy/test_SqlAlchemyAdapter.py (1)
7-84: Good coverage of env‑drivenconnect_argsbehaviorThese tests exercise the key branches for SQLite and PostgreSQL: default timeout, env‑overridden timeout, merged extra args, invalid JSON fallback, and presence/absence of env‑provided
connect_args. Structure and assertions againstcreate_async_enginekwargs look consistent with the adapter’s described behavior.
Signed-off-by: ketanjain7981 <[email protected]>
Signed-off-by: ketanjain7981 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (2)
33-33: UseOptional[dict]for parameters withNonedefault.Per PEP 484, parameters with a default value of
Noneshould useOptional[T]rather thanT = None.Apply this diff:
- def __init__(self, connection_string: str, connect_args: dict = None): + def __init__(self, connection_string: str, connect_args: Optional[dict] = None):
63-74: Enhance error logging with exception details and context.The current warnings silently ignore parsing errors without providing diagnostic information. Including the exception details and the actual environment variable value would aid troubleshooting.
Apply this diff to improve error logging:
env_connect_args = os.getenv("DATABASE_CONNECT_ARGS") if env_connect_args: try: parsed_args = json.loads(env_connect_args) if isinstance(parsed_args, dict): env_connect_args_dict = parsed_args else: - logger.warning("DATABASE_CONNECT_ARGS is not a valid JSON dictionary, ignoring") + logger.warning( + f"DATABASE_CONNECT_ARGS is not a valid JSON dictionary (got {type(parsed_args).__name__}), ignoring" + ) except json.JSONDecodeError: - logger.warning("Failed to parse DATABASE_CONNECT_ARGS as JSON, ignoring") + logger.warning( + f"Failed to parse DATABASE_CONNECT_ARGS as JSON, ignoring", + exc_info=True + )This follows the coding guideline: "Prefer explicit, structured error handling in Python code."
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py(4 hunks)cognee/tests/unit/infrastructure/databases/relational/sqlalchemy/test_SqlAlchemyAdapter.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- cognee/tests/unit/infrastructure/databases/relational/sqlalchemy/test_SqlAlchemyAdapter.py
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Use 4-space indentation in Python code
Use snake_case for Python module and function names
Use PascalCase for Python class names
Use ruff format before committing Python code
Use ruff check for import hygiene and style enforcement with line-length 100 configured in pyproject.toml
Prefer explicit, structured error handling in Python code
Files:
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py
⚙️ CodeRabbit configuration file
**/*.py: When reviewing Python code for this project:
- Prioritize portability over clarity, especially when dealing with cross-Python compatibility. However, with the priority in mind, do still consider improvements to clarity when relevant.
- As a general guideline, consider the code style advocated in the PEP 8 standard (excluding the use of spaces for indentation) and evaluate suggested changes for code style compliance.
- As a style convention, consider the code style advocated in CEP-8 and evaluate suggested changes for code style compliance.
- As a general guideline, try to provide any relevant, official, and supporting documentation links to any tool's suggestions in review comments. This guideline is important for posterity.
- As a general rule, undocumented function definitions and class definitions in the project's Python code are assumed incomplete. Please consider suggesting a short summary of the code for any of these incomplete definitions as docstrings when reviewing.
Files:
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py
cognee/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use shared logging utilities from cognee.shared.logging_utils in Python code
Files:
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py
cognee/{modules,infrastructure,tasks}/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Co-locate feature-specific helpers under their respective package (modules/, infrastructure/, or tasks/)
Files:
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py
🧬 Code graph analysis (1)
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (1)
cognee/shared/logging_utils.py (1)
warning(206-206)
🔇 Additional comments (4)
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (4)
6-6: LGTM!The
jsonimport is appropriately added to support parsing theDATABASE_CONNECT_ARGSenvironment variable.
34-59: LGTM!The docstring is comprehensive and well-structured, with clear parameter descriptions, environment variable documentation, and practical examples for both PostgreSQL and SQLite configurations.
77-77: LGTM!The merge logic correctly implements the documented precedence, where explicit constructor arguments override environment variable settings.
95-110: LGTM!The
connect_argsapplication is correctly implemented:
- SQLite receives a sensible default timeout of 30 seconds that can be overridden via
final_connect_args- Non-SQLite databases use
final_connect_argsdirectly- The implementation maintains backward compatibility while enabling customization
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py
Outdated
Show resolved
Hide resolved
Signed-off-by: ketanjain7981 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Nitpick comments (1)
cognee/infrastructure/databases/relational/config.py (1)
43-44: Consider logging parse failures for observability.Silent fallback to
{}onJSONDecodeErroror non-dict JSON might hide configuration issues. Users may be unaware theirDATABASE_CONNECT_ARGSis malformed.As per coding guidelines, use shared logging utilities from cognee.shared.logging_utils. Apply this diff:
+from cognee.shared.logging_utils import get_logger + +logger = get_logger() + # In fill_derived method: try: parsed_args = json.loads(self.database_connect_args) if isinstance(parsed_args, dict): self.database_connect_args = parsed_args else: + logger.warning( + f"DATABASE_CONNECT_ARGS must be a JSON object, got {type(parsed_args).__name__}. Using empty dict." + ) self.database_connect_args = {} except json.JSONDecodeError: + logger.warning( + f"Failed to parse DATABASE_CONNECT_ARGS as JSON: {self.database_connect_args}. Using empty dict." + ) self.database_connect_args = {}
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
cognee/infrastructure/databases/relational/config.py(5 hunks)cognee/infrastructure/databases/relational/create_relational_engine.py(3 hunks)cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py(3 hunks)cognee/tests/unit/infrastructure/databases/relational/test_RelationalConfig.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Use 4-space indentation in Python code
Use snake_case for Python module and function names
Use PascalCase for Python class names
Use ruff format before committing Python code
Use ruff check for import hygiene and style enforcement with line-length 100 configured in pyproject.toml
Prefer explicit, structured error handling in Python code
Files:
cognee/infrastructure/databases/relational/config.pycognee/infrastructure/databases/relational/create_relational_engine.pycognee/tests/unit/infrastructure/databases/relational/test_RelationalConfig.pycognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py
⚙️ CodeRabbit configuration file
**/*.py: When reviewing Python code for this project:
- Prioritize portability over clarity, especially when dealing with cross-Python compatibility. However, with the priority in mind, do still consider improvements to clarity when relevant.
- As a general guideline, consider the code style advocated in the PEP 8 standard (excluding the use of spaces for indentation) and evaluate suggested changes for code style compliance.
- As a style convention, consider the code style advocated in CEP-8 and evaluate suggested changes for code style compliance.
- As a general guideline, try to provide any relevant, official, and supporting documentation links to any tool's suggestions in review comments. This guideline is important for posterity.
- As a general rule, undocumented function definitions and class definitions in the project's Python code are assumed incomplete. Please consider suggesting a short summary of the code for any of these incomplete definitions as docstrings when reviewing.
Files:
cognee/infrastructure/databases/relational/config.pycognee/infrastructure/databases/relational/create_relational_engine.pycognee/tests/unit/infrastructure/databases/relational/test_RelationalConfig.pycognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py
cognee/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use shared logging utilities from cognee.shared.logging_utils in Python code
Files:
cognee/infrastructure/databases/relational/config.pycognee/infrastructure/databases/relational/create_relational_engine.pycognee/tests/unit/infrastructure/databases/relational/test_RelationalConfig.pycognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py
cognee/{modules,infrastructure,tasks}/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Co-locate feature-specific helpers under their respective package (modules/, infrastructure/, or tasks/)
Files:
cognee/infrastructure/databases/relational/config.pycognee/infrastructure/databases/relational/create_relational_engine.pycognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py
cognee/tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
cognee/tests/**/*.py: Place Python tests under cognee/tests/ organized by type (unit, integration, cli_tests)
Name Python test files test_*.py and use pytest.mark.asyncio for async tests
Files:
cognee/tests/unit/infrastructure/databases/relational/test_RelationalConfig.py
🧬 Code graph analysis (2)
cognee/infrastructure/databases/relational/create_relational_engine.py (1)
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (1)
SQLAlchemyAdapter(26-672)
cognee/tests/unit/infrastructure/databases/relational/test_RelationalConfig.py (1)
cognee/infrastructure/databases/relational/config.py (3)
RelationalConfig(11-68)to_dict(48-68)to_dict(116-133)
🔇 Additional comments (2)
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (1)
54-88: LGTM! Correct handling of connect_args with proper precedence.The implementation correctly:
- Falls back to an empty dict when
connect_argsisNone- For SQLite, merges user-provided args with default timeout, with user values taking precedence (line 77:
{**{"timeout": 30}, **final_connect_args})- For non-SQLite databases, passes the args directly to the engine
The spread operator ordering ensures user-provided connection arguments override defaults, which is the correct behavior.
cognee/tests/unit/infrastructure/databases/relational/test_RelationalConfig.py (1)
1-69: Excellent test coverage for edge cases!The test suite comprehensively covers:
- Valid JSON dict parsing
- Empty string handling
- Missing environment variable
- Invalid/malformed JSON
- Non-dict JSON (arrays)
- Inclusion in
to_dict()output- Integer and mixed-type values
This provides strong validation of the parsing logic and ensures backward compatibility.
Note: The test at lines 17-21 expects empty strings to remain as
""rather than being converted toNoneor{}. If you address the inconsistent empty string handling suggested inconfig.py(lines 36-44), you'll need to update this test accordingly.
| db_username: Union[str, None] = None # "cognee" | ||
| db_password: Union[str, None] = None # "cognee" | ||
| db_provider: str = "sqlite" | ||
| database_connect_args: Union[str, None] = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix type annotation to reflect runtime types.
The field database_connect_args is declared as Union[str, None], but after the fill_derived validator runs (lines 36-44), it can also be a dict. This creates a type inconsistency.
Apply this diff to fix the type annotation:
- database_connect_args: Union[str, None] = None
+ database_connect_args: Union[dict, str, None] = None📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| database_connect_args: Union[str, None] = None | |
| database_connect_args: Union[dict, str, None] = None |
🤖 Prompt for AI Agents
In cognee/infrastructure/databases/relational/config.py around line 23 (and
refer to the fill_derived validator at lines 36-44), the field
database_connect_args is annotated as Union[str, None] but at runtime the
validator can set it to a dict; update the type annotation to reflect
Optional[Union[str, Dict[str, Any]]] (or Optional[Union[str, dict]]) and add the
necessary typing imports (Dict and Any or use dict) so static types match
runtime values; ensure any references or mypy checks are adjusted accordingly.
| if self.database_connect_args and isinstance(self.database_connect_args, str): | ||
| try: | ||
| parsed_args = json.loads(self.database_connect_args) | ||
| if isinstance(parsed_args, dict): | ||
| self.database_connect_args = parsed_args | ||
| else: | ||
| self.database_connect_args = {} | ||
| except json.JSONDecodeError: | ||
| self.database_connect_args = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion | 🟠 Major
Inconsistent handling of empty strings.
The parsing logic skips empty strings (line 36 checks if self.database_connect_args), leaving them as empty strings rather than converting them to None or {}. This is inconsistent with the intent to provide a dict for connection args and could cause issues downstream.
Apply this diff to handle empty strings consistently:
# Parse database_connect_args if provided as JSON string
-if self.database_connect_args and isinstance(self.database_connect_args, str):
+if self.database_connect_args is not None and isinstance(self.database_connect_args, str):
+ if not self.database_connect_args.strip():
+ self.database_connect_args = None
+ else:
- try:
- parsed_args = json.loads(self.database_connect_args)
- if isinstance(parsed_args, dict):
- self.database_connect_args = parsed_args
- else:
- self.database_connect_args = {}
- except json.JSONDecodeError:
- self.database_connect_args = {}
+ try:
+ parsed_args = json.loads(self.database_connect_args)
+ if isinstance(parsed_args, dict):
+ self.database_connect_args = parsed_args
+ else:
+ self.database_connect_args = {}
+ except json.JSONDecodeError:
+ self.database_connect_args = {}🤖 Prompt for AI Agents
In cognee/infrastructure/databases/relational/config.py around lines 36 to 44,
the code skips empty strings because it checks `if self.database_connect_args`
before parsing, leaving "" unchanged; update the logic to first check if
`isinstance(self.database_connect_args, str)`, then strip the string and if it
is empty set `self.database_connect_args = {}`; otherwise attempt `json.loads`
and on success assign the dict (or `{}` if parsed value is not a dict), catching
`json.JSONDecodeError` and setting `{}` on error.
| db_username: str, | ||
| db_password: str, | ||
| db_provider: str, | ||
| database_connect_args: dict = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion | 🟠 Major
Use Optional[dict] for clarity and correctness.
The type annotation dict = None is technically valid but less explicit than Optional[dict] = None or dict | None = None. Per PEP 484 conventions, using Optional or the union syntax is preferred for optional parameters.
Apply this diff:
+from typing import Optional
+
@lru_cache
def create_relational_engine(
db_path: str,
db_name: str,
db_host: str,
db_port: str,
db_username: str,
db_password: str,
db_provider: str,
- database_connect_args: dict = None,
+ database_connect_args: Optional[dict] = None,
):🤖 Prompt for AI Agents
In cognee/infrastructure/databases/relational/create_relational_engine.py around
line 14, the parameter annotation currently uses "dict = None"; change it to an
explicit optional type such as "Optional[dict] = None" (or "dict | None = None"
for Python 3.10+) and add the corresponding import from typing (from typing
import Optional) if not already present, to make the optional nature explicit
and follow PEP 484 conventions.
| """ | ||
|
|
||
| def __init__(self, connection_string: str): | ||
| def __init__(self, connection_string: str, connect_args: dict = None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion | 🟠 Major
Use Optional[dict] for type annotation consistency.
Similar to create_relational_engine.py, the type annotation dict = None should use Optional[dict] = None for clarity and PEP 484 compliance.
Apply this diff:
+from typing import Optional
+
- def __init__(self, connection_string: str, connect_args: dict = None):
+ def __init__(self, connection_string: str, connect_args: Optional[dict] = None):Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py
around line 32, the __init__ parameter connect_args is annotated as dict = None;
change the annotation to Optional[dict] = None for PEP 484 consistency and match
create_relational_engine.py. Add the necessary import from typing (Optional) at
top of the file if missing, then update the signature to use Optional[dict] =
None.
Description
The intent of this PR is to make the database initialization more flexible and configurable. In order to do this, the system will support a new DATABASE_CONNECT_ARGS environment variable that takes JSON-based configuration,. This enhancement will allow custom connection parameters to be passed to any supported database engine, including SQLite and PostgreSQL,. To guarantee that the environment variable is parsed securely and consistently, appropriate error handling and validation will also be added.
Type of Change
Screenshots/Videos (if applicable)
Pre-submission Checklist
DCO Affirmation
I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.
Summary by CodeRabbit
New Features
Tests
✏️ Tip: You can customize this high-level summary in your review settings.