- Added support for multi-vector collection in Qdrant driver.
- Improved log output readability in retrievers and GraphRAG, and added the embedded vector to the retriever result metadata for debugging.
- Added the `run_with_context` method to `Component`. This method includes a `context_` parameter, which provides information about the pipeline from which the component is executed (e.g., the `run_id`). It also enables the component to send events to the pipeline's callback function.
- Added the `enforce_schema` parameter to `SimpleKGPipeline` for optional schema enforcement.
- Added optional schema enforcement as a validation layer after entity and relation extraction.
- Introduced a linear hybrid search ranker for `HybridRetriever` and `HybridCypherRetriever`, allowing customizable ranking with an `alpha` parameter.
- Introduced `SearchQueryParseError` for handling invalid Lucene query strings in `HybridRetriever` and `HybridCypherRetriever`.
- Fixed config loading after a module reload (e.g., in Jupyter notebooks).
- The Qdrant retriever now falls back on the point ID if the `external_id_property` is not found in the payload.
- Updated a few dependencies, mainly `pypdf`, `anthropic` and `cohere`.
- Utility functions to retrieve metadata for vector and full-text indexes.
- Support for effective_search_ratio parameter in vector and hybrid searches.
- Introduced upsert_vectors utility function for batch upserting embeddings to vector indexes.
- Introduced the `extract_cypher` function to enhance Cypher query extraction and formatting in `Text2CypherRetriever`.
- Introduced `Neo4jMessageHistory` and `InMemoryMessageHistory` classes for managing LLM message histories.
- Added examples and documentation for using message history with Neo4j and in-memory storage.
- Updated LLM and GraphRAG classes to support new message history classes.
- Refactored index-related functions for improved compatibility and functionality.
- Added deprecation warnings to the `upsert_vector` and `upsert_vector_on_relationship` functions in favor of `upsert_vectors`.
- Added deprecation warnings to the `async_upsert_vector` and `async_upsert_vector_on_relationship` functions, notifying developers that they will be removed in a future release.
- Added support for database, timeout, and sanitize arguments in schema functions.
- Resolved an issue with an incorrectly hard-coded node alias in the `_handle_field_filter` function.
- Ability to add an event listener to receive notifications about pipeline progress.
- Added py.typed so that mypy knows to use type annotations from the neo4j-graphrag package.
- Support for creating enhanced schemas with detailed property statistics.
- New utility functions for schema formatting and value sanitization.
- Updated unit and integration tests to cover enhanced schema functionality.
- Changed the default behaviour of `FixedSizeSplitter` to avoid cutting words off in the chunks whenever possible.
- Refactored schema creation code to reduce duplication and improve maintainability.
- Removed the `uuid` package from dependencies (not needed with Python 3).
- Fixed a bug in the `AnthropicLLM` class preventing it from being used in the `GraphRAG` pipeline.
- Fixed a bug where the `OllamaEmbedder` would return a `list[list[float]]` instead of the expected `list[float]`.
- PyYAML dependency was missing and has been added.
- Weaviate was unintentionally added as a mandatory dependency in the previous version; this has been reverted.
- PyPDF and fsspec are no longer optional, so that SimpleKGPipeline examples can run out of the box (they only require a separate installation of the openai Python package when using OpenAI).
- Support for conversations with message history, including a new `message_history` parameter for LLM interactions.
- Ability to include system instructions in the LLM invoke method.
- Summarization of chat history to enhance query embedding and context handling in GraphRAG.
- Updated LLM implementations to handle message history consistently across providers.
- The `id_prefix` parameter in the `LexicalGraphConfig` is deprecated.
- IDs for the Document and Chunk nodes in the lexical graph are now randomly generated and unique across multiple runs, fixing issues in the lexical graph where relationships were created between chunks that were created by different pipeline runs.
- Improved logging for a better debugging experience: long lists and strings are now truncated. The maximum length can be controlled using the `LOGGING__MAX_LIST_LENGTH` and `LOGGING__MAX_STRING_LENGTH` environment variables.
- Integrated the `json-repair` package to handle and repair invalid JSON generated by LLMs.
- Introduced the `InvalidJSONError` exception for handling cases where JSON repair fails.
- Ability to create a Pipeline or SimpleKGPipeline from a config file. See the example.
- Added `OllamaLLM` and `OllamaEmbeddings` classes to make Ollama support more explicit. Implementations using the `OpenAILLM` and `OpenAIEmbeddings` classes will still work.
- Updated LLM prompt for Entity and Relation extraction to include stricter instructions for generating valid JSON.
- Added schema functions to the documentation.
- Introduced optional lexical graph configuration for `SimpleKGPipeline`, enhancing flexibility in customizing node labels and relationship types in the lexical graph.
- Introduced an optional `neo4j_database` parameter for `SimpleKGPipeline`, `Neo4jChunkReader` and `Text2CypherRetriever`.
- Ability to provide a description and a list of properties for entities and relations in the `SimpleKGPipeline` constructor.
- The `neo4j_database` parameter is now used for all queries in the `Neo4jWriter`.
- Updated all examples to use the `neo4j_database` parameter instead of an undocumented neo4j driver constructor.
- All `READ` queries are now routed to a reader replica (for clusters). This impacts all retrievers, as well as the `Neo4jChunkReader` and `SinglePropertyExactMatchResolver` components.
- Made `relations` and `potential_schema` optional in `SchemaBuilder`.
- Added a check to prevent the use of deprecated Cypher syntax for Neo4j versions 5.23.0 and above.
- Added a `LexicalGraphBuilder` component to enable the import of the lexical graph (document, chunks) without performing entity and relation extraction.
- Added a `Neo4jChunkReader` component to read chunk text from the database.
- Vector and Hybrid retrievers used with `return_properties` now also return the node labels (`nodeLabels`) and the node's element ID (`id`).
- `HybridRetriever` now filters out the embedding property of the vector index named in `self.vector_index_name` from the retriever result by default.
- Removed support for neo4j.AsyncDriver in the KG creation pipeline, affecting Neo4jWriter and related components.
- Updated examples and unit tests to reflect the removal of async driver support.
- Resolved an issue with `AzureOpenAIEmbeddings` incorrectly inheriting from `OpenAIEmbeddings`; it now inherits from `BaseOpenAIEmbeddings`.
- Introduced a `fail_if_exist` option to index creation functions to control behavior when an index already exists.
- Added a Qdrant retriever in neo4j_graphrag.retrievers.
- Comprehensive rewrite of the README to improve clarity and provide detailed usage examples.
- Fixed a bug where the `openai` Python client and `numpy` were required to import any embedder or LLM.
- The value associated with the enum field `OnError.IGNORE` has been changed from "CONTINUE" to "IGNORE" to follow the naming convention and match the field name.
- Added the `SinglePropertyExactMatchResolver` component, allowing entities with the exact same property value (e.g. name) to be merged.
- Added the `SimpleKGPipeline` class, a simplified abstraction layer to streamline knowledge graph building processes from text documents.
- Added AzureOpenAILLM and AzureOpenAIEmbeddings to support Azure-served OpenAI models.
- Added `template` validation in the `PromptTemplate` class upon construction.
- Examples demonstrating the use of Mistral embeddings and LLM in RAG pipelines.
- Added the ability to pass kwargs to `Text2CypherRetriever.search()`; they are injected into a custom prompt, if one is provided.
- Added validation to the `custom_prompt` parameter of `Text2CypherRetriever` to ensure that the `query_text` placeholder exists in the prompt.
- Introduced a fixed-size text splitter component for splitting text into chunks of a specified fixed size with overlap. Updated examples and tests to use this new component.
- Introduced Vertex AI LLM class for integrating Vertex AI models.
- Added unit tests for the Vertex AI LLM class.
- Added support for Cohere LLM and embeddings; added an optional dependency on `cohere`.
- Added support for Anthropic LLM; added an optional dependency on `anthropic`.
- Added support for MistralAI LLM; added an optional dependency on `mistralai`.
- Added support for Qdrant; added an optional dependency on `qdrant-client`.
- Resolved import issue with the Vertex AI Embeddings class.
- Fixed a bug in `Text2CypherRetriever` where, when using the `custom_prompt` arg, the `search` method would not inject the `query_text` content.
- The `custom_prompt` arg is now converted to a `Text2CypherTemplate` object within the `Text2CypherRetriever.get_search_results` method.
- The `Text2CypherTemplate` and `RAGTemplate` prompt templates now require the `query_text` arg and will raise an error if it is not present. Previous `query_text` aliases may still be used, but emit a deprecation warning.
- Resolved an issue where the Neo4jWriter component would raise an error if the start or end node ID was not defined properly in the input.
- Resolved an issue where relationship types were not escaped in the insert Cypher query.
- Improved query performance in Neo4jWriter: created nodes now have a generic `__KGBuilder__` label and an index is created on the `__KGBuilder__.id` property. Moreover, insertion queries are now batched. The batch size can be controlled using the `batch_size` parameter in the `Neo4jWriter` component.
- Moved the Embedder class to the neo4j_graphrag.embeddings directory for better organization alongside other custom embedders.
- Removed the `query` argument from the GraphRAG class' `.search` method; users must now use `query_text`.
- The Neo4jWriter component now runs a single query to merge a node and set its embeddings, if any.
- Nodes created by the `Neo4jWriter` now have an extra `__KGBuilder__` label. Nodes from the entity graph also have an `__Entity__` label.
- Dropped support for Python 3.8 (end of life).
- Updated documentation links in README.
- Renamed deprecated package references in documentation.
- Introduction page to the documentation content tree.
- Introduced a new Vertex AI embeddings class for generating text embeddings using Vertex AI.
- Updated documentation to include OpenAI and Vertex AI embeddings classes.
- Added google-cloud-aiplatform as an optional dependency for Vertex AI embeddings.
- Made `pygraphviz` an optional dependency; it is now only required when calling `pipeline.draw`.
- Moved pygraphviz to optional dependencies under [tool.poetry.extras] in pyproject.toml to resolve an issue where pip install neo4j-graphrag incorrectly required pygraphviz as a mandatory dependency.
- Officially renamed neo4j-genai to neo4j-graphrag. For the final release version of neo4j-genai, please visit https://pypi.org/project/neo4j-genai/.
- The `neo4j-genai` package is now deprecated. Users are advised to switch to the new package `neo4j-graphrag`.
- Ability to visualise a pipeline with `my_pipeline.draw("pipeline.png")`.
- `LexicalGraphBuilder` component to create the lexical graph without entity-relation extraction.
- Pipelines now return correct results when the same pipeline is run in parallel.
- The pipeline `run` method now returns a `PipelineResult` object.
- Improved parameter validation for pipelines (#124). Pipelines now raise an error before a run starts if:
  - the same parameter is mapped twice, or
  - a parameter is defined in the mapping but is not a valid component input.
- PDF-to-graph pipeline for knowledge graph construction in experimental mode
- Introduced support for Component/Pipeline flexible architecture.
- Added new components for knowledge graph construction, including text splitters, schema builders, entity-relation extractors, and Neo4j writers.
- Implemented end-to-end tests for the new knowledge graph builder pipeline.
- When saving the lexical graph in a KG creation pipeline, the document is also saved as a specific node, together with relationships between each chunk and the document they were created from.
- Corrected the hybrid retriever query to ensure proper normalization of scores in vector search results.
- Added an optional `custom_prompt` arg to the `Text2CypherRetriever` class.
- The first parameter of the `GraphRAG.search` method has been renamed `query_text` (was `query`) for consistency with the retrievers interface.
- Made the `GraphRAG.search` method backwards compatible with the `query` parameter, raising warnings to encourage using `query_text` instead.
- Corrected initialization to allow specifying the embedding model name.
- Removed sentence_transformers from `embeddings/__init__.py` to avoid an ImportError when the package is not installed.
- Stopped embeddings from being returned when searching with `VectorRetriever`. Added `nodeLabels` and `id` to the metadata of `VectorRetriever` results.
- Added the `upsert_vector` utility function for attaching vectors to node properties.
- Introduced `Neo4jInsertionError` for handling insertion failures in Neo4j.
- Included Pinecone and Weaviate retrievers in neo4j_graphrag.retrievers.
- Introduced the GraphRAG object, enabling a full RAG (Retrieval-Augmented Generation) pipeline with context retrieval, prompt formatting, and answer generation.
- Added PromptTemplate and RagTemplate for customizable prompt generation.
- Added LLMInterface with implementation for OpenAI LLM.
- Updated project configuration to support multiple Python versions (3.8 to 3.12) in CI workflows.
- Improved developer experience by copying the docstring from the `Retriever.get_search_results` method to the `Retriever.search` method.
- Support for specifying database names in index handling methods and retrievers.
- User Guide in documentation.
- Introduced result_formatter argument to all retrievers, allowing custom formatting of retriever results.
- Refactored import paths for retrievers to neo4j_graphrag.retrievers.
- Implemented exception chaining for all re-raised exceptions to improve stack trace readability.
- Made error messages in `index.py` more consistent.
- Renamed `Retriever._get_search_results` to `Retriever.get_search_results`.
- Updated retrievers and index handling methods to accept optional database names.
- Removed Pinecone and Weaviate retrievers from `__init__.py` to prevent an ImportError when optional dependencies are not installed.
- Moved few-shot examples in `Text2CypherRetriever` to the constructor for better initialization and usage. Updated unit tests and example script accordingly.
- Fixed regex warnings in E2E tests for Weaviate and Pinecone retrievers.
- Corrected HuggingFaceEmbeddings import in E2E tests.
- Introduced custom exceptions for improved error handling, including `RetrieverInitializationError`, `SearchValidationError`, `FilterValidationError`, `EmbeddingRequiredError`, `RecordCreationError`, `Neo4jIndexError`, and `Neo4jVersionError`.
- A retriever that integrates with a Weaviate vector database: `WeaviateNeo4jRetriever`.
- New return types that help with getting retriever results: `RetrieverResult` and `RetrieverResultItem`.
- A wrapper embedder object for sentence-transformers embeddings: `SentenceTransformerEmbeddings`.
- A `Text2CypherRetriever` object, which allows records to be retrieved from a Neo4j database using natural language.
- Replaced `ValueError` with custom exceptions across various modules for clearer and more specific error messages.
- Updated documentation to include new custom exceptions.
- Improved the use of Pydantic for input data validation for retriever objects.
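The linear hybrid ranker entry above can be illustrated with a small stand-alone sketch. It makes several assumptions. Each retriever is taken to return a `{node_id: score}` mapping. Scores are normalized by dividing by each list's maximum. They are then blended as `alpha * vector + (1 - alpha) * full_text`. The function name and the normalization choice are hypothetical. The library performs its ranking in Cypher and may normalize differently.

```python
from typing import Dict, List, Tuple


def linear_hybrid_rank(
    vector_scores: Dict[str, float],
    fulltext_scores: Dict[str, float],
    alpha: float = 0.5,
) -> List[Tuple[str, float]]:
    """Blend two score maps as alpha * vector + (1 - alpha) * full-text (hypothetical helper)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")

    def normalize(scores: Dict[str, float]) -> Dict[str, float]:
        # Divide by the top score so both retrievers contribute values in [0, 1].
        top = max(scores.values(), default=0.0)
        return {node: (s / top if top > 0 else 0.0) for node, s in scores.items()}

    vec, ft = normalize(vector_scores), normalize(fulltext_scores)
    # A node missing from one result list contributes 0 for that retriever.
    combined = {
        node: alpha * vec.get(node, 0.0) + (1 - alpha) * ft.get(node, 0.0)
        for node in vec.keys() | ft.keys()
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)
```

With `linear_hybrid_rank({"a": 0.9, "b": 0.6}, {"b": 4.0, "c": 2.0}, alpha=0.7)`, node `b` ranks first even though `a` has the best vector score, because `b` also tops the full-text list. Raising `alpha` towards 1 shifts the weight to the vector scores.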
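Two `FixedSizeSplitter` entries cover fixed-size chunks with overlap and the later change to avoid cutting words. The sketch below is not the library's implementation. It is a hypothetical `split_fixed_size` function that takes a character budget and an overlap. When a hard cut would land inside a word, it moves the cut back to the previous space. It also moves each overlapping start forward to the next word boundary. A single word longer than the chunk size still has to be cut, which is why the entry says "whenever possible".

```python
from typing import List


def split_fixed_size(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into chunks of at most chunk_size characters, avoiding mid-word cuts when possible."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("require 0 <= chunk_overlap < chunk_size")
    chunks: List[str] = []
    start, n = 0, len(text)
    while start < n:
        end = min(start + chunk_size, n)
        if end < n and not text[end].isspace():
            # The hard cut falls inside a word: back up to the last space in the window.
            space = text.rfind(" ", start, end)
            if space > start:
                end = space
        chunks.append(text[start:end].strip())
        if end >= n:
            break
        # Step back by the overlap (always making progress), then advance to a word start.
        next_start = max(end - chunk_overlap, start + 1)
        while next_start < end and not text[next_start - 1].isspace():
            next_start += 1
        start = next_start
    return chunks
```

For example, `split_fixed_size("the quick brown fox jumps", chunk_size=10, chunk_overlap=4)` keeps every word intact. The word `fox` appears in two consecutive chunks as the overlap.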
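The logging entry names two environment variables, `LOGGING__MAX_LIST_LENGTH` and `LOGGING__MAX_STRING_LENGTH`. Below is a minimal sketch of how such limits can be applied to a value before it is logged. Several details are assumptions, not the library's values: the helper names, the default limits (5 items, 200 characters), and the shape of the truncation marker.

```python
import os
from typing import Any


def _env_limit(name: str, default: int) -> int:
    # Fall back to the default when the variable is unset or not an integer.
    try:
        return int(os.environ.get(name, default))
    except ValueError:
        return default


def truncate_for_log(value: Any) -> Any:
    """Return a copy of value with long strings and lists shortened for logging."""
    max_list = _env_limit("LOGGING__MAX_LIST_LENGTH", 5)
    max_str = _env_limit("LOGGING__MAX_STRING_LENGTH", 200)
    if isinstance(value, str) and len(value) > max_str:
        return value[:max_str] + f"... (+{len(value) - max_str} chars)"
    if isinstance(value, list):
        items = [truncate_for_log(v) for v in value[:max_list]]
        if len(value) > max_list:
            items.append(f"... (+{len(value) - max_list} items)")
        return items
    if isinstance(value, dict):
        # Recurse into dicts so nested embedding vectors are shortened too.
        return {k: truncate_for_log(v) for k, v in value.items()}
    return value
```

Reading the limits on every call keeps the sketch simple and lets them change at runtime. A real implementation might read them once at startup instead.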
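Several entries follow the same pattern: the fix that stopped `openai` and `numpy` from being required to import any embedder or LLM, and the removal of optional integrations from `__init__.py`. The pattern is to import an optional dependency only when it is actually needed, and otherwise to fail with an actionable message. Below is a sketch of that pattern using `importlib`. The helper `require_optional` and the `ExampleEmbedder` class are hypothetical.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, pip_name: str) -> ModuleType:
    """Import an optional dependency lazily, raising a helpful ImportError if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as e:
        raise ImportError(
            f"Could not import {module_name!r}. "
            f"Install it with: pip install {pip_name}"
        ) from e


class ExampleEmbedder:
    """Hypothetical embedder: the import happens in __init__, not at module import time."""

    def __init__(self) -> None:
        self._backend = require_optional("sentence_transformers", "sentence-transformers")
```

A module defining `ExampleEmbedder` imports fine without `sentence-transformers` installed. Only instantiating the class requires the package.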
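`SinglePropertyExactMatchResolver` merges entities that share the exact same property value. The in-memory sketch below only computes a merge plan: which entity IDs would be merged into which. It does not reproduce the component, which works against the database. The dictionary shape of the entities is an assumption, as is grouping by label as well as by property value.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def exact_match_merge_plan(
    entities: Iterable[dict], resolve_property: str = "name"
) -> Dict[str, str]:
    """Map each duplicate entity ID to the first entity ID with the same label and property value."""
    groups: Dict[Tuple[str, object], List[str]] = defaultdict(list)
    for entity in entities:
        value = entity["properties"].get(resolve_property)
        if value is None:
            continue  # entities without the property are never merged
        groups[(entity["label"], value)].append(entity["id"])
    plan: Dict[str, str] = {}
    for ids in groups.values():
        canonical, *duplicates = ids
        for duplicate in duplicates:
            plan[duplicate] = canonical
    return plan
```

Matching is exact, so values that differ only in case (`"Ada Lovelace"` vs `"ada lovelace"`) are not merged.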
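Two `Text2CypherRetriever` entries describe two behaviours. The retriever validates that a `custom_prompt` contains a `query_text` placeholder, and it injects extra keyword arguments into that prompt. Below is a stdlib sketch that uses `string.Formatter` to list a template's placeholders. The names `validate_custom_prompt` and `render_prompt` are hypothetical. Raising `ValueError` is a choice made for this sketch; the library uses its own template classes and exceptions.

```python
from string import Formatter
from typing import Any, Iterable, Set


def placeholders(template: str) -> Set[str]:
    # Formatter.parse yields (literal_text, field_name, format_spec, conversion) tuples.
    return {field for _, field, _, _ in Formatter().parse(template) if field}


def validate_custom_prompt(template: str, required: Iterable[str] = ("query_text",)) -> None:
    missing = sorted(set(required) - placeholders(template))
    if missing:
        raise ValueError(f"custom_prompt is missing placeholder(s): {missing}")


def render_prompt(template: str, query_text: str, **kwargs: Any) -> str:
    """Validate the template, then fill query_text plus any extra keyword arguments."""
    validate_custom_prompt(template)
    return template.format(query_text=query_text, **kwargs)
```

Validating once, when the prompt is supplied, surfaces a missing placeholder before any LLM call is made.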
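`GraphRAG.search` renamed its first parameter from `query` to `query_text`, and the prompt templates had `query_text` aliases. Both transitions were handled by accepting the old name while emitting a deprecation warning. A later entry removes the `query` argument entirely. Below is a generic sketch of that transition pattern using the `warnings` module. `search` here is a stand-in function, not the library method.

```python
import warnings
from typing import Optional


def search(query_text: Optional[str] = None, query: Optional[str] = None) -> str:
    """Stand-in for a method whose first parameter was renamed from `query` to `query_text`."""
    if query is not None:
        # stacklevel=2 attributes the warning to the caller, not to this function.
        warnings.warn(
            "'query' is deprecated and will be removed; use 'query_text' instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        if query_text is None:
            query_text = query
    if query_text is None:
        raise TypeError("search() missing required argument: 'query_text'")
    return f"searching for: {query_text}"
```

Python hides `DeprecationWarning` by default outside `__main__` and test runners. Callers therefore may not notice the warning until the old name is finally removed.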
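The fix for unescaped relationship types matters because relationship types are typically written into the query text, not passed as parameters. A type such as `WORKS FOR` (with a space) therefore produces invalid Cypher unless it is quoted. Cypher quotes identifiers with backticks and escapes an embedded backtick by doubling it. Below is a sketch of that rule. The query template, including the `__KGBuilder__` match, is illustrative and is not the writer's actual query.

```python
def escape_cypher_identifier(name: str) -> str:
    """Quote a label or relationship type with backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def merge_relationship_query(rel_type: str) -> str:
    # Node ids stay as query parameters; only the relationship type is interpolated.
    return (
        "MATCH (a:__KGBuilder__ {id: $start_id}), (b:__KGBuilder__ {id: $end_id}) "
        f"MERGE (a)-[:{escape_cypher_identifier(rel_type)}]->(b)"
    )
```

Quoting also prevents an LLM-extracted type containing Cypher syntax from changing the structure of the query.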
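The `Neo4jWriter` performance entry mentions batched insertion controlled by `batch_size`. The sketch below covers only the batching side. It groups node records into lists of at most `batch_size` rows and pairs each list with an `UNWIND` query, so one round trip writes a whole batch. Several details are assumptions: the query text, the record shape (`id`, `properties`), and the default of 1000. Nothing here talks to a database.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List, Tuple

# Illustrative query: one UNWIND processes every row of a batch in a single round trip.
UPSERT_NODES = (
    "UNWIND $rows AS row "
    "MERGE (n:__KGBuilder__ {id: row.id}) "
    "SET n += row.properties"
)


def batched(items: Iterable[Dict[str, Any]], batch_size: int) -> Iterator[List[Dict[str, Any]]]:
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    # islice takes up to batch_size items; an empty list means the input is exhausted.
    while batch := list(islice(iterator, batch_size)):
        yield batch


def plan_node_writes(
    nodes: Iterable[Dict[str, Any]], batch_size: int = 1000
) -> List[Tuple[str, Dict[str, Any]]]:
    """Return (query, parameters) pairs, one per batch."""
    return [(UPSERT_NODES, {"rows": batch}) for batch in batched(nodes, batch_size)]
```

The `MERGE` on `id` is what makes an index on `__KGBuilder__.id` useful. Without it, each merge would scan every node carrying the label.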
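The pipeline validation entry lists two definition errors that are caught before a run starts. Below is a sketch of both checks for a single component. It assumes the component's inputs are described as a list of `(parameter, source)` pairs. That data shape and the `PipelineDefinitionError` class defined here are local stand-ins, not imports from the library.

```python
from collections import Counter
from typing import Iterable, List, Tuple


class PipelineDefinitionError(Exception):
    """Local stand-in exception for this sketch."""


def validate_input_mapping(
    mappings: List[Tuple[str, str]], valid_inputs: Iterable[str]
) -> None:
    """mappings holds (component_parameter, source) pairs, e.g. ("text", "splitter.text")."""
    counts = Counter(param for param, _ in mappings)
    duplicated = sorted(param for param, n in counts.items() if n > 1)
    if duplicated:
        raise PipelineDefinitionError(f"Parameter(s) mapped more than once: {duplicated}")
    unknown = sorted(set(counts) - set(valid_inputs))
    if unknown:
        raise PipelineDefinitionError(f"Not valid component inputs: {unknown}")
```

Running these checks when the pipeline is assembled means a typo in a mapping fails immediately. Without them, it would only surface after earlier components had already done expensive work.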
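Two entries concern error handling: replacing bare `ValueError`s with specific exception classes, and chaining re-raised exceptions. The sketch below combines both using `raise ... from ...`. This stores the original exception on `__cause__`, so the traceback shows both errors. `SearchValidationError` is defined locally, reusing the name of one of the library's exceptions purely for illustration. `validate_top_k` is hypothetical.

```python
from typing import Any


class SearchValidationError(Exception):
    """Local stand-in named after one of the library's custom exceptions."""


def validate_top_k(raw: Any) -> int:
    try:
        top_k = int(raw)
    except (TypeError, ValueError) as e:
        # Chain the original error so the traceback shows why conversion failed.
        raise SearchValidationError(f"top_k must be an integer, got {raw!r}") from e
    if top_k <= 0:
        raise SearchValidationError(f"top_k must be positive, got {top_k}")
    return top_k
```

Callers can catch the specific `SearchValidationError` without accidentally swallowing unrelated `ValueError`s. The chained cause still records the low-level failure.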