Replies: 8 comments
- Hi Aleksis, one of the co-authors of the paper here. Thanks for taking an interest in our work! Really cool to see someone taking a deep, hands-on dive into the paper.
- Since you showed interest (and I love talking about it 🤓), here's a bit of background: the experiments in the paper used an internal tool called Dreamify. As mentioned in the paper, Dreamify treated Cognee and its evaluation framework as a black box and ran hyperparameter optimization using TPE. The setup had a custom wrapper around an older version of the evaluation scripts you're referring to. The scripts that are now in the evals folder come from later work; we made those public mainly to support repeated runs and to analyze distributions of evaluation results. We haven't published the newer evaluation setup yet. So you're right: to reproduce the exact results from the paper, you'd need Dreamify, the custom wrapper, and the older version of the evaluation scripts. The first two were never released publicly and remain proprietary, though they influenced a lot of what we've built since then.
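Dreamify itself was never released, but as a rough illustration of the black-box TPE setup described above, here is a minimal sketch using Optuna's TPESampler. Everything in it is an assumption for illustration: `run_cognee_eval` and the search space are hypothetical placeholders, not the actual Dreamify wrapper or the paper's hyperparameters.

```python
# Hypothetical sketch of black-box TPE hyperparameter optimization over an
# evaluation pipeline. Nothing here reflects the real Dreamify internals.
import optuna


def run_cognee_eval(params: dict) -> float:
    """Placeholder objective: run ingestion + retrieval + QA with `params`
    and return an aggregate metric (e.g., mean F1 on HotPotQA)."""
    raise NotImplementedError("wire this up to your own evaluation run")


def objective(trial: optuna.Trial) -> float:
    # Example search space only; the paper's actual hyperparameters are not public.
    params = {
        "chunk_size": trial.suggest_int("chunk_size", 128, 2048),
        "top_k": trial.suggest_int("top_k", 1, 20),
        "temperature": trial.suggest_float("temperature", 0.0, 1.0),
    }
    return run_cognee_eval(params)


study = optuna.create_study(
    direction="maximize",
    sampler=optuna.samplers.TPESampler(seed=42),
)
study.optimize(objective, n_trials=50)  # the paper reports 50 trials per experiment
print("best params:", study.best_params, "best score:", study.best_value)
```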
- Can you clarify what exactly you're trying to reproduce?
- Okay, that was a lot of info 😅. Hope it helps. Let me know what you think about it, and please keep us updated on your Cognee eval research!
- Yes, thanks a lot. I'm trying to add Ontotext GraphDB as a graph database option. First, I want to reproduce the benchmark results with the standard implementation. After that, I want to reproduce them with Ontotext GraphDB to confirm that my implementation matches those results. And finally, I want to try adding some Ontotext GraphDB features, like ontology support, reasoning, and SPARQL, to see if I can improve the results further.
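As a concrete illustration of the SPARQL piece, here is a minimal sketch of querying an Ontotext GraphDB repository from Python with SPARQLWrapper. The endpoint URL and repository name are hypothetical (GraphDB serves repositories at `/repositories/<name>` by default), and this is a generic client call, not Cognee's adapter API.

```python
# Minimal sketch: running a SPARQL query against a GraphDB repository.
# The endpoint and repository name ("cognee-kg") are hypothetical.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://localhost:7200/repositories/cognee-kg")
sparql.setReturnFormat(JSON)
sparql.setQuery("""
    SELECT ?s ?p ?o
    WHERE { ?s ?p ?o }
    LIMIT 10
""")

results = sparql.queryAndConvert()  # returns parsed JSON bindings
for b in results["results"]["bindings"]:
    print(b["s"]["value"], b["p"]["value"], b["o"]["value"])
```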
- Could you maybe provide me with the hyperparameters that achieved the best result? Also, the paper states that you ran 50 trials for each experiment; is the final result the best trial or the average?
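For what it's worth, the two readings of "50 trials" differ in code as well; continuing the hypothetical Optuna sketch above (the `study` object is the one created there):

```python
# "Best of 50 trials" vs. "average over 50 trials" are different statistics.
import statistics

scores = [t.value for t in study.trials if t.value is not None]
print(f"best: {study.best_value:.3f}")            # single best trial
print(f"average: {statistics.mean(scores):.3f}")  # mean across all trials
```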
- Thanks for the clarification! Good luck with the Ontotext GraphDB adapter development. We know how interesting (and challenging!) developing those can be. Please let us know how it performs once you've got it running, if you can! We have a community repo where you can share it if you'd like. And if you ever need deeper help building it, feel free to reach out so we can discuss what a funded collaboration might look like.
- Keep an eye on our updates; we'll be sharing some of what's currently proprietary in the future. For now, we can't release additional details, but we'll make sure the community hears about it once we can!
- Hi all, I'm trying to replicate the results from the Cognee paper (https://arxiv.org/pdf/2505.24478) on HotPotQA. There is an evaluation script in the repo (https://github.com/topoteretes/cognee/tree/main/evals), but when I run it, I get significantly worse results. I'm guessing I need to apply the proper hyperparameters, but they aren't stated anywhere. Does anyone know if there is an issue with the evaluation script in the repo, or where the hyperparameters can be found? Thanks for the help.

This discussion was automatically pulled from Discord.