Add the possibility to explicitly allocate blank node blocks #2574
Open: joka921 wants to merge 13 commits into ad-freiburg:master from joka921:serialize-named-cached-queries-2
Changes from 11 commits

13 commits (all by joka921):
2a522f9 Add the possibility to explicitly allocate blank node blocks.
eb14a6b This was quite a ride...
715d917 A thorough refactoring, that I boldly claim to be bug free, but with…
9e98dec Merge branch 'master' into serialize-named-cached-queries-2
46b0fed Some changes from a review.
ad4205e Also add Tests for the local vocab.
d6dc143 Fix the tests.
4376c26 Fix the compilation
d5cdd9e Another round of updates...
fea560f Another round of reviews.
66af555 Final updates from Robin's review (hopefully)
d25bda9 A round of further fixes and updates.
312094a The last suggestions made by Robin!
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,39 +1,58 @@ | ||
| - // Copyright 2024, University of Freiburg, | ||
| - // Chair of Algorithms and Data Structures. | ||
| - // Author: Moritz Dom ([email protected]) | ||
| + // Copyright 2024 - 2025 The QLever Authors, in particular: | ||
| + // | ||
| + // 2024 Moritz Dom <[email protected]>, UFR | ||
| + // 2025 Johannes Kalmbach <[email protected]>, UFR | ||
| + | ||
| + // UFR = University of Freiburg, Chair of Algorithms and Data Structures | ||
| + | ||
| + // You may not use this file except in compliance with the Apache 2.0 License, | ||
| + // which can be found in the `LICENSE` file at the root of the QLever project. | ||
| #include "util/BlankNodeManager.h" | ||
| | ||
| + #include <absl/cleanup/cleanup.h> | ||
| + | ||
| #include "util/Exception.h" | ||
| | ||
| namespace ad_utility { | ||
| | ||
| // _____________________________________________________________________________ | ||
| BlankNodeManager::BlankNodeManager(uint64_t minIndex) | ||
| : minIndex_(minIndex), | ||
| - randBlockIndex_( | ||
| - SlowRandomIntGenerator<uint64_t>(0, totalAvailableBlocks_ - 1)) {} | ||
| + state_{SlowRandomIntGenerator<uint64_t>(0, totalAvailableBlocks_ - 1)} {} | ||
| | ||
| // _____________________________________________________________________________ | ||
| BlankNodeManager::Block BlankNodeManager::allocateBlock() { | ||
| // The Random-Generation Algorithm's performance is reduced once the number of | ||
| // used blocks exceeds a limit. | ||
| - auto numBlocks = usedBlocksSet_.rlock()->size(); | ||
| + auto stateLock = state_.wlock(); | ||
| + auto numBlocks = stateLock->usedBlocksSet_.size(); | ||
| AD_CORRECTNESS_CHECK( | ||
| numBlocks < totalAvailableBlocks_ / 256, | ||
| absl::StrCat("Critical high number of blank node blocks in use: ", | ||
| numBlocks, " blocks")); | ||
| | ||
| - auto usedBlocksSetPtr = usedBlocksSet_.wlock(); | ||
| + auto& usedBlocksSetPtr = stateLock->usedBlocksSet_; | ||
| while (true) { | ||
| - auto blockIdx = randBlockIndex_(); | ||
| - if (!usedBlocksSetPtr->contains(blockIdx)) { | ||
| - usedBlocksSetPtr->insert(blockIdx); | ||
| + auto blockIdx = stateLock->randBlockIndex_(); | ||
| + if (!usedBlocksSetPtr.contains(blockIdx)) { | ||
| + usedBlocksSetPtr.insert(blockIdx); | ||
| return Block(blockIdx, minIndex_ + blockIdx * blockSize_); | ||
| } | ||
| } | ||
| } | ||
| | ||
| + // ______________________________________________________________________________ | ||
| + [[nodiscard]] auto BlankNodeManager::allocateExplicitBlock(uint64_t blockIdx) | ||
| + -> Block { | ||
| + auto lock = state_.wlock(); | ||
| + auto& usedBlocksSet = lock->usedBlocksSet_; | ||
| + AD_CONTRACT_CHECK(!usedBlocksSet.contains(blockIdx), | ||
| + "Trying to explicitly allocate a block of blank nodes that " | ||
| + "has previously already been allocated."); | ||
| + usedBlocksSet.insert(blockIdx); | ||
| + return Block(blockIdx, minIndex_ + blockIdx * blockSize_); | ||
| + } | ||
| + | ||
| // _____________________________________________________________________________ | ||
| BlankNodeManager::Block::Block(uint64_t blockIndex, uint64_t startIndex) | ||
| : blockIdx_(blockIndex), startIdx_(startIndex), nextIdx_(startIndex) {} | ||
| | ||
| @@ -45,11 +64,12 @@ BlankNodeManager::LocalBlankNodeManager::LocalBlankNodeManager( | ||
| | ||
| // _____________________________________________________________________________ | ||
| uint64_t BlankNodeManager::LocalBlankNodeManager::getId() { | ||
| - if (blocks_->empty() || blocks_->back().nextIdx_ == idxAfterCurrentBlock_) { | ||
| - blocks_->emplace_back(blankNodeManager_->allocateBlock()); | ||
| - idxAfterCurrentBlock_ = blocks_->back().nextIdx_ + blockSize_; | ||
| + auto& blocks = blocks_->blocks_; | ||
| + if (blocks.empty() || blocks.back().nextIdx_ == idxAfterCurrentBlock_) { | ||
| + blocks.emplace_back(blankNodeManager_->allocateBlock()); | ||
| + idxAfterCurrentBlock_ = blocks.back().nextIdx_ + blockSize_; | ||
| } | ||
| - return blocks_->back().nextIdx_++; | ||
| + return blocks.back().nextIdx_++; | ||
| } | ||
| | ||
| // _____________________________________________________________________________ | ||
| @@ -59,12 +79,152 @@ bool BlankNodeManager::LocalBlankNodeManager::containsBlankNodeIndex( | ||
| return index >= block.startIdx_ && index < block.nextIdx_; | ||
| }; | ||
| | ||
| - return ql::ranges::any_of(*blocks_, containsIndex) || | ||
| + return ql::ranges::any_of(blocks_->blocks_, containsIndex) || | ||
| ql::ranges::any_of( | ||
| otherBlocks_, | ||
| - [&](const std::shared_ptr<const std::vector<Block>>& blocks) { | ||
| - return ql::ranges::any_of(*blocks, containsIndex); | ||
| + [containsIndex](const std::shared_ptr<const Blocks>& blocks) { | ||
| + return ql::ranges::any_of(blocks->blocks_, containsIndex); | ||
| }); | ||
| } | ||
| | ||
| + // _____________________________________________________________________________ | ||
| + auto BlankNodeManager::LocalBlankNodeManager::getOwnedBlockIndices() const | ||
| + -> std::vector<OwnedBlocksEntry> { | ||
| + std::vector<OwnedBlocksEntry> indices; | ||
| + // Lambda that turns a single `Blocks` object into an `OwnedBlocksEntry`. | ||
| + auto resultFromSingleSet = [](const auto& set) { | ||
| + OwnedBlocksEntry res; | ||
| + res.uuid_ = set->uuid_; | ||
| + res.blockIndices_ = ::ranges::to<std::vector>( | ||
| + set->blocks_ | ql::views::transform(&Block::blockIdx_)); | ||
| + return res; | ||
| + }; | ||
| + | ||
| + // First serialize the primary block set, then the other block sets (one entry each). | ||
| + indices.reserve(1 + otherBlocks_.size()); | ||
| + indices.push_back(resultFromSingleSet(blocks_)); | ||
| + for (const auto& set : otherBlocks_) { | ||
| + indices.push_back(resultFromSingleSet(set)); | ||
| + } | ||
| + return indices; | ||
| + } | ||
| + | ||
| + // _____________________________________________________________________________ | ||
| + void BlankNodeManager::LocalBlankNodeManager::allocateBlocksFromExplicitIndices( | ||
| + const std::vector<OwnedBlocksEntry>& indices) { | ||
| + AD_CONTRACT_CHECK(blocks_->blocks_.empty() && otherBlocks_.empty(), | ||
| + "Explicit reserving of blank node blocks is only allowed " | ||
| + "for empty `LocalBlankNodeManager`s"); | ||
| + | ||
| + // We read all the previously allocated blocks into `otherBlocks_`, so that | ||
| + // the primary `blocks_` vector stays empty. That way, we are completely | ||
| + // decoupled from other `LocalBlankNodeManager`s which might reuse the same | ||
| + // blocks as `*this`. | ||
| + otherBlocks_.reserve(indices.size()); | ||
| + for (const auto& entry : indices) { | ||
| + otherBlocks_.push_back( | ||
| + blankNodeManager_->registerAndAllocateBlockSet(entry)); | ||
| + } | ||
| + } | ||
| + | ||
| + // _____________________________________________________________________________ | ||
| + auto BlankNodeManager::createBlockSet() -> std::shared_ptr<Blocks> { | ||
| + // Guard against the (very very unlikely) case of UUID collision | ||
| + auto lockOpt = std::optional{state_.wlock()}; | ||
| + auto& lock = lockOpt.value(); | ||
| + auto uuid = lock->uuidGenerator_(); | ||
| + auto [it, isNew] = | ||
| + lock->managedBlockSets_.try_emplace(uuid, std::shared_ptr<Blocks>()); | ||
| + // Note: the (very unlikely) exception thrown by the following check is safe, | ||
| + // as all the destructors of the variables above are trivial, and we haven't | ||
| + // actually modified the `managedBlockSets_` in the case `isNew` is false. | ||
| + AD_CORRECTNESS_CHECK(isNew, | ||
| + "You encountered a UUID collision inside " | ||
| + "`BlankNodeManager::createBlockSet()`. Consider " | ||
| + "yourself to be very (un)lucky!"); | ||
| + auto res = std::make_shared<Blocks>(this, uuid); | ||
| + it->second = res; | ||
| + return res; | ||
| + } | ||
| + | ||
| + // _____________________________________________________________________________ | ||
| + void BlankNodeManager::freeBlockSet(const Blocks& blocks) { | ||
| + // We keep the lock the whole time because we have to perform a consistent, | ||
| + // transactional operation on the `state_`, which itself is not threadsafe. | ||
| + state_.withWriteLock([&blocks](auto& state) { | ||
| + // First unregister the UUID. | ||
| + auto it = state.managedBlockSets_.find(blocks.uuid_); | ||
| + if (it == state.managedBlockSets_.end()) { | ||
| + // Note: it is very hard to manually trigger this condition in unit tests, | ||
| + // because it depends on very subtle race conditions and on the reuse of | ||
| + // explicit UUIDs, which in my understanding can currently never happen. | ||
| + // We nevertheless return silently here to make the code more robust. | ||
| + return; | ||
| + } | ||
| + // This `if` check guards against a very rare condition where timings AND | ||
| + // UUIDs have to collide. In particular, we expect the value to be expired, | ||
| + // because this function is only called in the destructor of the object that | ||
| + // the `weak_ptr` points to, i.e. after the last `shared_ptr` to this object | ||
| + // has been destroyed. | ||
| + if (it->second.expired()) { | ||
| + state.managedBlockSets_.erase(it); | ||
| + } | ||
| + auto& usedBlockSet = state.usedBlocksSet_; | ||
| + for (const auto& block : blocks.blocks_) { | ||
| + AD_CONTRACT_CHECK(usedBlockSet.contains(block.blockIdx_)); | ||
| + usedBlockSet.erase(block.blockIdx_); | ||
| + } | ||
| + }); | ||
| + } | ||
| + | ||
| + // _____________________________________________________________________________ | ||
| + std::shared_ptr<BlankNodeManager::Blocks> | ||
| + BlankNodeManager::registerAndAllocateBlockSet( | ||
| + const LocalBlankNodeManager::OwnedBlocksEntry& entry) { | ||
| + // We keep the lock the whole time to avoid race conditions between | ||
| + // registering the UUID and allocating the blocks. | ||
| + auto lockOpt = std::optional{state_.wlock()}; | ||
| + auto& lock = lockOpt.value(); | ||
| + | ||
| + // Try to insert a new `nullptr` at the given UUID. If the insertion | ||
| + // succeeds, we will later store a useful value there. | ||
| + auto [it, isNew] = lock->managedBlockSets_.try_emplace( | ||
| + entry.uuid_, std::shared_ptr<Blocks>(nullptr)); | ||
| + | ||
| + // Note: the following `nullptr` check might be true for two reasons: | ||
| + // 1. We have newly inserted the UUID (likely), or 2. we have found an expired | ||
| + // `weak_ptr` from a previous usage of the same UUID, and are currently | ||
| + // racing against its deletion (very unlikely). Note that 2., even if it | ||
| + // happens, never causes a problem, as the `freeBlockSet` function also | ||
| + // gracefully handles this case. | ||
| + if (auto ptr = it->second.lock(); ptr == nullptr) { | ||
| + auto blocks = std::make_shared<Blocks>(this, entry.uuid_); | ||
| + // At the end of this scope is a return. But in the case of an exception | ||
| + // (e.g. if the `AD_CONTRACT_CHECK` below fires), we have to unlock before | ||
| + // the destructor of `blocks` runs, otherwise we end up in a deadlock. | ||
| + auto cleanup = absl::Cleanup{[&lockOpt]() { lockOpt.reset(); }}; | ||
| + it->second = blocks; | ||
| + // The block set is new, so we need to allocate all the specified block indices. | ||
| + for (const auto& idx : entry.blockIndices_) { | ||
| + auto& usedBlocksSet = lock->usedBlocksSet_; | ||
| + AD_CONTRACT_CHECK( | ||
| + !usedBlocksSet.contains(idx), | ||
| + "Trying to explicitly allocate a block of blank nodes that " | ||
| + "has previously already been allocated."); | ||
| + usedBlocksSet.insert(idx); | ||
| + blocks->blocks_.emplace_back(Block(idx, minIndex_ + idx * blockSize_)); | ||
| + } | ||
| + return blocks; | ||
| + } else { | ||
| + // We have found a preexisting, non-expired `Blocks` object with the | ||
| + // requested UUID, so we simply return a `shared_ptr` to it. | ||
| + AD_CORRECTNESS_CHECK(ptr != nullptr); | ||
| + AD_CORRECTNESS_CHECK(ql::ranges::equal( | ||
| + entry.blockIndices_, | ||
| + ptr->blocks_ | ql::views::transform(&Block::blockIdx_))); | ||
| + return ptr; | ||
| + } | ||
| + } | ||
| + | ||
| } // namespace ad_utility | ||
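
The block allocation in `allocateBlock` and `allocateExplicitBlock` can be illustrated with a small standalone model. This is a minimal sketch, not QLever code: `SimpleBlockAllocator`, its members, and the use of `std::mt19937_64` with a plain `std::mutex` are assumptions standing in for `SlowRandomIntGenerator` and the synchronized `state_`. It shows the same idea of rejection sampling over block indices behind a limit on the number of used blocks, plus explicit allocation that fails for an index that is already taken (unlike the diff, the sketch also rejects out-of-range indices).

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <random>
#include <stdexcept>
#include <unordered_set>

// Hypothetical, simplified model of the block bookkeeping in
// `BlankNodeManager`. Block `i` starts at blank node index
// `minIndex + i * blockSize`.
class SimpleBlockAllocator {
 public:
  struct Block {
    uint64_t blockIdx;
    uint64_t startIdx;
  };

  SimpleBlockAllocator(uint64_t minIndex, uint64_t blockSize,
                       uint64_t numBlocks)
      : minIndex_(minIndex),
        blockSize_(blockSize),
        numBlocks_(numBlocks),
        rng_(std::random_device{}()),
        dist_(0, numBlocks - 1) {}

  // Rejection sampling: draw random block indices until an unused one is
  // found. This is only fast while few blocks are in use, hence the limit.
  Block allocateRandom() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (used_.size() >= numBlocks_ / 256) {
      throw std::runtime_error("Critically high number of blocks in use");
    }
    while (true) {
      uint64_t idx = dist_(rng_);
      // `insert(...).second` is true iff `idx` was not in use before.
      if (used_.insert(idx).second) {
        return makeBlock(idx);
      }
    }
  }

  // Allocate a caller-chosen block, e.g. when restoring serialized state.
  Block allocateExplicit(uint64_t idx) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (idx >= numBlocks_ || !used_.insert(idx).second) {
      throw std::invalid_argument("Block index out of range or already in use");
    }
    return makeBlock(idx);
  }

  // Give a block back so that it can be allocated again.
  void release(uint64_t idx) {
    std::lock_guard<std::mutex> lock(mutex_);
    used_.erase(idx);
  }

  bool isUsed(uint64_t idx) const {
    std::lock_guard<std::mutex> lock(mutex_);
    return used_.count(idx) > 0;
  }

 private:
  Block makeBlock(uint64_t idx) const {
    return Block{idx, minIndex_ + idx * blockSize_};
  }

  uint64_t minIndex_;
  uint64_t blockSize_;
  uint64_t numBlocks_;
  std::mt19937_64 rng_;
  std::uniform_int_distribution<uint64_t> dist_;
  std::unordered_set<uint64_t> used_;
  mutable std::mutex mutex_;
};
```

The limit is what keeps rejection sampling cheap: with fewer than 1/256 of all blocks in use, each random draw hits a free block with probability above 99.6%, so the loop almost always ends after a single iteration.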
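`LocalBlankNodeManager::getId()` behaves like a bump allocator over blocks: it hands out consecutive indices from its most recent block and requests a new block once that one is exhausted. A minimal sketch under stated assumptions: `LocalIdSource` and the `allocateBlockStart` callback are hypothetical names, the callback plays the role of `BlankNodeManager::allocateBlock()`, and `contains` mirrors the half-open range check in `containsBlankNodeIndex`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical model of `LocalBlankNodeManager::getId()`.
class LocalIdSource {
 public:
  struct Block {
    uint64_t startIdx;
    uint64_t nextIdx;  // Next index to hand out from this block.
  };

  // `allocateBlockStart` returns the first index of a fresh, unused block.
  LocalIdSource(uint64_t blockSize,
                std::function<uint64_t()> allocateBlockStart)
      : blockSize_(blockSize),
        allocateBlockStart_(std::move(allocateBlockStart)) {}

  uint64_t getId() {
    // Request a new block if there is none yet or the current one is full.
    if (blocks_.empty() || blocks_.back().nextIdx == idxAfterCurrentBlock_) {
      uint64_t start = allocateBlockStart_();
      blocks_.push_back(Block{start, start});
      idxAfterCurrentBlock_ = start + blockSize_;
    }
    return blocks_.back().nextIdx++;
  }

  // An index belongs to this source iff it was handed out from one of its
  // blocks, i.e. it lies in [startIdx, nextIdx) of some block.
  bool contains(uint64_t idx) const {
    for (const auto& block : blocks_) {
      if (idx >= block.startIdx && idx < block.nextIdx) {
        return true;
      }
    }
    return false;
  }

  std::size_t numBlocks() const { return blocks_.size(); }

 private:
  uint64_t blockSize_;
  std::function<uint64_t()> allocateBlockStart_;
  std::vector<Block> blocks_;
  uint64_t idxAfterCurrentBlock_ = 0;
};
```

With a block size of 2, the third call to `getId()` triggers a second block, and indices that were reserved but not yet handed out are not reported by `contains`.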
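The UUID bookkeeping in `createBlockSet`, `registerAndAllocateBlockSet`, and `freeBlockSet` follows a common pattern: a map from key to `std::weak_ptr`, where each object unregisters itself in its destructor. The sketch below models that pattern with hypothetical names (`BlockSetRegistry`, `BlockSet`, string UUIDs) and a plain `std::mutex`; it is an illustration, not the QLever implementation. Instead of the `absl::Cleanup` used in the diff, it avoids destroying a `BlockSet` while the mutex is held by declaring the result pointer before the lock (locals are destroyed in reverse order of construction), and it validates all indices before creating anything.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <set>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of a UUID registry of block sets.
class BlockSetRegistry {
 public:
  class BlockSet {
   public:
    BlockSet(BlockSetRegistry* registry, std::string uuid,
             std::vector<uint64_t> indices)
        : registry_(registry),
          uuid_(std::move(uuid)),
          indices_(std::move(indices)) {}
    BlockSet(const BlockSet&) = delete;
    BlockSet& operator=(const BlockSet&) = delete;
    // Runs when the last `shared_ptr` is gone: give the blocks back.
    ~BlockSet() { registry_->release(*this); }
    const std::string& uuid() const { return uuid_; }
    const std::vector<uint64_t>& indices() const { return indices_; }

   private:
    BlockSetRegistry* registry_;
    std::string uuid_;
    std::vector<uint64_t> indices_;
  };

  // Return the live set with this UUID, or allocate `indices` for a new one.
  std::shared_ptr<BlockSet> registerOrGet(
      const std::string& uuid, const std::vector<uint64_t>& indices) {
    // Declared before the lock, so it is destroyed after the mutex has been
    // released; its destructor may need the mutex (see `release`).
    std::shared_ptr<BlockSet> result;
    std::lock_guard<std::mutex> lock(mutex_);
    auto [it, isNew] = sets_.try_emplace(uuid);
    result = it->second.lock();
    if (result != nullptr) {
      if (result->indices() != indices) {
        throw std::logic_error("UUID reused with different blocks");
      }
      return result;
    }
    // Validate all indices first, so a failure leaves no partial state.
    for (uint64_t idx : indices) {
      if (used_.count(idx) > 0) {
        if (isNew) {
          sets_.erase(it);
        }
        throw std::logic_error("Block has already been allocated");
      }
    }
    result = std::make_shared<BlockSet>(this, uuid, indices);
    used_.insert(indices.begin(), indices.end());
    it->second = result;
    return result;
  }

  bool isUsed(uint64_t idx) const {
    std::lock_guard<std::mutex> lock(mutex_);
    return used_.count(idx) > 0;
  }

  std::size_t numRegistered() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return sets_.size();
  }

 private:
  void release(const BlockSet& set) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = sets_.find(set.uuid());
    // Only unregister if no newer set was registered under the same UUID.
    if (it != sets_.end() && it->second.expired()) {
      sets_.erase(it);
    }
    for (uint64_t idx : set.indices()) {
      used_.erase(idx);
    }
  }

  mutable std::mutex mutex_;
  std::map<std::string, std::weak_ptr<BlockSet>> sets_;
  std::set<uint64_t> used_;
};
```

The registry has to outlive every `BlockSet` it hands out, since their destructors call back into it. Registering the same UUID twice while the first set is alive returns the same object, which is the sharing behavior `allocateBlocksFromExplicitIndices` relies on.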
Any particular reason why the `auto ... ->` syntax is used instead of a directly specified return type?
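One plausible reason, stated as the general C++ rule rather than the author's answer: in an out-of-class member function definition, a leading return type is parsed before the qualified function name, so a nested type must be spelled out in full (as in `BlankNodeManager::Block BlankNodeManager::allocateBlock()`). A trailing return type comes after the qualified name, so `Block` is looked up in the class scope and needs no qualification. A minimal sketch with a hypothetical `Manager` class:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a class with a nested `Block` type.
class Manager {
 public:
  struct Block {
    uint64_t idx;
  };
  Block makeBlock(uint64_t idx);
  auto makeOtherBlock(uint64_t idx) -> Block;
};

// Leading return type: `Block` must be qualified, because at this point the
// compiler does not yet know that we are defining a member of `Manager`.
// Writing just `Block Manager::makeBlock(...)` would not compile.
Manager::Block Manager::makeBlock(uint64_t idx) { return Block{idx}; }

// Trailing return type: it follows `Manager::makeOtherBlock`, so the nested
// name `Block` is found in the class scope without qualification.
auto Manager::makeOtherBlock(uint64_t idx) -> Block { return Block{idx}; }
```

Both forms declare the same kind of function; the trailing form mainly saves the repeated class qualification, which is why the two styles appear side by side in the diff.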