-
Notifications
You must be signed in to change notification settings - Fork 621
Preview Vector Set API support #1346
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…interface for now. Mostly interested in validating splaying things out into the MainStore, and the FFI since our likely integrations are non-C#.
…n be a placeholder until we get clever
…ther use, but demonstration
…de with main store keys'; uses magic bit patterns in trailing byte
|
This all makes sense. In storage-v2, we have the notion of a LogRecord record format, that already includes optional fields such as ETag and Expiration. Adding a namespace would be analogous to this, with a couple of differences:
TBD: how to handle larger namespace names in the same framework (e.g., "/user/foo/"), and is that useful/necessary to make as a first-class citizen versus users directly incorporating in the actual key. |
…e with vector spaces
…h ideally we never hit this case)
Namespaces can be used for a lot of things, like replacing the current numbered database implementation with something that is supported cluster-wide as in valkey (In that case they end up in the same AOF file I guess?). Also, quite a few RESP-accepting DBs have a namespace implementation of sorts, e.g. kvrocks. I had an idea of combining ACLs and database numbers (IMHO, this would be usually better than redis's prefix ACLs, because it doesn't require client to cooperate), and namespaces would be just as natural here. |
…e recursive locking shenanigans; include element ids in benchmark data
…because of shared locking context, removing Tsavorite locks made the bug possible
| #region RMW | ||
| /// <inheritdoc /> | ||
| public int GetRMWInitialValueLength(ref VectorInput input) | ||
| => sizeof(byte) + sizeof(int) + input.WriteDesiredSize; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't follow the additional sizeof(byte) + sizeof(int) here - why is it added?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's namespace + ttl + actual data.
The TTL bits are unused currently but were kept because a) that better with how MainSessionFunctions so some helpers assume it's here b) we thought maybe TTL was going to be necessary.
In store v2 this should go away or least be "the same" as everything else since TTL and namespaces are first-class there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, thinking about this longer... I dunno that the byte makes sense at all - that may just be left over from earlier exploration and I conflated it with the namespace byte added later. I'll poke at it.
| /// <inheritdoc /> | ||
| public bool SingleReader(ref SpanByte key, ref VectorInput input, ref SpanByte value, ref SpanByte dst, ref ReadInfo readInfo) | ||
| { | ||
| Debug.Assert(key.MetadataSize == 1, "Should never read a non-namespaced value with VectorSessionFunctions"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For my learning: Just confirming - This also applies to all other functions right? RMW, Write, etc, are always namespaced for VectorSessionFunctions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, the assert should be duplicated onto appropriate paths - will fix.
|
|
||
| ref var ctx = ref ActiveThreadSession.vectorContext; | ||
|
|
||
| tryAgain: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious - why the choice for goto?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was probably futzing about with an interlocked version at some point and kept the goto. It is a smell here, lemme replace that with a while(true).
| } | ||
|
|
||
| // Actually delete the value | ||
| var del = storageSession.basicContext.Delete(ref key); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For my learning: what happens if a crash or something happens in the middle of TryDeleteVectorSet - That VectorSet will be corrupted? Or from the AOF, Garnet will be able to recover to a valid state?
For example, what happens if we delete a vector set, but fail to set ContextMetadata with the cleanUp flag because a restart happens first - is it possible that all DiskANN data will remain there and not cleaned up?
Maybe I'm missing something, but I was thinking if we should first RMW ContextMetadata with the cleanup flag to ensure that even upon restart we'll know that a particular context needs cleanup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good catch - we will leave the Vector Set in a corrupted state if a crash happens between the RMW completing and the CleanupDroppedIndex completing.
The fix is a little trickier - just moving the ContextMetadata marking isn't enough. While that would let us resume a cleanup of element data after crashing (since incomplete cleanups are enqueued at process start), we wouldn't be able find the index or replication keys.
The RMW is what starts the delete process, so resumption of delete on first touch probably the path forward. Needs a good test, so gonna get that in place first before fixing.
| } | ||
|
|
||
| // Helper to complete read/writes during vector set synthetic op goes async | ||
| static void CompletePending(ref Status status, ref SpanByteAndMemory output, ref TContext context) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: looks like this function repeats 3 times on this file, should we have a common method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consequence of these all being in different files and then consolidated into VectorManager over time. Will DRY up.
| keyWithNamespace.SetNamespaceInPayload(0); | ||
| key.AsReadOnlySpan().CopyTo(keyWithNamespace.AsSpan()); | ||
|
|
||
| Span<byte> dummyBytes = stackalloc byte[4]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: looks like it's unused
| this.storeWrapper = storeWrapper; | ||
|
|
||
| var replayAofStoreWrapper = new StoreWrapper(storeWrapper, recordToAof); | ||
| replayAofStoreWrapper = new StoreWrapper(storeWrapper, recordToAof); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe leave this as a local to prevent from people using it for any other purpose.
| { | ||
| // Allocate session outside of task so we fail "nicely" if something goes wrong with acquiring them | ||
| var allocatedSession = obtainServerSession(); | ||
| if (allocatedSession.activeDbId != self.dbId && !allocatedSession.TrySwitchActiveDatabaseSession(self.dbId)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
currently cluster supports a single database, we should validate that the VectorManager.dbId = 0
| throw new GarnetException($"Could not switch replication replay session to {self.dbId}, replication will fail"); | ||
| } | ||
|
|
||
| self.replicationReplayTasks[i] = Task.Factory.StartNew( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might have missed it, but we need a way to cancel these tasks in the event the replication stream breaks (i.e. failover).
Implement a small subset of Vector Set functionality, as a base for future full support.
This PR:
VectorSetbit toRecordInfoto allow distinguishing Vector Sets from other keysVADD,VREM,VEMB, andVSIMXB8- which allows passing byte-values without conversion, joiningFP32(which is used for little-endianfloats)XPREQ8- a quantization option which takes in pre-quantized vectors, meant for use withXB8EnableVectorSetPreview/--enable-vector-set-previewA more complete write up of the design, justifications, and remaining work is in vector-set.md.
There is still a lot of work to be done on Vector Sets, but this PR is at a point where it's functional enough to be played with - and there's merit to merging it so other work (Store V2, multi-threaded replication, etc.) isn't complicated.
The "big" things (besides commands and options) that are missing are:
XPREQ8quantizers - implementation here is on DiskANN