feat(trace-utils)!: introduce VecMap datastructure#2022
Conversation
🎉 All green!🧪 All tests passed 🎯 Code Coverage (details) 🔗 Commit SHA: 8612e8f | Docs | Datadog PR Page | Give us feedback! |
📚 Documentation Check Results📦
|
Clippy Allow Annotation ReportComparing clippy allow annotations between branches:
Summary by Rule
Annotation Counts by File
Annotation Stats by Crate
About This ReportThis report tracks Clippy allow annotations for specific rules, showing how they've changed in this PR. Decreasing the number of these annotations generally improves code quality. |
🔒 Cargo Deny Results📦
|
267605b to
b1718c4
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #2022 +/- ##
==========================================
+ Coverage 72.84% 72.88% +0.03%
==========================================
Files 458 459 +1
Lines 75789 75925 +136
==========================================
+ Hits 55210 55336 +126
- Misses 20579 20589 +10
🚀 New features to boost your workflow:
|
Artifact Size Benchmark Reportaarch64-alpine-linux-musl
aarch64-unknown-linux-gnu
libdatadog-x64-windows
libdatadog-x86-windows
x86_64-alpine-linux-musl
x86_64-unknown-linux-gnu
|
b1718c4 to
64f4421
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 64f4421cf2
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| fn serialize<S: Serializer>(&self, serializer: S) -> Result<S::Ok, S::Error> { | ||
| let mut map = serializer.serialize_map(None)?; | ||
| let mut seen = HashSet::with_capacity(self.len()); | ||
| for (k, v) in self.0.iter().rev() { | ||
| if seen.insert(k) { | ||
| map.serialize_entry(k, v)?; | ||
| } | ||
| } | ||
| map.end() | ||
| } |
There was a problem hiding this comment.
This will make everything reverse order, not just the duplicates. If serialized we don't care about ordering this is fine. Otherwise a map with upsert without reversing would achieve the same effect without reversing outputs for non-dedup entries but is slightly slower.
There was a problem hiding this comment.
Also building this first would give us the dedup size and so we wouldn't need to have the serializer.serialize_map(None).
Am I missing something ?
There was a problem hiding this comment.
I think reverse order is fine. I should make that explicit, but VecMap doesn't provide any guarantees about the iteration order of stuff with different keys, as most maps (HashMap in particular, which has a random iteration order by default).
The problem of building the deduped map first is that it causes an additional intermediate allocation. Or we could just count in a first loop, but then we'd have to clear the HashSet and to start again hash everything twice. None of those are really a no-go, but I could avoid them this way.
| /// enough so such that the size penalty of duplication is expected to be reasonable. | ||
| /// | ||
| /// **Important**: note that only [VecMap::get] and [VecMap::get_mut] are duplicate-aware, so to | ||
| /// speak. [Vec::len], [Vec::iter], and others just delegates to the underlying `Vec`, and won't |
There was a problem hiding this comment.
| /// speak. [Vec::len], [Vec::iter], and others just delegates to the underlying `Vec`, and won't | |
| /// speak. [VecMap::len], [VecMap::iter], and others just delegates to the underlying `Vec`, and won't |
| @@ -0,0 +1,260 @@ | |||
| // Copyright 2026-Present Datadog, Inc. https://www.datadoghq.com/ | |||
There was a problem hiding this comment.
Github doesn't support thread comments so putting my thoughts here to keep it clear.
Overall this seems like it could be very beneficial to perf, have you considered:
- Using a dependency that does it (ex: https://github.com/p-avital/vec-map-rs)
- Storing keys and values in separate Vecs which as explained in vec-map-rs's readme could make the key lookup even faster by packing them close together.
- Adding micro benchmarks to compare HashMap/BTreeMap/VecMap(/vec-map-rs) operations on different number of keys, size of keys and size of values ?
- Having a higher level hybrid struct that underneath uses a VecMap until a certain threshold in the number of keys is reached then switches to HashMap ? (might be overkill)
There was a problem hiding this comment.
could make the key lookup even faster by packing them close together.
Attribute lists are usually pretty small (less than 10s of elements), and we need to chase one pointer when iterating anyway to read the string, and we would need to do two allocations instead of one so I have doubts on the perf gain.
There was a problem hiding this comment.
Those are all very good points:
- There's always a balance when pulling in a new dependency. For libdatadog, the cons are size explosion, and supply chain attack, most notably. Since the code is really simple (it's mostly boilerplate to be honest), I felt like having a bespoke datastructure here was preferable. But I think both approaches are reasonable. Another thing with having our own is that we can customize it to our needs, typically the tombstone-like removal (well it could probably easily be implemented as a wrapper of vec-map-rs, so maybe not a great example). Meta and metrics are very special cases of maps, and we might be able to take advantage of that even further?
- As @paullegranddc mentioned, I don't think it's worth the complexity or even that it would be a win. We do not care that much for get already. But to keep in mind for potential evolutions?
- We plan to bench this in dd-trace-rs and other real usage. For now this code is dead, so it doesn't impact anything that exists already. But maybe having a bunch of micro benches doesn't hurt, and can be easily generated by an LLM. I'll look into it!
- Without thinking too much into it, I think this might be overkill for now. Simplicity, beside being beneficial for maintenance, usually wins in terms of performance in a surprisingly vast number of use-cases: less branching, no pointer chasing and cache misses, etc. We should bench real tracers and only think about a more complex scheme if it turns out there are largish maps that show unacceptable perf degradation.
| @@ -0,0 +1,260 @@ | |||
| // Copyright 2026-Present Datadog, Inc. https://www.datadoghq.com/ | |||
There was a problem hiding this comment.
could make the key lookup even faster by packing them close together.
Attribute lists are usually pretty small (less than 10s of elements), and we need to chase one pointer when iterating anyway to read the string, and we would need to do two allocations instead of one so I have doubts on the perf gain.
| impl<K: Serialize + Eq + Hash, V: Serialize> Serialize for VecMap<K, V> { | ||
| fn serialize<S: Serializer>(&self, serializer: S) -> Result<S::Ok, S::Error> { | ||
| let mut map = serializer.serialize_map(Some(self.0.len()))?; | ||
| let mut map = serializer.serialize_map(None)?; |
There was a problem hiding this comment.
This is not great as rpm_serde will allocate an intermediary buffer and copy it to the main buffer once it contains the length https://docs.rs/rmp-serde/latest/src/rmp_serde/encode.rs.html#474-494
I think we should deduplicate before serialization/rely on the agent semantic that last duplicate key wins
There was a problem hiding this comment.
This is a point where I hesitated. At some point I had a DedupedVecMap wrapper, and only implemented serialize on it. However, it's surprisingly annoying to avoid any additional allocation when doing it in-place. You either need to clone the keys (maybe that's ok?), because all the nice iterators or retain helpers take a closure that holds key references that don't survive long enough.
One possibility is to clone keys to put them in the set. I think we could technically avoid that but that would mean re-implementing retain manually, basically... Probably clone is fine.
| let mut map = serializer.serialize_map(None)?; | ||
| let mut seen = HashSet::with_capacity(self.len()); | ||
| for (k, v) in self.0.iter().rev() { | ||
| if seen.insert(k) { | ||
| map.serialize_entry(k, v)?; | ||
| } | ||
| } |
There was a problem hiding this comment.
What about doing this then? It allocates a bit more for because we use a hashmap with the index instead of a set but does the same amount of work
| let mut map = serializer.serialize_map(None)?; | |
| let mut seen = HashSet::with_capacity(self.len()); | |
| for (k, v) in self.0.iter().rev() { | |
| if seen.insert(k) { | |
| map.serialize_entry(k, v)?; | |
| } | |
| } | |
| let mut seen = HashMap::with_capacity(self.len()); | |
| for (i, (k, _)) in self.0.iter().rev().enumerate() { | |
| seen.entry(k.as_ref()).or_insert(i); | |
| } | |
| let mut map = serializer.serialize_map(Some(seen.len()))?; | |
| for (k, v) in seen.values().map(|i| self.0.get(i)) { | |
| map.serialize_entry(k, v)?; | |
| } |
There was a problem hiding this comment.
Right, moving everything to a hashmap is another possibility, but as you said it does allocate a bit more. If we do this I wonder if we shouldn't bite the bullet and just build a hashmap out of the vec and iterate over it. If the value are 64 bits, this get rid of fetching back stuff from the vec, and waste no more. If values can be locally large though, maybe not.
What does this PR do?
This PR is the first of a series that backports the change buffer implementation for dd-trace-js from experimentation to production in libdatadog.
This PR introduces a replacement for
HashMap:VecMap. It's not used anywhere else; this is left for follow-up PRs, but the idea is to use it for themeta,metrics, andmeta_structfields of v04 spans.Motivation
The change-buffer module (introduced in subsequent PRs) experimentations showed that we spend quite some time expanding hashmaps and thus re-hashing during span reconstruction. Introducing a map-like data structure that is optimized for construction/insertion (which is the vast majority of operations) showed to be beneficial.
Even if not optimized for reads, we expect that given the small cardinality of meta and metrics in a span (< 20 entries in general), linear scan is actually going to perform well because of cache locality. Removal is one operation that is made quite more expensive (although still linear), but can be mitigated later with tombstones (also still linear, but doesn't require shifting). Tombstones aren't implemented for this first version.
Additional Notes
Deduplication is performed at serialization time. We initially set up for sort + deduplication to avoid allocation, but this would require larger changes in libdatadog by requiring Span values to be
Ord. For now, deduplication is based on aHashSetsinceSpanData::Textis alreadyHash, but this incurs one allocation. I think it's reasonable because the serialization path is not hot anyway, and we can later requestOrdon values to go the in-placesort_deduproute instead if desired.How to test the change?
Run tests.