feat(trace-utils)!: introduce VecMap datastructure by yannham · Pull Request #2022 · DataDog/libdatadog

yannham · 2026-05-21T12:49:42Z

What does this PR do?

This PR is the first of a series that backports the change buffer implementation for dd-trace-js from experimentation to production in libdatadog.

This PR introduces a replacement for HashMap: VecMap. It's not used anywhere else; this is left for follow-up PRs, but the idea is to use it for the meta, metrics, and meta_struct fields of v04 spans.

Motivation

The change-buffer module (introduced in subsequent PRs) experimentations showed that we spend quite some time expanding hashmaps and thus re-hashing during span reconstruction. Introducing a map-like data structure that is optimized for construction/insertion (which is the vast majority of operations) showed to be beneficial.

Even if not optimized for reads, we expect that given the small cardinality of meta and metrics in a span (< 20 entries in general), linear scan is actually going to perform well because of cache locality. Removal is one operation that is made quite more expensive (although still linear), but can be mitigated later with tombstones (also still linear, but doesn't require shifting). Tombstones aren't implemented for this first version.

Additional Notes

Deduplication is performed at serialization time. We initially set up for sort + deduplication to avoid allocation, but this would require larger changes in libdatadog by requiring Span values to be Ord. For now, deduplication is based on a HashSet since SpanData::Text is already Hash, but this incurs one allocation. I think it's reasonable because the serialization path is not hot anyway, and we can later request Ord on values to go the in-place sort_dedup route instead if desired.

How to test the change?

Run tests.

datadog-prod-us1-5 · 2026-05-21T12:51:36Z

Tests

🎉 All green!

🧪 All tests passed
❄️ No new flaky tests detected

🎯 Code Coverage (details)
• Patch Coverage: 84.56%
• Overall Coverage: 72.88% (+0.05%)

_{This comment will be updated automatically if new data arrives.

🔗 Commit SHA: 8612e8f | Docs | Datadog PR Page | Give us feedback!}

github-actions · 2026-05-21T12:52:34Z

📚 Documentation Check Results

⚠️ 592 documentation warning(s) found

📦 `libdd-trace-utils` - 592 warning(s)

Updated: 2026-05-22 10:52:19 UTC | Commit: 3d910cf | missing-docs job results

github-actions · 2026-05-21T12:52:39Z

Clippy Allow Annotation Report

Comparing clippy allow annotations between branches:

Base Branch: origin/main
PR Branch: origin/yannham/span-vec-fields

Summary by Rule

Rule	Base Branch	PR Branch	Change

Annotation Counts by File

File	Base Branch	PR Branch	Change

Annotation Stats by Crate

Crate	Base Branch	PR Branch	Change
`clippy-annotation-reporter`	5	5	No change (0%)
`datadog-ffe-ffi`	1	1	No change (0%)
`datadog-ipc`	21	21	No change (0%)
`datadog-live-debugger`	6	6	No change (0%)
`datadog-live-debugger-ffi`	10	10	No change (0%)
`datadog-profiling-replayer`	4	4	No change (0%)
`datadog-remote-config`	3	3	No change (0%)
`datadog-sidecar`	57	57	No change (0%)
`libdd-common`	13	13	No change (0%)
`libdd-common-ffi`	12	12	No change (0%)
`libdd-data-pipeline`	5	5	No change (0%)
`libdd-ddsketch`	2	2	No change (0%)
`libdd-dogstatsd-client`	1	1	No change (0%)
`libdd-profiling`	13	13	No change (0%)
`libdd-telemetry`	20	20	No change (0%)
`libdd-tinybytes`	4	4	No change (0%)
`libdd-trace-normalization`	2	2	No change (0%)
`libdd-trace-obfuscation`	3	3	No change (0%)
`libdd-trace-stats`	1	1	No change (0%)
`libdd-trace-utils`	15	15	No change (0%)
Total	198	198	No change (0%)

About This Report

This report tracks Clippy allow annotations for specific rules, showing how they've changed in this PR. Decreasing the number of these annotations generally improves code quality.

github-actions · 2026-05-21T12:53:36Z

🔒 Cargo Deny Results

⚠️ 4 issue(s) found, showing only errors (advisories, bans, sources)

📦 `libdd-trace-utils` - 4 error(s)

Show output

error[unsound]: Rand is unsound with a custom logger using `rand::rng()`
    ┌─ /home/runner/work/libdatadog/libdatadog/Cargo.lock:177:1
    │
177 │ rand 0.8.5 registry+https://github.com/rust-lang/crates.io-index
    │ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ unsound advisory detected
    │
    ├ ID: RUSTSEC-2026-0097
    ├ Advisory: https://rustsec.org/advisories/RUSTSEC-2026-0097
    ├ It has been reported (by @lopopolo) that the `rand` library is [unsound](https://rust-lang.github.io/unsafe-code-guidelines/glossary.html#soundness-of-code--of-a-library) (i.e. that safe code using the public API can cause Undefined Behaviour) when all the following conditions are met:
      
      - The `log` and `thread_rng` features are enabled
      - A [custom logger](https://docs.rs/log/latest/log/#implementing-a-logger) is defined
      - The custom logger accesses `rand::rng()` (previously `rand::thread_rng()`) and calls any `TryRng` (previously `RngCore`) methods on `ThreadRng`
      - The `ThreadRng` (attempts to) reseed while called from the custom logger (this happens every 64 kB of generated data)
      - Trace-level logging is enabled or warn-level logging is enabled and the random source (the `getrandom` crate) is unable to provide a new seed
      
      `TryRng` (previously `RngCore`) methods for `ThreadRng` use `unsafe` code to cast `*mut BlockRng<ReseedingCore>` to `&mut BlockRng<ReseedingCore>`. When all the above conditions are met this results in an aliased mutable reference, violating the Stacked Borrows rules. Miri is able to detect this violation in sample code. Since construction of [aliased mutable references is Undefined Behaviour](https://doc.rust-lang.org/stable/nomicon/references.html), the behaviour of optimized builds is hard to predict.
    ├ Announcement: https://github.com/rust-random/rand/pull/1763
    ├ Solution: Upgrade to >=0.10.1 OR <0.10.0, >=0.9.3 OR <0.9.0, >=0.8.6 (try `cargo update -p rand`)
    ├ rand v0.8.5
      ├── (dev) libdd-common v4.1.0
      │   ├── libdd-capabilities-impl v2.0.0
      │   │   └── libdd-trace-utils v4.0.0
      │   │       └── (dev) libdd-trace-utils v4.0.0 (*)
      │   └── libdd-trace-utils v4.0.0 (*)
      ├── (dev) libdd-trace-normalization v2.0.0
      │   └── libdd-trace-utils v4.0.0 (*)
      ├── libdd-trace-utils v4.0.0 (*)
      └── proptest v1.5.0
          └── (dev) libdd-tinybytes v1.1.1
              ├── (dev) libdd-tinybytes v1.1.1 (*)
              └── libdd-trace-utils v4.0.0 (*)

error[vulnerability]: Name constraints for URI names were incorrectly accepted
    ┌─ /home/runner/work/libdatadog/libdatadog/Cargo.lock:199:1
    │
199 │ rustls-webpki 0.103.10 registry+https://github.com/rust-lang/crates.io-index
    │ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ security vulnerability detected
    │
    ├ ID: RUSTSEC-2026-0098
    ├ Advisory: https://rustsec.org/advisories/RUSTSEC-2026-0098
    ├ Name constraints for URI names were ignored and therefore accepted.
      
      Note this library does not provide an API for asserting URI names, and URI name constraints are otherwise not implemented.  URI name constraints are now rejected unconditionally.
      
      Since name constraints are restrictions on otherwise properly-issued certificates, this bug is reachable only after signature verification and requires misissuance to exploit.
      
      This vulnerability is identified as [GHSA-965h-392x-2mh5](https://github.com/rustls/webpki/security/advisories/GHSA-965h-392x-2mh5). Thank you to @1seal for the report.
    ├ Solution: Upgrade to >=0.103.12, <0.104.0-alpha.1 OR >=0.104.0-alpha.6 (try `cargo update -p rustls-webpki`)
    ├ rustls-webpki v0.103.10
      └── rustls v0.23.37
          ├── hyper-rustls v0.27.7
          │   └── libdd-common v4.1.0
          │       ├── libdd-capabilities-impl v2.0.0
          │       │   └── libdd-trace-utils v4.0.0
          │       │       └── (dev) libdd-trace-utils v4.0.0 (*)
          │       └── libdd-trace-utils v4.0.0 (*)
          ├── libdd-common v4.1.0 (*)
          └── tokio-rustls v0.26.0
              ├── hyper-rustls v0.27.7 (*)
              └── libdd-common v4.1.0 (*)

error[vulnerability]: Name constraints were accepted for certificates asserting a wildcard name
    ┌─ /home/runner/work/libdatadog/libdatadog/Cargo.lock:199:1
    │
199 │ rustls-webpki 0.103.10 registry+https://github.com/rust-lang/crates.io-index
    │ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ security vulnerability detected
    │
    ├ ID: RUSTSEC-2026-0099
    ├ Advisory: https://rustsec.org/advisories/RUSTSEC-2026-0099
    ├ Permitted subtree name constraints for DNS names were accepted for certificates asserting a wildcard name.
      
      This was incorrect because, given a name constraint of `accept.example.com`, `*.example.com` could feasibly allow a name of `reject.example.com` which is outside the constraint.
      This is very similar to [CVE-2025-61727](https://go.dev/issue/76442).
      
      Since name constraints are restrictions on otherwise properly-issued certificates, this bug is reachable only after signature verification and requires misissuance to exploit.
      
      This vulnerability is identified as [GHSA-xgp8-3hg3-c2mh](https://github.com/rustls/webpki/security/advisories/GHSA-xgp8-3hg3-c2mh). Thank you to @1seal for the report.
    ├ Solution: Upgrade to >=0.103.12, <0.104.0-alpha.1 OR >=0.104.0-alpha.6 (try `cargo update -p rustls-webpki`)
    ├ rustls-webpki v0.103.10
      └── rustls v0.23.37
          ├── hyper-rustls v0.27.7
          │   └── libdd-common v4.1.0
          │       ├── libdd-capabilities-impl v2.0.0
          │       │   └── libdd-trace-utils v4.0.0
          │       │       └── (dev) libdd-trace-utils v4.0.0 (*)
          │       └── libdd-trace-utils v4.0.0 (*)
          ├── libdd-common v4.1.0 (*)
          └── tokio-rustls v0.26.0
              ├── hyper-rustls v0.27.7 (*)
              └── libdd-common v4.1.0 (*)

error[vulnerability]: Reachable panic in certificate revocation list parsing
    ┌─ /home/runner/work/libdatadog/libdatadog/Cargo.lock:199:1
    │
199 │ rustls-webpki 0.103.10 registry+https://github.com/rust-lang/crates.io-index
    │ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ security vulnerability detected
    │
    ├ ID: RUSTSEC-2026-0104
    ├ Advisory: https://rustsec.org/advisories/RUSTSEC-2026-0104
    ├ A panic was reachable when parsing certificate revocation lists via [`BorrowedCertRevocationList::from_der`]
      or [`OwnedCertRevocationList::from_der`].  This was the result of mishandling a syntactically valid empty
      `BIT STRING` appearing in the `onlySomeReasons` element of a `IssuingDistributionPoint` CRL extension.
      
      This panic is reachable prior to a CRL's signature being verified.
      
      Applications that do not use CRLs are not affected.
      
      Thank you to @tynus3 for the report.
    ├ Solution: Upgrade to >=0.103.13, <0.104.0-alpha.1 OR >=0.104.0-alpha.7 (try `cargo update -p rustls-webpki`)
    ├ rustls-webpki v0.103.10
      └── rustls v0.23.37
          ├── hyper-rustls v0.27.7
          │   └── libdd-common v4.1.0
          │       ├── libdd-capabilities-impl v2.0.0
          │       │   └── libdd-trace-utils v4.0.0
          │       │       └── (dev) libdd-trace-utils v4.0.0 (*)
          │       └── libdd-trace-utils v4.0.0 (*)
          ├── libdd-common v4.1.0 (*)
          └── tokio-rustls v0.26.0
              ├── hyper-rustls v0.27.7 (*)
              └── libdd-common v4.1.0 (*)

advisories FAILED, bans ok, sources ok

Updated: 2026-05-22 10:54:14 UTC | Commit: 3d910cf | dependency-check job results

codecov-commenter · 2026-05-22T09:12:43Z

Codecov Report

❌ Patch coverage is 84.55882% with 21 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.88%. Comparing base (09d307d) to head (8612e8f).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2022      +/-   ##
==========================================
+ Coverage   72.84%   72.88%   +0.03%     
==========================================
  Files         458      459       +1     
  Lines       75789    75925     +136     
==========================================
+ Hits        55210    55336     +126     
- Misses      20579    20589      +10

Components	Coverage Δ
libdd-crashtracker	`65.21% <ø> (-0.02%)`	⬇️
libdd-crashtracker-ffi	`36.82% <ø> (ø)`
libdd-alloc	`98.77% <ø> (ø)`
libdd-data-pipeline	`86.69% <ø> (ø)`
libdd-data-pipeline-ffi	`78.63% <ø> (ø)`
libdd-common	`79.81% <ø> (ø)`
libdd-common-ffi	`74.41% <ø> (ø)`
libdd-telemetry	`73.34% <ø> (-0.03%)`	⬇️
libdd-telemetry-ffi	`31.36% <ø> (ø)`
libdd-dogstatsd-client	`82.64% <ø> (ø)`
datadog-ipc	`76.22% <ø> (ø)`
libdd-profiling	`81.70% <ø> (-0.02%)`	⬇️
libdd-profiling-ffi	`64.79% <ø> (ø)`
libdd-sampling	`97.46% <ø> (ø)`
datadog-sidecar	`29.06% <ø> (+0.01%)`	⬆️
datdog-sidecar-ffi	`9.52% <ø> (+0.22%)`	⬆️
spawn-worker	`48.86% <ø> (ø)`
libdd-tinybytes	`93.16% <ø> (ø)`
libdd-trace-normalization	`81.71% <ø> (ø)`
libdd-trace-obfuscation	`87.30% <ø> (ø)`
libdd-trace-protobuf	`68.25% <ø> (ø)`
libdd-trace-utils	`88.79% <84.55%> (-0.08%)`	⬇️
libdd-tracer-flare	`86.88% <ø> (ø)`
libdd-log	`74.83% <ø> (ø)`

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

dd-octo-sts · 2026-05-22T09:28:08Z

Artifact Size Benchmark Report

aarch64-alpine-linux-musl

Artifact	Baseline	Commit	Change
/aarch64-alpine-linux-musl/lib/libdatadog_profiling.so	7.57 MB	7.57 MB	0% (0 B) 👌
/aarch64-alpine-linux-musl/lib/libdatadog_profiling.a	82.01 MB	82.01 MB	0% (0 B) 👌

aarch64-unknown-linux-gnu

Artifact	Baseline	Commit	Change
/aarch64-unknown-linux-gnu/lib/libdatadog_profiling.a	98.27 MB	98.27 MB	0% (0 B) 👌
/aarch64-unknown-linux-gnu/lib/libdatadog_profiling.so	10.02 MB	10.02 MB	0% (0 B) 👌

libdatadog-x64-windows

Artifact	Baseline	Commit	Change
/libdatadog-x64-windows/debug/dynamic/datadog_profiling_ffi.dll	24.54 MB	24.54 MB	0% (0 B) 👌
/libdatadog-x64-windows/debug/dynamic/datadog_profiling_ffi.lib	81.48 KB	81.48 KB	0% (0 B) 👌
/libdatadog-x64-windows/debug/dynamic/datadog_profiling_ffi.pdb	180.48 MB	180.47 MB	-0% (-16.00 KB) 👌
/libdatadog-x64-windows/debug/static/datadog_profiling_ffi.lib	914.99 MB	914.99 MB	0% (0 B) 👌
/libdatadog-x64-windows/release/dynamic/datadog_profiling_ffi.dll	7.76 MB	7.76 MB	0% (0 B) 👌
/libdatadog-x64-windows/release/dynamic/datadog_profiling_ffi.lib	81.48 KB	81.48 KB	0% (0 B) 👌
/libdatadog-x64-windows/release/dynamic/datadog_profiling_ffi.pdb	23.24 MB	23.24 MB	0% (0 B) 👌
/libdatadog-x64-windows/release/static/datadog_profiling_ffi.lib	45.50 MB	45.50 MB	0% (0 B) 👌

libdatadog-x86-windows

Artifact	Baseline	Commit	Change
/libdatadog-x86-windows/debug/dynamic/datadog_profiling_ffi.dll	21.15 MB	21.15 MB	0% (0 B) 👌
/libdatadog-x86-windows/debug/dynamic/datadog_profiling_ffi.lib	82.76 KB	82.76 KB	0% (0 B) 👌
/libdatadog-x86-windows/debug/dynamic/datadog_profiling_ffi.pdb	184.74 MB	184.71 MB	--.01% (-32.00 KB) 💪
/libdatadog-x86-windows/debug/static/datadog_profiling_ffi.lib	900.70 MB	900.70 MB	0% (0 B) 👌
/libdatadog-x86-windows/release/dynamic/datadog_profiling_ffi.dll	6.01 MB	6.01 MB	0% (0 B) 👌
/libdatadog-x86-windows/release/dynamic/datadog_profiling_ffi.lib	82.76 KB	82.76 KB	0% (0 B) 👌
/libdatadog-x86-windows/release/dynamic/datadog_profiling_ffi.pdb	24.90 MB	24.90 MB	0% (0 B) 👌
/libdatadog-x86-windows/release/static/datadog_profiling_ffi.lib	43.00 MB	43.00 MB	0% (0 B) 👌

x86_64-alpine-linux-musl

Artifact	Baseline	Commit	Change
/x86_64-alpine-linux-musl/lib/libdatadog_profiling.a	73.11 MB	73.11 MB	0% (0 B) 👌
/x86_64-alpine-linux-musl/lib/libdatadog_profiling.so	8.45 MB	8.45 MB	0% (0 B) 👌

x86_64-unknown-linux-gnu

Artifact	Baseline	Commit	Change
/x86_64-unknown-linux-gnu/lib/libdatadog_profiling.a	90.92 MB	90.92 MB	0% (0 B) 👌
/x86_64-unknown-linux-gnu/lib/libdatadog_profiling.so	10.08 MB	10.08 MB	0% (0 B) 👌

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 64f4421cf2

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Aaalibaba42 · 2026-05-22T11:29:09Z

+    fn serialize<S: Serializer>(&self, serializer: S) -> Result<S::Ok, S::Error> {
+        let mut map = serializer.serialize_map(None)?;
+        let mut seen = HashSet::with_capacity(self.len());
+        for (k, v) in self.0.iter().rev() {
+            if seen.insert(k) {
+                map.serialize_entry(k, v)?;
+            }
+        }
+        map.end()
+    }


This will make everything reverse order, not just the duplicates. If serialized we don't care about ordering this is fine. Otherwise a map with upsert without reversing would achieve the same effect without reversing outputs for non-dedup entries but is slightly slower.

Also building this first would give us the dedup size and so we wouldn't need to have the serializer.serialize_map(None).

Am I missing something ?

I think reverse order is fine. I should make that explicit, but VecMap doesn't provide any guarantees about the iteration order of stuff with different keys, as most maps (HashMap in particular, which has a random iteration order by default).

The problem of building the deduped map first is that it causes an additional intermediate allocation. Or we could just count in a first loop, but then we'd have to clear the HashSet and to start again hash everything twice. None of those are really a no-go, but I could avoid them this way.

Aaalibaba42 · 2026-05-22T11:29:38Z

+/// enough so such that the size penalty of duplication is expected to be reasonable.
+///
+/// **Important**: note that only [VecMap::get] and [VecMap::get_mut] are duplicate-aware, so to
+/// speak. [Vec::len], [Vec::iter], and others just delegates to the underlying `Vec`, and won't


Suggested change

/// speak. [Vec::len], [Vec::iter], and others just delegates to the underlying `Vec`, and won't

/// speak. [VecMap::len], [VecMap::iter], and others just delegates to the underlying `Vec`, and won't

Eldolfin · 2026-05-22T11:50:31Z

@@ -0,0 +1,260 @@
+// Copyright 2026-Present Datadog, Inc. https://www.datadoghq.com/


Github doesn't support thread comments so putting my thoughts here to keep it clear.

Overall this seems like it could be very beneficial to perf, have you considered:

Using a dependency that does it (ex: https://github.com/p-avital/vec-map-rs)

Storing keys and values in separate Vecs which as explained in vec-map-rs's readme could make the key lookup even faster by packing them close together.

Adding micro benchmarks to compare HashMap/BTreeMap/VecMap(/vec-map-rs) operations on different number of keys, size of keys and size of values ?

Having a higher level hybrid struct that underneath uses a VecMap until a certain threshold in the number of keys is reached then switches to HashMap ? (might be overkill)

could make the key lookup even faster by packing them close together.

Attribute lists are usually pretty small (less than 10s of elements), and we need to chase one pointer when iterating anyway to read the string, and we would need to do two allocations instead of one so I have doubts on the perf gain.

Those are all very good points:

There's always a balance when pulling in a new dependency. For libdatadog, the cons are size explosion, and supply chain attack, most notably. Since the code is really simple (it's mostly boilerplate to be honest), I felt like having a bespoke datastructure here was preferable. But I think both approaches are reasonable. Another thing with having our own is that we can customize it to our needs, typically the tombstone-like removal (well it could probably easily be implemented as a wrapper of vec-map-rs, so maybe not a great example). Meta and metrics are very special cases of maps, and we might be able to take advantage of that even further?

As @paullegranddc mentioned, I don't think it's worth the complexity or even that it would be a win. We do not care that much for get already. But to keep in mind for potential evolutions?

We plan to bench this in dd-trace-rs and other real usage. For now this code is dead, so it doesn't impact anything that exists already. But maybe having a bunch of micro benches doesn't hurt, and can be easily generated by an LLM. I'll look into it!

Without thinking too much into it, I think this might be overkill for now. Simplicity, beside being beneficial for maintenance, usually wins in terms of performance in a surprisingly vast number of use-cases: less branching, no pointer chasing and cache misses, etc. We should bench real tracers and only think about a more complex scheme if it turns out there are largish maps that show unacceptable perf degradation.

paullegranddc · 2026-05-22T12:26:41Z

@@ -0,0 +1,260 @@
+// Copyright 2026-Present Datadog, Inc. https://www.datadoghq.com/


could make the key lookup even faster by packing them close together.

Attribute lists are usually pretty small (less than 10s of elements), and we need to chase one pointer when iterating anyway to read the string, and we would need to do two allocations instead of one so I have doubts on the perf gain.

paullegranddc · 2026-05-22T12:33:44Z

 impl<K: Serialize + Eq + Hash, V: Serialize> Serialize for VecMap<K, V> {
    fn serialize<S: Serializer>(&self, serializer: S) -> Result<S::Ok, S::Error> {
-        let mut map = serializer.serialize_map(Some(self.0.len()))?;
+        let mut map = serializer.serialize_map(None)?;


This is not great as rpm_serde will allocate an intermediary buffer and copy it to the main buffer once it contains the length https://docs.rs/rmp-serde/latest/src/rmp_serde/encode.rs.html#474-494

I think we should deduplicate before serialization/rely on the agent semantic that last duplicate key wins

This is a point where I hesitated. At some point I had a DedupedVecMap wrapper, and only implemented serialize on it. However, it's surprisingly annoying to avoid any additional allocation when doing it in-place. You either need to clone the keys (maybe that's ok?), because all the nice iterators or retain helpers take a closure that holds key references that don't survive long enough.

One possibility is to clone keys to put them in the set. I think we could technically avoid that but that would mean re-implementing retain manually, basically... Probably clone is fine.

paullegranddc · 2026-05-22T13:44:03Z

+        let mut map = serializer.serialize_map(None)?;
+        let mut seen = HashSet::with_capacity(self.len());
+        for (k, v) in self.0.iter().rev() {
+            if seen.insert(k) {
+                map.serialize_entry(k, v)?;
+            }
+        }


What about doing this then? It allocates a bit more for because we use a hashmap with the index instead of a set but does the same amount of work

Suggested change

let mut map = serializer.serialize_map(None)?;

let mut seen = HashSet::with_capacity(self.len());

for (k, v) in self.0.iter().rev() {

if seen.insert(k) {

map.serialize_entry(k, v)?;

}

}

let mut seen = HashMap::with_capacity(self.len());

for (i, (k, _)) in self.0.iter().rev().enumerate() {

seen.entry(k.as_ref()).or_insert(i);

}

let mut map = serializer.serialize_map(Some(seen.len()))?;

for (k, v) in seen.values().map(|i| self.0.get(i)) {

map.serialize_entry(k, v)?;

}

Right, moving everything to a hashmap is another possibility, but as you said it does allocate a bit more. If we do this I wonder if we shouldn't bite the bullet and just build a hashmap out of the vec and iterate over it. If the value are 64 bits, this get rid of fetching back stuff from the vec, and waste no more. If values can be locally large though, maybe not.

github-actions Bot added mini-agent sidecar data-pipeline labels May 21, 2026

yannham force-pushed the yannham/span-vec-fields branch 2 times, most recently from 267605b to b1718c4 Compare May 22, 2026 08:56

github-actions Bot removed sidecar data-pipeline labels May 22, 2026

feat!(span): introduce VecMap datastructure

64f4421

yannham force-pushed the yannham/span-vec-fields branch from b1718c4 to 64f4421 Compare May 22, 2026 09:57

yannham changed the title ~~refactor(trace-utils)!: use Vec instead of HashMap for Span meta/metrics/meta_struct~~ feat(trace-utils)!: introduce VecMap datastructure May 22, 2026

yannham marked this pull request as ready for review May 22, 2026 10:08

yannham requested review from a team as code owners May 22, 2026 10:08

chatgpt-codex-connector Bot reviewed May 22, 2026

View reviewed changes

Comment thread libdd-trace-utils/src/span/vec_map.rs Outdated

yannham added 2 commits May 22, 2026 12:49

chore: minor improvements

96b0a76

fix: do not provide wrong size to serializer

8612e8f

Aaalibaba42 reviewed May 22, 2026

View reviewed changes

Eldolfin reviewed May 22, 2026

View reviewed changes

paullegranddc reviewed May 22, 2026

View reviewed changes

	/// speak. [Vec::len], [Vec::iter], and others just delegates to the underlying `Vec`, and won't
	/// speak. [VecMap::len], [VecMap::iter], and others just delegates to the underlying `Vec`, and won't

		@@ -0,0 +1,260 @@
		// Copyright 2026-Present Datadog, Inc. https://www.datadoghq.com/

Conversation

yannham commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Motivation

Additional Notes

How to test the change?

Uh oh!

datadog-prod-us1-5 Bot commented May 21, 2026 • edited by datadog-datadog-prod-us1 Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

📚 Documentation Check Results

📦 libdd-trace-utils - 592 warning(s)

Uh oh!

github-actions Bot commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Clippy Allow Annotation Report

Summary by Rule

Annotation Counts by File

Annotation Stats by Crate

About This Report

Uh oh!

github-actions Bot commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔒 Cargo Deny Results

📦 libdd-trace-utils - 4 error(s)

Uh oh!

codecov-commenter commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

dd-octo-sts Bot commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Artifact Size Benchmark Report

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

yannham commented May 21, 2026 •

edited

Loading

datadog-prod-us1-5 Bot commented May 21, 2026 •

edited by datadog-datadog-prod-us1 Bot

Loading

github-actions Bot commented May 21, 2026 •

edited

Loading

📦 `libdd-trace-utils` - 592 warning(s)

github-actions Bot commented May 21, 2026 •

edited

Loading

github-actions Bot commented May 21, 2026 •

edited

Loading

📦 `libdd-trace-utils` - 4 error(s)

codecov-commenter commented May 22, 2026 •

edited

Loading

dd-octo-sts Bot commented May 22, 2026 •

edited

Loading