Skip to content

feat(trace-utils)!: introduce VecMap datastructure#2022

Open
yannham wants to merge 3 commits into
mainfrom
yannham/span-vec-fields
Open

feat(trace-utils)!: introduce VecMap datastructure#2022
yannham wants to merge 3 commits into
mainfrom
yannham/span-vec-fields

Conversation

@yannham
Copy link
Copy Markdown
Contributor

@yannham yannham commented May 21, 2026

What does this PR do?

This PR is the first of a series that backports the change buffer implementation for dd-trace-js from experimentation to production in libdatadog.

This PR introduces a replacement for HashMap: VecMap. It's not used anywhere else; this is left for follow-up PRs, but the idea is to use it for the meta, metrics, and meta_struct fields of v04 spans.

Motivation

The change-buffer module (introduced in subsequent PRs) experimentations showed that we spend quite some time expanding hashmaps and thus re-hashing during span reconstruction. Introducing a map-like data structure that is optimized for construction/insertion (which is the vast majority of operations) showed to be beneficial.

Even if not optimized for reads, we expect that given the small cardinality of meta and metrics in a span (< 20 entries in general), linear scan is actually going to perform well because of cache locality. Removal is one operation that is made quite more expensive (although still linear), but can be mitigated later with tombstones (also still linear, but doesn't require shifting). Tombstones aren't implemented for this first version.

Additional Notes

Deduplication is performed at serialization time. We initially set up for sort + deduplication to avoid allocation, but this would require larger changes in libdatadog by requiring Span values to be Ord. For now, deduplication is based on a HashSet since SpanData::Text is already Hash, but this incurs one allocation. I think it's reasonable because the serialization path is not hot anyway, and we can later request Ord on values to go the in-place sort_dedup route instead if desired.

How to test the change?

Run tests.

@datadog-prod-us1-5
Copy link
Copy Markdown

datadog-prod-us1-5 Bot commented May 21, 2026

Tests

🎉 All green!

🧪 All tests passed
❄️ No new flaky tests detected

🎯 Code Coverage (details)
Patch Coverage: 84.56%
Overall Coverage: 72.88% (+0.05%)

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 8612e8f | Docs | Datadog PR Page | Give us feedback!

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 21, 2026

📚 Documentation Check Results

⚠️ 592 documentation warning(s) found

📦 libdd-trace-utils - 592 warning(s)


Updated: 2026-05-22 10:52:19 UTC | Commit: 3d910cf | missing-docs job results

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 21, 2026

Clippy Allow Annotation Report

Comparing clippy allow annotations between branches:

  • Base Branch: origin/main
  • PR Branch: origin/yannham/span-vec-fields

Summary by Rule

Rule Base Branch PR Branch Change

Annotation Counts by File

File Base Branch PR Branch Change

Annotation Stats by Crate

Crate Base Branch PR Branch Change
clippy-annotation-reporter 5 5 No change (0%)
datadog-ffe-ffi 1 1 No change (0%)
datadog-ipc 21 21 No change (0%)
datadog-live-debugger 6 6 No change (0%)
datadog-live-debugger-ffi 10 10 No change (0%)
datadog-profiling-replayer 4 4 No change (0%)
datadog-remote-config 3 3 No change (0%)
datadog-sidecar 57 57 No change (0%)
libdd-common 13 13 No change (0%)
libdd-common-ffi 12 12 No change (0%)
libdd-data-pipeline 5 5 No change (0%)
libdd-ddsketch 2 2 No change (0%)
libdd-dogstatsd-client 1 1 No change (0%)
libdd-profiling 13 13 No change (0%)
libdd-telemetry 20 20 No change (0%)
libdd-tinybytes 4 4 No change (0%)
libdd-trace-normalization 2 2 No change (0%)
libdd-trace-obfuscation 3 3 No change (0%)
libdd-trace-stats 1 1 No change (0%)
libdd-trace-utils 15 15 No change (0%)
Total 198 198 No change (0%)

About This Report

This report tracks Clippy allow annotations for specific rules, showing how they've changed in this PR. Decreasing the number of these annotations generally improves code quality.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 21, 2026

🔒 Cargo Deny Results

⚠️ 4 issue(s) found, showing only errors (advisories, bans, sources)

📦 libdd-trace-utils - 4 error(s)

Show output
error[unsound]: Rand is unsound with a custom logger using `rand::rng()`
    ┌─ /home/runner/work/libdatadog/libdatadog/Cargo.lock:177:1
    │
177 │ rand 0.8.5 registry+https://github.com/rust-lang/crates.io-index
    │ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ unsound advisory detected
    │
    ├ ID: RUSTSEC-2026-0097
    ├ Advisory: https://rustsec.org/advisories/RUSTSEC-2026-0097
    ├ It has been reported (by @lopopolo) that the `rand` library is [unsound](https://rust-lang.github.io/unsafe-code-guidelines/glossary.html#soundness-of-code--of-a-library) (i.e. that safe code using the public API can cause Undefined Behaviour) when all the following conditions are met:
      
      - The `log` and `thread_rng` features are enabled
      - A [custom logger](https://docs.rs/log/latest/log/#implementing-a-logger) is defined
      - The custom logger accesses `rand::rng()` (previously `rand::thread_rng()`) and calls any `TryRng` (previously `RngCore`) methods on `ThreadRng`
      - The `ThreadRng` (attempts to) reseed while called from the custom logger (this happens every 64 kB of generated data)
      - Trace-level logging is enabled or warn-level logging is enabled and the random source (the `getrandom` crate) is unable to provide a new seed
      
      `TryRng` (previously `RngCore`) methods for `ThreadRng` use `unsafe` code to cast `*mut BlockRng<ReseedingCore>` to `&mut BlockRng<ReseedingCore>`. When all the above conditions are met this results in an aliased mutable reference, violating the Stacked Borrows rules. Miri is able to detect this violation in sample code. Since construction of [aliased mutable references is Undefined Behaviour](https://doc.rust-lang.org/stable/nomicon/references.html), the behaviour of optimized builds is hard to predict.
    ├ Announcement: https://github.com/rust-random/rand/pull/1763
    ├ Solution: Upgrade to >=0.10.1 OR <0.10.0, >=0.9.3 OR <0.9.0, >=0.8.6 (try `cargo update -p rand`)
    ├ rand v0.8.5
      ├── (dev) libdd-common v4.1.0
      │   ├── libdd-capabilities-impl v2.0.0
      │   │   └── libdd-trace-utils v4.0.0
      │   │       └── (dev) libdd-trace-utils v4.0.0 (*)
      │   └── libdd-trace-utils v4.0.0 (*)
      ├── (dev) libdd-trace-normalization v2.0.0
      │   └── libdd-trace-utils v4.0.0 (*)
      ├── libdd-trace-utils v4.0.0 (*)
      └── proptest v1.5.0
          └── (dev) libdd-tinybytes v1.1.1
              ├── (dev) libdd-tinybytes v1.1.1 (*)
              └── libdd-trace-utils v4.0.0 (*)

error[vulnerability]: Name constraints for URI names were incorrectly accepted
    ┌─ /home/runner/work/libdatadog/libdatadog/Cargo.lock:199:1
    │
199 │ rustls-webpki 0.103.10 registry+https://github.com/rust-lang/crates.io-index
    │ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ security vulnerability detected
    │
    ├ ID: RUSTSEC-2026-0098
    ├ Advisory: https://rustsec.org/advisories/RUSTSEC-2026-0098
    ├ Name constraints for URI names were ignored and therefore accepted.
      
      Note this library does not provide an API for asserting URI names, and URI name constraints are otherwise not implemented.  URI name constraints are now rejected unconditionally.
      
      Since name constraints are restrictions on otherwise properly-issued certificates, this bug is reachable only after signature verification and requires misissuance to exploit.
      
      This vulnerability is identified as [GHSA-965h-392x-2mh5](https://github.com/rustls/webpki/security/advisories/GHSA-965h-392x-2mh5). Thank you to @1seal for the report.
    ├ Solution: Upgrade to >=0.103.12, <0.104.0-alpha.1 OR >=0.104.0-alpha.6 (try `cargo update -p rustls-webpki`)
    ├ rustls-webpki v0.103.10
      └── rustls v0.23.37
          ├── hyper-rustls v0.27.7
          │   └── libdd-common v4.1.0
          │       ├── libdd-capabilities-impl v2.0.0
          │       │   └── libdd-trace-utils v4.0.0
          │       │       └── (dev) libdd-trace-utils v4.0.0 (*)
          │       └── libdd-trace-utils v4.0.0 (*)
          ├── libdd-common v4.1.0 (*)
          └── tokio-rustls v0.26.0
              ├── hyper-rustls v0.27.7 (*)
              └── libdd-common v4.1.0 (*)

error[vulnerability]: Name constraints were accepted for certificates asserting a wildcard name
    ┌─ /home/runner/work/libdatadog/libdatadog/Cargo.lock:199:1
    │
199 │ rustls-webpki 0.103.10 registry+https://github.com/rust-lang/crates.io-index
    │ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ security vulnerability detected
    │
    ├ ID: RUSTSEC-2026-0099
    ├ Advisory: https://rustsec.org/advisories/RUSTSEC-2026-0099
    ├ Permitted subtree name constraints for DNS names were accepted for certificates asserting a wildcard name.
      
      This was incorrect because, given a name constraint of `accept.example.com`, `*.example.com` could feasibly allow a name of `reject.example.com` which is outside the constraint.
      This is very similar to [CVE-2025-61727](https://go.dev/issue/76442).
      
      Since name constraints are restrictions on otherwise properly-issued certificates, this bug is reachable only after signature verification and requires misissuance to exploit.
      
      This vulnerability is identified as [GHSA-xgp8-3hg3-c2mh](https://github.com/rustls/webpki/security/advisories/GHSA-xgp8-3hg3-c2mh). Thank you to @1seal for the report.
    ├ Solution: Upgrade to >=0.103.12, <0.104.0-alpha.1 OR >=0.104.0-alpha.6 (try `cargo update -p rustls-webpki`)
    ├ rustls-webpki v0.103.10
      └── rustls v0.23.37
          ├── hyper-rustls v0.27.7
          │   └── libdd-common v4.1.0
          │       ├── libdd-capabilities-impl v2.0.0
          │       │   └── libdd-trace-utils v4.0.0
          │       │       └── (dev) libdd-trace-utils v4.0.0 (*)
          │       └── libdd-trace-utils v4.0.0 (*)
          ├── libdd-common v4.1.0 (*)
          └── tokio-rustls v0.26.0
              ├── hyper-rustls v0.27.7 (*)
              └── libdd-common v4.1.0 (*)

error[vulnerability]: Reachable panic in certificate revocation list parsing
    ┌─ /home/runner/work/libdatadog/libdatadog/Cargo.lock:199:1
    │
199 │ rustls-webpki 0.103.10 registry+https://github.com/rust-lang/crates.io-index
    │ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ security vulnerability detected
    │
    ├ ID: RUSTSEC-2026-0104
    ├ Advisory: https://rustsec.org/advisories/RUSTSEC-2026-0104
    ├ A panic was reachable when parsing certificate revocation lists via [`BorrowedCertRevocationList::from_der`]
      or [`OwnedCertRevocationList::from_der`].  This was the result of mishandling a syntactically valid empty
      `BIT STRING` appearing in the `onlySomeReasons` element of a `IssuingDistributionPoint` CRL extension.
      
      This panic is reachable prior to a CRL's signature being verified.
      
      Applications that do not use CRLs are not affected.
      
      Thank you to @tynus3 for the report.
    ├ Solution: Upgrade to >=0.103.13, <0.104.0-alpha.1 OR >=0.104.0-alpha.7 (try `cargo update -p rustls-webpki`)
    ├ rustls-webpki v0.103.10
      └── rustls v0.23.37
          ├── hyper-rustls v0.27.7
          │   └── libdd-common v4.1.0
          │       ├── libdd-capabilities-impl v2.0.0
          │       │   └── libdd-trace-utils v4.0.0
          │       │       └── (dev) libdd-trace-utils v4.0.0 (*)
          │       └── libdd-trace-utils v4.0.0 (*)
          ├── libdd-common v4.1.0 (*)
          └── tokio-rustls v0.26.0
              ├── hyper-rustls v0.27.7 (*)
              └── libdd-common v4.1.0 (*)

advisories FAILED, bans ok, sources ok

Updated: 2026-05-22 10:54:14 UTC | Commit: 3d910cf | dependency-check job results

@yannham yannham force-pushed the yannham/span-vec-fields branch 2 times, most recently from 267605b to b1718c4 Compare May 22, 2026 08:56
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 22, 2026

Codecov Report

❌ Patch coverage is 84.55882% with 21 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.88%. Comparing base (09d307d) to head (8612e8f).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2022      +/-   ##
==========================================
+ Coverage   72.84%   72.88%   +0.03%     
==========================================
  Files         458      459       +1     
  Lines       75789    75925     +136     
==========================================
+ Hits        55210    55336     +126     
- Misses      20579    20589      +10     
Components Coverage Δ
libdd-crashtracker 65.21% <ø> (-0.02%) ⬇️
libdd-crashtracker-ffi 36.82% <ø> (ø)
libdd-alloc 98.77% <ø> (ø)
libdd-data-pipeline 86.69% <ø> (ø)
libdd-data-pipeline-ffi 78.63% <ø> (ø)
libdd-common 79.81% <ø> (ø)
libdd-common-ffi 74.41% <ø> (ø)
libdd-telemetry 73.34% <ø> (-0.03%) ⬇️
libdd-telemetry-ffi 31.36% <ø> (ø)
libdd-dogstatsd-client 82.64% <ø> (ø)
datadog-ipc 76.22% <ø> (ø)
libdd-profiling 81.70% <ø> (-0.02%) ⬇️
libdd-profiling-ffi 64.79% <ø> (ø)
libdd-sampling 97.46% <ø> (ø)
datadog-sidecar 29.06% <ø> (+0.01%) ⬆️
datdog-sidecar-ffi 9.52% <ø> (+0.22%) ⬆️
spawn-worker 48.86% <ø> (ø)
libdd-tinybytes 93.16% <ø> (ø)
libdd-trace-normalization 81.71% <ø> (ø)
libdd-trace-obfuscation 87.30% <ø> (ø)
libdd-trace-protobuf 68.25% <ø> (ø)
libdd-trace-utils 88.79% <84.55%> (-0.08%) ⬇️
libdd-tracer-flare 86.88% <ø> (ø)
libdd-log 74.83% <ø> (ø)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@dd-octo-sts
Copy link
Copy Markdown
Contributor

dd-octo-sts Bot commented May 22, 2026

Artifact Size Benchmark Report

aarch64-alpine-linux-musl
Artifact Baseline Commit Change
/aarch64-alpine-linux-musl/lib/libdatadog_profiling.so 7.57 MB 7.57 MB 0% (0 B) 👌
/aarch64-alpine-linux-musl/lib/libdatadog_profiling.a 82.01 MB 82.01 MB 0% (0 B) 👌
aarch64-unknown-linux-gnu
Artifact Baseline Commit Change
/aarch64-unknown-linux-gnu/lib/libdatadog_profiling.a 98.27 MB 98.27 MB 0% (0 B) 👌
/aarch64-unknown-linux-gnu/lib/libdatadog_profiling.so 10.02 MB 10.02 MB 0% (0 B) 👌
libdatadog-x64-windows
Artifact Baseline Commit Change
/libdatadog-x64-windows/debug/dynamic/datadog_profiling_ffi.dll 24.54 MB 24.54 MB 0% (0 B) 👌
/libdatadog-x64-windows/debug/dynamic/datadog_profiling_ffi.lib 81.48 KB 81.48 KB 0% (0 B) 👌
/libdatadog-x64-windows/debug/dynamic/datadog_profiling_ffi.pdb 180.48 MB 180.47 MB -0% (-16.00 KB) 👌
/libdatadog-x64-windows/debug/static/datadog_profiling_ffi.lib 914.99 MB 914.99 MB 0% (0 B) 👌
/libdatadog-x64-windows/release/dynamic/datadog_profiling_ffi.dll 7.76 MB 7.76 MB 0% (0 B) 👌
/libdatadog-x64-windows/release/dynamic/datadog_profiling_ffi.lib 81.48 KB 81.48 KB 0% (0 B) 👌
/libdatadog-x64-windows/release/dynamic/datadog_profiling_ffi.pdb 23.24 MB 23.24 MB 0% (0 B) 👌
/libdatadog-x64-windows/release/static/datadog_profiling_ffi.lib 45.50 MB 45.50 MB 0% (0 B) 👌
libdatadog-x86-windows
Artifact Baseline Commit Change
/libdatadog-x86-windows/debug/dynamic/datadog_profiling_ffi.dll 21.15 MB 21.15 MB 0% (0 B) 👌
/libdatadog-x86-windows/debug/dynamic/datadog_profiling_ffi.lib 82.76 KB 82.76 KB 0% (0 B) 👌
/libdatadog-x86-windows/debug/dynamic/datadog_profiling_ffi.pdb 184.74 MB 184.71 MB --.01% (-32.00 KB) 💪
/libdatadog-x86-windows/debug/static/datadog_profiling_ffi.lib 900.70 MB 900.70 MB 0% (0 B) 👌
/libdatadog-x86-windows/release/dynamic/datadog_profiling_ffi.dll 6.01 MB 6.01 MB 0% (0 B) 👌
/libdatadog-x86-windows/release/dynamic/datadog_profiling_ffi.lib 82.76 KB 82.76 KB 0% (0 B) 👌
/libdatadog-x86-windows/release/dynamic/datadog_profiling_ffi.pdb 24.90 MB 24.90 MB 0% (0 B) 👌
/libdatadog-x86-windows/release/static/datadog_profiling_ffi.lib 43.00 MB 43.00 MB 0% (0 B) 👌
x86_64-alpine-linux-musl
Artifact Baseline Commit Change
/x86_64-alpine-linux-musl/lib/libdatadog_profiling.a 73.11 MB 73.11 MB 0% (0 B) 👌
/x86_64-alpine-linux-musl/lib/libdatadog_profiling.so 8.45 MB 8.45 MB 0% (0 B) 👌
x86_64-unknown-linux-gnu
Artifact Baseline Commit Change
/x86_64-unknown-linux-gnu/lib/libdatadog_profiling.a 90.92 MB 90.92 MB 0% (0 B) 👌
/x86_64-unknown-linux-gnu/lib/libdatadog_profiling.so 10.08 MB 10.08 MB 0% (0 B) 👌

@yannham yannham force-pushed the yannham/span-vec-fields branch from b1718c4 to 64f4421 Compare May 22, 2026 09:57
@yannham yannham changed the title refactor(trace-utils)!: use Vec instead of HashMap for Span meta/metrics/meta_struct feat(trace-utils)!: introduce VecMap datastructure May 22, 2026
@yannham yannham marked this pull request as ready for review May 22, 2026 10:08
@yannham yannham requested review from a team as code owners May 22, 2026 10:08
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 64f4421cf2

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread libdd-trace-utils/src/span/vec_map.rs Outdated
Comment on lines +172 to +181
fn serialize<S: Serializer>(&self, serializer: S) -> Result<S::Ok, S::Error> {
let mut map = serializer.serialize_map(None)?;
let mut seen = HashSet::with_capacity(self.len());
for (k, v) in self.0.iter().rev() {
if seen.insert(k) {
map.serialize_entry(k, v)?;
}
}
map.end()
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will make everything reverse order, not just the duplicates. If serialized we don't care about ordering this is fine. Otherwise a map with upsert without reversing would achieve the same effect without reversing outputs for non-dedup entries but is slightly slower.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also building this first would give us the dedup size and so we wouldn't need to have the serializer.serialize_map(None).

Am I missing something ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think reverse order is fine. I should make that explicit, but VecMap doesn't provide any guarantees about the iteration order of stuff with different keys, as most maps (HashMap in particular, which has a random iteration order by default).

The problem of building the deduped map first is that it causes an additional intermediate allocation. Or we could just count in a first loop, but then we'd have to clear the HashSet and to start again hash everything twice. None of those are really a no-go, but I could avoid them this way.

/// enough so such that the size penalty of duplication is expected to be reasonable.
///
/// **Important**: note that only [VecMap::get] and [VecMap::get_mut] are duplicate-aware, so to
/// speak. [Vec::len], [Vec::iter], and others just delegates to the underlying `Vec`, and won't
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// speak. [Vec::len], [Vec::iter], and others just delegates to the underlying `Vec`, and won't
/// speak. [VecMap::len], [VecMap::iter], and others just delegates to the underlying `Vec`, and won't

@@ -0,0 +1,260 @@
// Copyright 2026-Present Datadog, Inc. https://www.datadoghq.com/
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Github doesn't support thread comments so putting my thoughts here to keep it clear.

Overall this seems like it could be very beneficial to perf, have you considered:

  1. Using a dependency that does it (ex: https://github.com/p-avital/vec-map-rs)
  2. Storing keys and values in separate Vecs which as explained in vec-map-rs's readme could make the key lookup even faster by packing them close together.
  3. Adding micro benchmarks to compare HashMap/BTreeMap/VecMap(/vec-map-rs) operations on different number of keys, size of keys and size of values ?
  4. Having a higher level hybrid struct that underneath uses a VecMap until a certain threshold in the number of keys is reached then switches to HashMap ? (might be overkill)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could make the key lookup even faster by packing them close together.

Attribute lists are usually pretty small (less than 10s of elements), and we need to chase one pointer when iterating anyway to read the string, and we would need to do two allocations instead of one so I have doubts on the perf gain.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those are all very good points:

  1. There's always a balance when pulling in a new dependency. For libdatadog, the cons are size explosion, and supply chain attack, most notably. Since the code is really simple (it's mostly boilerplate to be honest), I felt like having a bespoke datastructure here was preferable. But I think both approaches are reasonable. Another thing with having our own is that we can customize it to our needs, typically the tombstone-like removal (well it could probably easily be implemented as a wrapper of vec-map-rs, so maybe not a great example). Meta and metrics are very special cases of maps, and we might be able to take advantage of that even further?
  2. As @paullegranddc mentioned, I don't think it's worth the complexity or even that it would be a win. We do not care that much for get already. But to keep in mind for potential evolutions?
  3. We plan to bench this in dd-trace-rs and other real usage. For now this code is dead, so it doesn't impact anything that exists already. But maybe having a bunch of micro benches doesn't hurt, and can be easily generated by an LLM. I'll look into it!
  4. Without thinking too much into it, I think this might be overkill for now. Simplicity, beside being beneficial for maintenance, usually wins in terms of performance in a surprisingly vast number of use-cases: less branching, no pointer chasing and cache misses, etc. We should bench real tracers and only think about a more complex scheme if it turns out there are largish maps that show unacceptable perf degradation.

@@ -0,0 +1,260 @@
// Copyright 2026-Present Datadog, Inc. https://www.datadoghq.com/
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could make the key lookup even faster by packing them close together.

Attribute lists are usually pretty small (less than 10s of elements), and we need to chase one pointer when iterating anyway to read the string, and we would need to do two allocations instead of one so I have doubts on the perf gain.

impl<K: Serialize + Eq + Hash, V: Serialize> Serialize for VecMap<K, V> {
fn serialize<S: Serializer>(&self, serializer: S) -> Result<S::Ok, S::Error> {
let mut map = serializer.serialize_map(Some(self.0.len()))?;
let mut map = serializer.serialize_map(None)?;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not great as rpm_serde will allocate an intermediary buffer and copy it to the main buffer once it contains the length https://docs.rs/rmp-serde/latest/src/rmp_serde/encode.rs.html#474-494

I think we should deduplicate before serialization/rely on the agent semantic that last duplicate key wins

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a point where I hesitated. At some point I had a DedupedVecMap wrapper, and only implemented serialize on it. However, it's surprisingly annoying to avoid any additional allocation when doing it in-place. You either need to clone the keys (maybe that's ok?), because all the nice iterators or retain helpers take a closure that holds key references that don't survive long enough.

One possibility is to clone keys to put them in the set. I think we could technically avoid that but that would mean re-implementing retain manually, basically... Probably clone is fine.

Comment on lines +173 to +179
let mut map = serializer.serialize_map(None)?;
let mut seen = HashSet::with_capacity(self.len());
for (k, v) in self.0.iter().rev() {
if seen.insert(k) {
map.serialize_entry(k, v)?;
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about doing this then? It allocates a bit more for because we use a hashmap with the index instead of a set but does the same amount of work

Suggested change
let mut map = serializer.serialize_map(None)?;
let mut seen = HashSet::with_capacity(self.len());
for (k, v) in self.0.iter().rev() {
if seen.insert(k) {
map.serialize_entry(k, v)?;
}
}
let mut seen = HashMap::with_capacity(self.len());
for (i, (k, _)) in self.0.iter().rev().enumerate() {
seen.entry(k.as_ref()).or_insert(i);
}
let mut map = serializer.serialize_map(Some(seen.len()))?;
for (k, v) in seen.values().map(|i| self.0.get(i)) {
map.serialize_entry(k, v)?;
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, moving everything to a hashmap is another possibility, but as you said it does allocate a bit more. If we do this I wonder if we shouldn't bite the bullet and just build a hashmap out of the vec and iterate over it. If the value are 64 bits, this get rid of fetching back stuff from the vec, and waste no more. If values can be locally large though, maybe not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants