
Add builtin data structure benchmark framework for booster #4139

Draft
Stevengre wants to merge 3 commits into master from benchmark-framework

Conversation

@Stevengre

Adds a benchmark framework for profiling booster's builtin K data structure operations (KMap, KSet, KList), including a pipeline benchmark for matchMaps.

What's included:

  • Booster.Benchmark.Data / Booster.Benchmark.Ops: generate synthetic K terms and benchmark operations (lookup, insert, update, remove, union, intersection, etc.) at various sizes (10–50000)
  • booster-bench executable using tasty-bench with CSV output
  • Pipeline benchmarks: matchMaps, term ordering (Ord vs hash-first), substitution application, and a full single-rule rewrite pipeline
  • Unit tests validating generated data structures
  • scripts/run-bench-profile.sh for profiling with GHC RTS options
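As an illustrative sketch of what such a profiling wrapper does, it essentially splices RTS flags into the cabal invocation. The flag values here (`-p` for a time profile, `-s` for runtime stats) and the CSV path are example choices, not necessarily the script's actual defaults:

```shell
# Example only: rts_flags and the CSV path are illustrative values,
# not necessarily what run-bench-profile.sh uses by default.
rts_flags=(-p -s)
benchmark_options="--csv results.csv"
cmd=(cabal bench booster-bench --flags profiling)
if [[ -n "$benchmark_options" ]]; then
  cmd+=("--benchmark-options=$benchmark_options")
fi
# Pass the RTS flags through to the benchmark executable.
cmd+=(-- +RTS "${rts_flags[@]}" -RTS)
printf '%s\n' "${cmd[*]}"
```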

@Stevengre
Author

Stevengre commented Feb 25, 2026

Benchmark Results (GHC 9.6.5)

KMap Operations

| Size | lookup-existing | lookup-missing | size | insert | update | remove | keys | values | in_keys | sortAndDedup |
|---|---|---|---|---|---|---|---|---|---|---|
| 10 | 85.9 ns | 79.7 ns | 151 ns | 1.47 μs | 1.43 μs | 1.30 μs | 758 ns | 734 ns | 159 ns | 2.65 μs |
| 100 | 340 ns | 522 ns | 231 ns | 12.7 μs | 13.1 μs | 12.7 μs | 6.73 μs | 7.22 μs | 1.13 μs | 28.2 μs |
| 1000 | 3.85 μs | 4.85 μs | 1.32 μs | 346 μs | 145 μs | 150 μs | 69.0 μs | 79.2 μs | 15.0 μs | 488 μs |
| 5000 | 13.9 μs | 29.7 μs | 12.6 μs | 2.12 ms | 798 μs | 759 μs | 359 μs | 370 μs | 56.7 μs | 2.23 ms |
| 10000 | 20.2 μs | 51.4 μs | 34.0 μs | 7.88 ms | 2.18 ms | 1.95 ms | 818 μs | 1.13 ms | 93.3 μs | 7.53 ms |

KSet Operations

| Size | in | size | difference | union | intersection | sortAndDedup |
|---|---|---|---|---|---|---|
| 10 | 40.8 ns | 13.6 ns | 1.25 μs | 1.82 μs | 1.13 μs | 1.79 μs |
| 100 | 286 ns | 82.2 ns | 10.5 μs | 14.3 μs | 10.5 μs | 20.8 μs |
| 1000 | 1.55 μs | 2.07 μs | 122 μs | 154 μs | 110 μs | 270 μs |
| 5000 | 19.7 μs | 8.69 μs | 596 μs | 966 μs | 578 μs | 1.59 ms |
| 10000 | 37.1 μs | 20.3 μs | 1.58 ms | 2.80 ms | 1.88 ms | 3.94 ms |

KList Operations

| Size | get-0 | get-middle | get-last | size | range | concat |
|---|---|---|---|---|---|---|
| 10 | 635 ns | 670 ns | 597 ns | 133 ns | 1.71 μs | 1.33 μs |
| 100 | 708 ns | 827 ns | 873 ns | 216 ns | 4.73 μs | 10.4 μs |
| 1000 | 1.70 μs | 2.30 μs | 3.23 μs | 1.21 μs | 31.7 μs | 108 μs |
| 5000 | 5.44 μs | 8.20 μs | 10.5 μs | 5.01 μs | 154 μs | 572 μs |
| 10000 | 10.8 μs | 14.8 μs | 21.2 μs | 11.3 μs | 361 μs | 1.41 ms |
| 50000 | 48.4 μs | 72.6 μs | 123 μs | 59.3 μs | | |

Pipeline Benchmarks

| Benchmark | Mean |
|---|---|
| matchMaps (size 10) | 6.99 μs |
| matchMaps (size 100) | 175 μs |
| matchMaps (size 1000) | 24.3 ms |
| ord-term derived (all sizes) | ~14.5 ns |
| ord-term hash-first (all sizes) | ~6.6 ns |
| substitution unchanged-keys (size 1000) | 708 μs |
| substitution changed-keys (size 1000) | 881 μs |
| substitution unchanged-keys (size 5000) | 4.97 ms |
| substitution changed-keys (size 5000) | 7.55 ms |
| full-single-rule-pipeline | 10.9 μs |

Key observations:

  • matchMaps at size 1000 takes 24.3 ms — a potential optimization target
  • Hash-first term comparison is ~2x faster than derived Ord across all sizes
  • KMap insert shows super-linear growth (likely due to normalization), while lookup scales well
  • Substitution with changed keys is ~1.5x slower than unchanged keys at size 5000

@Stevengre Stevengre marked this pull request as draft February 25, 2026 13:37
@ehildenb
Member

@jberthold what do you think of this benchmark? It looks good to me, any changes/additions you would make to it? At least, having it as a test in the repo seems like a good thing.

@jberthold left a comment (Member)

I am generally not opposed to adding benchmarks like these to the code.
There are some details that should be corrected, specifically the test data should definitely not be part of the library code. Other comments are probably minor.

Overall, I am somehow not convinced that optimising the data structures will provide much performance gain. I would imagine the typical map at runtime to not have more than 100 elements (maximum). Maybe the effort is better spent on improving other areas where booster is known to be deficient, such as the implies endpoint.

Comment on lines +47 to +59
```bash
if command -v cabal >/dev/null 2>&1; then
    cmd=(cabal bench booster-bench --flags profiling)
    if [[ -n "$benchmark_options" ]]; then
        cmd+=("--benchmark-options=$benchmark_options")
    fi
    cmd+=(-- +RTS)
    cmd+=("${rts_flags[@]}")
    cmd+=(-RTS)
elif command -v stack >/dev/null 2>&1; then
    bench_args=()
    if [[ -n "$benchmark_options" ]]; then
        bench_args+=("$benchmark_options")
    fi
```
Personally I would give users a choice whether to use cabal or stack (and I would probably run with stack), or at least test for stack first (because I would prefer using stack) but that may be just me.
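A minimal sketch of that stack-first probing; the `BUILDER` override variable is a hypothetical addition for illustration, not something the PR's script provides:

```shell
# Sketch only: prefer an explicit $BUILDER override, then stack, then cabal.
# The BUILDER variable is hypothetical, not part of the PR's script.
choose_builder() {
  if [[ -n "${BUILDER:-}" ]]; then echo "$BUILDER"; return; fi
  if command -v stack >/dev/null 2>&1; then echo stack; return; fi
  if command -v cabal >/dev/null 2>&1; then echo cabal; return; fi
  echo none
}
BUILDER=stack
echo "builder: $(choose_builder)"
```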

Comment on lines +251 to +254
```yaml
ghc-options:
  - -rtsopts
  - -threaded
  - -with-rtsopts=-N
```
This is an unrelated change. The predicates-integration test suite does not need multi-threading.

I think it would be much better to keep these two modules Data and Ops outside of the library. They are all about testing and should not be part of the main code.

```haskell
import Booster.Pattern.Base

benchmarkSizes :: [Int]
benchmarkSizes = [10, 100, 1000, 5000, 10000, 50000]
```
Maybe these sizes should be made parameters (via environment variable or argument to the benchmark program). This would allow the benchmarks to be run as unit tests at a reasonable time/scale, so their code doesn't become stale without us noticing.
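For illustration, a driver honouring such a parameter might look like this in shell; the `BENCH_SIZES` variable name and its space-separated format are assumptions, not part of the PR:

```shell
# Hypothetical: BENCH_SIZES overrides the default size list so CI can run
# a fast subset. The variable name and format are assumptions, not in the PR.
unset BENCH_SIZES   # start from the defaults for this demo
sizes="${BENCH_SIZES:-10 100 1000 5000 10000 50000}"
count=0
for s in $sizes; do
  count=$((count + 1))
done
echo "running booster-bench over $count sizes: $sizes"
```

A CI job could then set, say, `BENCH_SIZES="10 100"` to keep the run short.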

Comment on lines +36 to +40
```haskell
[ bench "insert" $ nf (\(m, k, v) -> runMapUpdate m k v) (mapTerm, insertKey, insertValue)
, bench "update" $ nf (\(m, k, v) -> runMapUpdate m k v) (mapTerm, existingKey, updateValue)
, bench "remove" $ nf (\(m, k) -> runMapRemove m k) (mapTerm, existingKey)
, bench "keys" $ nf runMapKeys mapTerm
, bench "values" $ nf runMapValues mapTerm
```
Some of these benchmarks measure the generation of the large test data structures (such as mapTerm) in addition to the operation they are trying to measure. It might be better to evaluate the data structure in full beforehand (using force), which may require an NFData instance.

See the long comment on the nf function in Test.Tasty.Bench for more information.
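A self-contained sketch of that suggestion, using `Data.Map` as a stand-in for booster's KMap so it compiles without the booster library: `env` evaluates its argument to normal form before the benchmark runs, so `nf` times only the lookup, not the construction of the map.

```haskell
-- Sketch only: Data.Map stands in for booster's KMap here.
-- `env` forces the map to normal form up front, so the timed
-- function measures the lookup alone, not map construction.
import qualified Data.Map.Strict as Map
import Test.Tasty.Bench (bench, defaultMain, env, nf)

main :: IO ()
main =
  defaultMain
    [ env (pure bigMap) $ \m ->
        bench "lookup-existing" $ nf (Map.lookup (5000 :: Int)) m
    ]
  where
    bigMap :: Map.Map Int Int
    bigMap = Map.fromList [(i, i) | i <- [1 .. 10000]]
```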

@jberthold
Member

jberthold commented Mar 2, 2026

The hpack error in CI comes from using a newer hpack version locally. Probably best to re-run `hpack ./booster; hpack ./dev-tools` with the hpack provided by the nix shell (`nix develop .#cabal`).
(Or edit the comment near the top to claim it was hpack version 0.36.0.)

@Stevengre
Author

Stevengre commented Mar 3, 2026

> Overall, I am somehow not convinced that optimising the data structures will provide much performance gain. I would imagine the typical map at runtime to not have more than 100 elements (maximum). Maybe the effort is better spent on improving other areas where booster is known to be deficient, such as the implies endpoint.

Hi @jberthold, thanks for reviewing it! Could you tell me more about these areas? Which has the highest priority to improve, and are there any issues tracking these gaps?

@jberthold
Member

jberthold commented Mar 3, 2026

> Overall, I am somehow not convinced that optimising the data structures will provide much performance gain. I would imagine the typical map at runtime to not have more than 100 elements (maximum). Maybe the effort is better spent on improving other areas where booster is known to be deficient, such as the implies endpoint.

> Hi @jberthold, thanks for reviewing it! Could you tell me more about these areas? Which has the highest priority to improve, and are there any issues tracking these gaps?

We have a few GitHub issues around the implies endpoint in booster (and the code in Booster.Pattern.Implies could be refactored for readability, IMHO).
The issues are #3854 for improvements, plus #3857, #3605 and #3990 recording three specific issues to analyse (these may have been resolved in the meantime; they need reproducing first).

I also have a branch jb/incremental-improvements-for-implies where I started making changes but never got it ready for a PR.

It is probably also a good idea to go through the board to close stale issues and to find work.

@Stevengre
Author

> Probably also a good idea to go through the board and try to close stale issues as well as find work.

That makes a lot of sense. Maybe that's a good starting point, and it would make the status of the backend easier to track.

> We have a few github issues around the implies endpoint in booster (and the code Booster.Pattern.Implies could be refactored for readability IMHO).
> The issues are #3854 for improvements and #3857, #3605 and #3990 recording three specific issues to analyse (these specific issues may have been resolved meanwhile, they need reproducing first).
>
> I also have a branch jb/incremental-improvements-for-implies where I started making changes but never got to get it ready for a PR.

Thanks for sharing these! I'll read these issues first.

@ehildenb
Member

ehildenb commented Mar 3, 2026

I still would like to see this benchmark merged, because I think it is useful. So if you have time @Stevengre to address the changes that Jost pointed out, that would be great!

@Stevengre
Author

Sure, I'd like to refine this PR when I have time.
