Use go slices #49

Open
mhr3 wants to merge 6 commits into valyala:master from mhr3:use-go-slices

mhr3 commented Dec 27, 2022

Fixes #48, #33

I've rewritten the cgo wrapper to achieve two things:

  • stop pretending that the pointers passed to C aren't real pointers, so that Go can properly adjust them during stack moves (i.e. stop using the uintptr_t hack); this allows stack-allocated buffers to be used for Compress() calls
  • pass Go slices directly to the C calls, so the buffers no longer need to be copied, which makes things a bit faster
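A minimal pure-Go sketch of the second point, assuming an append-style API that writes its result straight into the spare capacity of a caller-supplied `dst` slice. `appendCompressed`, `fakeCompress` and `compressBound` are hypothetical names, not the library's code; `fakeCompress` merely copies bytes and stands in for the C call so the example runs without cgo or zstd. The `main` function also shows a caller-owned fixed-size array being used as the output buffer, the kind of buffer the first point refers to.

```go
package main

import "fmt"

// compressFunc is a hypothetical stand-in for the C compression call: it
// writes into dst (which must be large enough) and returns the number of
// bytes written.
type compressFunc func(dst, src []byte) int

// fakeCompress is NOT zstd: it copies src verbatim so the sketch runs
// without cgo.
func fakeCompress(dst, src []byte) int {
	return copy(dst, src)
}

// compressBound is a hypothetical worst-case output size for fakeCompress.
func compressBound(n int) int { return n }

// appendCompressed appends the "compressed" form of src to dst. The callee
// writes directly into dst's spare capacity, so no temporary buffer has to
// be allocated and copied back afterwards.
func appendCompressed(dst, src []byte, compress compressFunc) []byte {
	need := compressBound(len(src))
	if cap(dst)-len(dst) < need {
		// Grow once so the callee can write in place.
		grown := make([]byte, len(dst), len(dst)+need)
		copy(grown, dst)
		dst = grown
	}
	spare := dst[len(dst):cap(dst)]
	n := compress(spare, src)
	return dst[:len(dst)+n]
}

func main() {
	// A fixed-size array owned by the caller; whether it actually lives on
	// the stack is decided by the compiler's escape analysis.
	var buf [64]byte
	out := appendCompressed(buf[:0], []byte("hello"), fakeCompress)
	// out aliases buf because buf had enough spare capacity.
	fmt.Println(string(out), &out[0] == &buf[0])
}
```

When `dst` already has enough capacity, the result aliases the caller's buffer and nothing is copied a second time; only the grow path allocates.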

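Here are the benchmark results against master (run on an M1 with Go 1.18). One-shot Compress/Decompress calls (with and without a dictionary) on tiny blocks of 1 to 100 bytes got slower, by up to about 24% in time/op, whereas the Reader and StreamDecompress paths got faster (by up to about 16%) and blocks of 10000 bytes and up are mostly unchanged or a few percent faster.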
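Most of the statistically significant rows in the tables below report p=0.016 with n=4+5. A quick way to see why, assuming benchstat's default two-sided Mann-Whitney U test with exact p-values (the `(p=… n=…)` format matches the older golang.org/x/perf benchstat): with 4 and 5 samples there are C(9,4) = 126 equally likely rank orderings under the null hypothesis, so the most extreme outcome, where every new sample is slower (or every one faster) than every old one, gives p = 2/126 ≈ 0.016. With n=4+4 the floor is 2/70 ≈ 0.029, matching the n=4+4 rows. The helper names are illustrative.

```go
package main

import "fmt"

// binomial returns C(n, k) as a float64.
func binomial(n, k int) float64 {
	if k < 0 || k > n {
		return 0
	}
	r := 1.0
	for i := 1; i <= k; i++ {
		r = r * float64(n-k+i) / float64(i)
	}
	return r
}

// minTwoSidedP is the smallest two-sided p-value an exact Mann-Whitney U
// test can produce for tie-free samples of sizes n1 and n2. It is reached
// when every value of one sample is below every value of the other, which
// is 1 of C(n1+n2, n1) orderings in each tail.
func minTwoSidedP(n1, n2 int) float64 {
	return 2 / binomial(n1+n2, n1)
}

func main() {
	fmt.Printf("n=4+5: p >= %.3f\n", minTwoSidedP(4, 5))
	fmt.Printf("n=4+4: p >= %.3f\n", minTwoSidedP(4, 4))
}
```

In other words, p=0.016 here just means the old and new samples did not overlap at all; with so few runs, no row can report anything lower.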

CPU time
name                                          old time/op    new time/op    delta
DecompressDict/blockSize_1/level_3-8            21.3ns ± 2%    25.5ns ± 2%    +19.57%  (p=0.016 n=4+5)
DecompressDict/blockSize_1/level_5-8            21.0ns ± 2%    25.2ns ± 2%    +20.02%  (p=0.016 n=4+5)
DecompressDict/blockSize_1/level_10-8           21.3ns ± 5%    25.4ns ± 3%    +19.18%  (p=0.016 n=4+5)
DecompressDict/blockSize_10/level_3-8           20.8ns ± 4%    25.2ns ± 1%    +20.92%  (p=0.016 n=4+5)
DecompressDict/blockSize_10/level_5-8           20.9ns ± 3%    25.8ns ± 2%    +23.59%  (p=0.016 n=4+5)
DecompressDict/blockSize_10/level_10-8          26.7ns ± 2%    31.2ns ± 1%    +16.90%  (p=0.016 n=4+5)
DecompressDict/blockSize_100/level_3-8          26.7ns ± 0%    32.1ns ± 1%    +20.39%  (p=0.016 n=4+5)
DecompressDict/blockSize_100/level_5-8          24.7ns ± 0%    30.6ns ± 1%    +24.08%  (p=0.016 n=4+5)
DecompressDict/blockSize_100/level_10-8         23.5ns ± 0%    28.8ns ± 1%    +22.55%  (p=0.016 n=4+5)
DecompressDict/blockSize_1000/level_3-8          165ns ± 0%     175ns ± 0%     +6.00%  (p=0.016 n=4+5)
DecompressDict/blockSize_1000/level_5-8          416ns ± 1%     425ns ± 1%     +2.33%  (p=0.016 n=4+5)
DecompressDict/blockSize_1000/level_10-8         424ns ± 4%     421ns ± 1%       ~     (p=1.000 n=4+5)
DecompressDict/blockSize_10000/level_3-8        1.51µs ± 2%    1.51µs ± 1%       ~     (p=0.952 n=4+5)
DecompressDict/blockSize_10000/level_5-8        1.71µs ± 0%    1.72µs ± 1%     +0.62%  (p=0.032 n=4+5)
DecompressDict/blockSize_10000/level_10-8       1.61µs ± 0%    1.61µs ± 1%       ~     (p=0.952 n=4+5)
DecompressDict/blockSize_100000/level_3-8       13.8µs ± 1%    13.7µs ± 1%       ~     (p=0.286 n=4+5)
DecompressDict/blockSize_100000/level_5-8       13.5µs ± 1%    13.6µs ± 1%       ~     (p=1.000 n=4+5)
DecompressDict/blockSize_100000/level_10-8      12.5µs ± 1%    12.5µs ± 3%       ~     (p=0.730 n=4+5)
DecompressDict/blockSize_300000/level_3-8       37.8µs ± 1%    37.6µs ± 1%       ~     (p=0.413 n=4+5)
DecompressDict/blockSize_300000/level_5-8       44.2µs ± 1%    43.7µs ± 0%       ~     (p=0.095 n=4+5)
DecompressDict/blockSize_300000/level_10-8      36.4µs ± 1%    36.1µs ± 1%       ~     (p=0.190 n=4+5)
CompressDict/blockSize_1/level_3-8              38.9ns ± 0%    46.0ns ± 1%    +18.26%  (p=0.016 n=4+5)
CompressDict/blockSize_1/level_5-8              38.9ns ± 0%    45.9ns ± 3%    +17.95%  (p=0.016 n=4+5)
CompressDict/blockSize_1/level_10-8             39.5ns ± 1%    46.2ns ± 0%    +16.81%  (p=0.016 n=4+5)
CompressDict/blockSize_10/level_3-8              159ns ± 1%     167ns ± 2%     +5.36%  (p=0.016 n=4+5)
CompressDict/blockSize_10/level_5-8              170ns ± 1%     178ns ± 2%     +4.85%  (p=0.016 n=4+5)
CompressDict/blockSize_10/level_10-8             178ns ± 1%     185ns ± 1%     +4.29%  (p=0.016 n=4+5)
CompressDict/blockSize_100/level_3-8            81.0ns ± 0%    89.0ns ± 1%     +9.91%  (p=0.016 n=4+5)
CompressDict/blockSize_100/level_5-8             199ns ± 1%     206ns ± 1%     +3.51%  (p=0.016 n=4+5)
CompressDict/blockSize_100/level_10-8            186ns ± 1%     192ns ± 1%     +3.40%  (p=0.016 n=4+5)
CompressDict/blockSize_1000/level_3-8            399ns ± 1%     410ns ± 1%     +2.54%  (p=0.016 n=4+5)
CompressDict/blockSize_1000/level_5-8           1.26µs ± 1%    1.28µs ± 0%     +0.91%  (p=0.016 n=4+5)
CompressDict/blockSize_1000/level_10-8          2.33µs ± 2%    2.35µs ± 2%       ~     (p=0.905 n=4+5)
CompressDict/blockSize_10000/level_3-8          4.09µs ± 2%    4.10µs ± 1%       ~     (p=0.952 n=4+5)
CompressDict/blockSize_10000/level_5-8          10.1µs ± 0%    10.0µs ± 1%     -0.88%  (p=0.000 n=4+5)
CompressDict/blockSize_10000/level_10-8         24.4µs ± 1%    24.3µs ± 0%       ~     (p=0.190 n=4+5)
CompressDict/blockSize_100000/level_3-8         36.7µs ± 1%    36.6µs ± 1%       ~     (p=0.413 n=4+5)
CompressDict/blockSize_100000/level_5-8         94.3µs ± 1%    93.2µs ± 1%     -1.12%  (p=0.032 n=4+5)
CompressDict/blockSize_100000/level_10-8         159µs ± 1%     159µs ± 1%       ~     (p=0.730 n=4+5)
CompressDict/blockSize_300000/level_3-8          149µs ± 1%     149µs ± 1%       ~     (p=0.730 n=4+5)
CompressDict/blockSize_300000/level_5-8          346µs ± 1%     344µs ± 1%       ~     (p=0.413 n=4+5)
CompressDict/blockSize_300000/level_10-8        1.21ms ± 1%    1.19ms ± 1%     -1.45%  (p=0.016 n=4+5)
Compress/blockSize_1/level_3-8                  25.0ns ± 0%    29.8ns ± 0%    +19.10%  (p=0.016 n=4+5)
Compress/blockSize_1/level_5-8                  25.1ns ± 0%    29.8ns ± 0%    +18.95%  (p=0.016 n=4+5)
Compress/blockSize_1/level_10-8                 25.4ns ± 0%    30.3ns ± 1%    +19.24%  (p=0.016 n=4+5)
Compress/blockSize_10/level_3-8                 42.1ns ± 0%    48.1ns ± 0%    +14.33%  (p=0.016 n=4+5)
Compress/blockSize_10/level_5-8                 43.0ns ± 1%    48.8ns ± 1%    +13.27%  (p=0.016 n=4+5)
Compress/blockSize_10/level_10-8                44.7ns ± 2%    50.2ns ± 1%    +12.24%  (p=0.016 n=4+5)
Compress/blockSize_100/level_3-8                 172ns ± 1%     178ns ± 1%     +3.67%  (p=0.016 n=4+5)
Compress/blockSize_100/level_5-8                 236ns ± 0%     242ns ± 1%     +2.69%  (p=0.016 n=4+5)
Compress/blockSize_100/level_10-8                263ns ± 1%     270ns ± 1%     +2.81%  (p=0.016 n=4+5)
Compress/blockSize_1000/level_3-8                818ns ± 1%     821ns ± 1%       ~     (p=0.730 n=4+5)
Compress/blockSize_1000/level_5-8               1.22µs ± 1%    1.20µs ± 1%     -1.66%  (p=0.016 n=4+5)
Compress/blockSize_1000/level_10-8              1.72µs ± 1%    1.74µs ± 1%       ~     (p=0.079 n=4+5)
Compress/blockSize_10000/level_3-8              4.38µs ± 1%    4.41µs ± 1%       ~     (p=0.413 n=4+5)
Compress/blockSize_10000/level_5-8              9.54µs ± 1%    9.09µs ± 1%     -4.68%  (p=0.016 n=4+5)
Compress/blockSize_10000/level_10-8             39.6µs ± 3%    39.1µs ± 1%       ~     (p=0.730 n=4+5)
Compress/blockSize_100000/level_3-8             35.6µs ± 1%    35.2µs ± 3%       ~     (p=0.556 n=4+5)
Compress/blockSize_100000/level_5-8             89.5µs ± 1%    86.2µs ± 1%     -3.69%  (p=0.016 n=4+5)
Compress/blockSize_100000/level_10-8             462µs ± 0%     459µs ± 1%       ~     (p=0.190 n=4+5)
Compress/blockSize_300000/level_3-8              123µs ± 1%     122µs ± 1%     -1.10%  (p=0.032 n=4+5)
Compress/blockSize_300000/level_5-8              319µs ± 1%     322µs ± 3%       ~     (p=0.556 n=4+5)
Compress/blockSize_300000/level_10-8            1.08ms ± 1%    1.09ms ± 3%       ~     (p=0.730 n=4+5)
Decompress/blockSize_1/level_3-8                20.6ns ± 3%    23.8ns ± 3%    +15.18%  (p=0.016 n=4+5)
Decompress/blockSize_1/level_5-8                20.7ns ± 2%    23.7ns ± 4%    +14.34%  (p=0.016 n=4+5)
Decompress/blockSize_1/level_10-8               20.5ns ± 1%    23.2ns ± 4%    +13.09%  (p=0.016 n=4+5)
Decompress/blockSize_10/level_3-8               21.2ns ± 3%    23.4ns ± 3%    +10.56%  (p=0.016 n=4+5)
Decompress/blockSize_10/level_5-8               21.2ns ± 2%    23.8ns ± 4%    +11.96%  (p=0.016 n=4+5)
Decompress/blockSize_10/level_10-8              20.7ns ± 3%    23.6ns ± 2%    +14.14%  (p=0.016 n=4+5)
Decompress/blockSize_100/level_3-8              35.2ns ± 0%    39.9ns ± 0%    +13.37%  (p=0.016 n=4+5)
Decompress/blockSize_100/level_5-8              35.5ns ± 0%    40.0ns ± 1%    +12.54%  (p=0.016 n=4+5)
Decompress/blockSize_100/level_10-8             35.8ns ± 1%    40.2ns ± 1%    +12.27%  (p=0.016 n=4+5)
Decompress/blockSize_1000/level_3-8              449ns ± 1%     455ns ± 0%     +1.26%  (p=0.016 n=4+5)
Decompress/blockSize_1000/level_5-8              448ns ± 0%     458ns ± 1%     +2.03%  (p=0.016 n=4+5)
Decompress/blockSize_1000/level_10-8             444ns ± 0%     470ns ± 3%     +5.93%  (p=0.016 n=4+5)
Decompress/blockSize_10000/level_3-8            1.75µs ± 2%    1.74µs ± 1%       ~     (p=1.000 n=4+5)
Decompress/blockSize_10000/level_5-8            1.74µs ± 1%    1.74µs ± 0%       ~     (p=1.000 n=4+5)
Decompress/blockSize_10000/level_10-8           1.71µs ± 1%    1.73µs ± 1%       ~     (p=0.063 n=4+5)
Decompress/blockSize_100000/level_3-8           12.8µs ± 1%    12.5µs ± 0%     -2.27%  (p=0.029 n=4+4)
Decompress/blockSize_100000/level_5-8           14.7µs ± 1%    14.3µs ± 1%     -2.78%  (p=0.016 n=4+5)
Decompress/blockSize_100000/level_10-8          12.7µs ± 0%    12.5µs ± 0%     -2.06%  (p=0.016 n=4+5)
Decompress/blockSize_300000/level_3-8           39.0µs ± 1%    37.8µs ± 2%     -3.13%  (p=0.016 n=4+5)
Decompress/blockSize_300000/level_5-8           45.2µs ± 1%    43.8µs ± 0%     -3.23%  (p=0.016 n=4+5)
Decompress/blockSize_300000/level_10-8          37.0µs ± 1%    36.6µs ± 2%       ~     (p=0.111 n=4+5)
ReaderDict/blockSize_1/level_3-8                64.3ns ± 1%    58.1ns ± 3%     -9.62%  (p=0.016 n=4+5)
ReaderDict/blockSize_1/level_5-8                66.6ns ± 2%    57.6ns ± 1%    -13.54%  (p=0.016 n=4+5)
ReaderDict/blockSize_1/level_10-8               70.3ns ± 0%    63.9ns ± 1%     -9.08%  (p=0.016 n=4+5)
ReaderDict/blockSize_10/level_3-8               73.6ns ± 1%    68.7ns ± 2%     -6.66%  (p=0.016 n=4+5)
ReaderDict/blockSize_10/level_5-8               71.7ns ± 0%    66.3ns ± 1%     -7.54%  (p=0.016 n=4+5)
ReaderDict/blockSize_10/level_10-8              71.4ns ± 3%    64.8ns ± 1%     -9.22%  (p=0.016 n=4+5)
ReaderDict/blockSize_100/level_3-8               218ns ± 1%     211ns ± 1%     -3.15%  (p=0.016 n=4+5)
ReaderDict/blockSize_100/level_5-8               464ns ± 1%     462ns ± 1%       ~     (p=0.556 n=4+5)
ReaderDict/blockSize_100/level_10-8              459ns ± 1%     456ns ± 1%       ~     (p=0.190 n=4+5)
ReaderDict/blockSize_1000/level_3-8             1.62µs ± 0%    1.58µs ± 1%     -2.18%  (p=0.016 n=4+5)
ReaderDict/blockSize_1000/level_5-8             1.84µs ± 1%    1.78µs ± 1%     -3.29%  (p=0.016 n=4+5)
ReaderDict/blockSize_1000/level_10-8            1.73µs ± 1%    1.71µs ± 8%       ~     (p=0.190 n=4+5)
ReaderDict/blockSize_10000/level_3-8            14.9µs ± 1%    14.5µs ± 2%     -2.63%  (p=0.016 n=4+5)
ReaderDict/blockSize_10000/level_5-8            14.8µs ± 1%    14.4µs ± 2%     -2.36%  (p=0.016 n=4+5)
ReaderDict/blockSize_10000/level_10-8           13.7µs ± 0%    13.2µs ± 2%     -3.62%  (p=0.016 n=4+5)
ReaderDict/blockSize_100000/level_3-8            149µs ± 2%     138µs ± 1%     -6.94%  (p=0.016 n=4+5)
ReaderDict/blockSize_100000/level_5-8            159µs ± 2%     150µs ± 1%     -5.74%  (p=0.016 n=4+5)
ReaderDict/blockSize_100000/level_10-8           131µs ± 1%     123µs ± 1%     -5.78%  (p=0.016 n=4+5)
ReaderDict/blockSize_300000/level_3-8            511µs ± 1%     479µs ± 2%     -6.29%  (p=0.016 n=4+5)
ReaderDict/blockSize_300000/level_5-8            531µs ± 1%     494µs ± 1%     -6.93%  (p=0.016 n=4+5)
ReaderDict/blockSize_300000/level_10-8           415µs ± 1%     396µs ± 1%     -4.57%  (p=0.016 n=4+5)
Reader/blockSize_1/level_3-8                    63.6ns ± 1%    56.6ns ± 2%    -10.96%  (p=0.016 n=4+5)
Reader/blockSize_1/level_5-8                    64.2ns ± 1%    57.1ns ± 1%    -11.03%  (p=0.016 n=4+5)
Reader/blockSize_1/level_10-8                   66.1ns ± 4%    57.4ns ± 1%    -13.14%  (p=0.016 n=4+5)
Reader/blockSize_10/level_3-8                   83.4ns ± 9%    75.1ns ± 1%     -9.92%  (p=0.016 n=4+5)
Reader/blockSize_10/level_5-8                   81.4ns ± 1%    75.9ns ± 3%     -6.80%  (p=0.016 n=4+5)
Reader/blockSize_10/level_10-8                  83.3ns ± 5%    76.2ns ± 1%     -8.57%  (p=0.016 n=4+5)
Reader/blockSize_100/level_3-8                   496ns ± 1%     491ns ± 1%       ~     (p=0.190 n=4+5)
Reader/blockSize_100/level_5-8                   496ns ± 1%     491ns ± 1%       ~     (p=0.111 n=4+5)
Reader/blockSize_100/level_10-8                  490ns ± 1%     487ns ± 0%     -0.71%  (p=0.016 n=4+5)
Reader/blockSize_1000/level_3-8                 1.87µs ± 0%    1.82µs ± 1%     -2.47%  (p=0.016 n=4+5)
Reader/blockSize_1000/level_5-8                 1.87µs ± 0%    1.83µs ± 0%     -2.19%  (p=0.016 n=4+5)
Reader/blockSize_1000/level_10-8                1.85µs ± 0%    1.80µs ± 1%     -2.61%  (p=0.016 n=4+5)
Reader/blockSize_10000/level_3-8                13.9µs ± 1%    13.6µs ± 2%       ~     (p=0.190 n=4+5)
Reader/blockSize_10000/level_5-8                15.9µs ± 2%    15.5µs ± 2%       ~     (p=0.190 n=4+5)
Reader/blockSize_10000/level_10-8               13.8µs ± 1%    13.5µs ± 2%       ~     (p=0.111 n=4+5)
Reader/blockSize_100000/level_3-8                149µs ± 1%     141µs ± 3%     -5.23%  (p=0.016 n=4+5)
Reader/blockSize_100000/level_5-8                162µs ± 1%     153µs ± 2%     -5.70%  (p=0.016 n=4+5)
Reader/blockSize_100000/level_10-8               132µs ± 2%     125µs ± 1%     -5.22%  (p=0.016 n=4+5)
Reader/blockSize_300000/level_3-8                514µs ± 1%     480µs ± 1%     -6.62%  (p=0.016 n=4+5)
Reader/blockSize_300000/level_5-8                536µs ± 2%     502µs ± 1%     -6.26%  (p=0.016 n=4+5)
Reader/blockSize_300000/level_10-8               422µs ± 1%     403µs ± 4%       ~     (p=0.063 n=4+5)
StreamCompress/blockSize_1/level_3-8            6.36µs ± 2%    6.32µs ± 2%       ~     (p=0.365 n=4+5)
StreamCompress/blockSize_1/level_5-8            47.7µs ± 0%    48.1µs ± 1%     +0.87%  (p=0.032 n=4+5)
StreamCompress/blockSize_1/level_10-8            378µs ± 1%     380µs ± 1%       ~     (p=0.190 n=4+5)
StreamCompress/blockSize_10/level_3-8           6.19µs ± 0%    6.22µs ± 0%       ~     (p=0.063 n=4+5)
StreamCompress/blockSize_10/level_5-8           47.6µs ± 1%    48.4µs ± 1%     +1.67%  (p=0.016 n=4+5)
StreamCompress/blockSize_10/level_10-8           377µs ± 1%     379µs ± 0%       ~     (p=0.111 n=4+5)
StreamCompress/blockSize_100/level_3-8          5.66µs ± 2%    5.66µs ± 1%       ~     (p=1.000 n=4+5)
StreamCompress/blockSize_100/level_5-8          46.5µs ± 1%    46.3µs ± 1%       ~     (p=0.905 n=4+5)
StreamCompress/blockSize_100/level_10-8          379µs ± 1%     379µs ± 0%       ~     (p=1.000 n=4+5)
StreamCompress/blockSize_1000/level_3-8         5.39µs ± 2%    5.42µs ± 1%       ~     (p=0.413 n=4+5)
StreamCompress/blockSize_1000/level_5-8         36.1µs ± 3%    36.7µs ± 4%       ~     (p=0.556 n=4+5)
StreamCompress/blockSize_1000/level_10-8         389µs ± 1%     406µs ± 3%     +4.28%  (p=0.016 n=4+5)
StreamCompress/blockSize_10000/level_3-8        38.9µs ± 2%    37.3µs ± 1%     -4.09%  (p=0.016 n=4+5)
StreamCompress/blockSize_10000/level_5-8         112µs ± 2%     111µs ± 3%       ~     (p=0.413 n=4+5)
StreamCompress/blockSize_10000/level_10-8        707µs ± 1%     711µs ± 2%       ~     (p=0.413 n=4+5)
StreamCompress/blockSize_100000/level_3-8        535µs ± 1%     512µs ± 2%     -4.34%  (p=0.016 n=4+5)
StreamCompress/blockSize_100000/level_5-8       1.46ms ± 2%    1.41ms ± 2%     -3.44%  (p=0.016 n=4+5)
StreamCompress/blockSize_100000/level_10-8      5.04ms ± 2%    5.02ms ± 1%       ~     (p=0.905 n=4+5)
StreamCompress/blockSize_300000/level_3-8       1.88ms ± 1%    1.82ms ± 1%     -3.13%  (p=0.016 n=4+5)
StreamCompress/blockSize_300000/level_5-8       5.63ms ± 3%    5.54ms ± 4%       ~     (p=0.111 n=4+5)
StreamCompress/blockSize_300000/level_10-8      18.4ms ± 9%    17.8ms ± 1%       ~     (p=0.730 n=4+5)
StreamDecompress/blockSize_1/level_3-8          68.6ns ± 3%    57.4ns ± 1%    -16.40%  (p=0.016 n=4+5)
StreamDecompress/blockSize_1/level_5-8          70.9ns ± 5%    60.6ns ± 9%    -14.47%  (p=0.016 n=4+5)
StreamDecompress/blockSize_1/level_10-8         71.8ns ± 6%    61.3ns ±13%    -14.62%  (p=0.016 n=4+5)
StreamDecompress/blockSize_10/level_3-8         87.1ns ± 6%    81.7ns ±11%       ~     (p=0.286 n=4+5)
StreamDecompress/blockSize_10/level_5-8         88.9ns ± 6%    76.9ns ± 1%    -13.55%  (p=0.029 n=4+4)
StreamDecompress/blockSize_10/level_10-8        88.0ns ± 4%    80.0ns ± 6%       ~     (p=0.063 n=4+5)
StreamDecompress/blockSize_100/level_3-8         500ns ± 1%     489ns ± 0%     -2.22%  (p=0.016 n=4+5)
StreamDecompress/blockSize_100/level_5-8         493ns ± 0%     491ns ± 1%       ~     (p=0.302 n=4+5)
StreamDecompress/blockSize_100/level_10-8        487ns ± 1%     487ns ± 1%       ~     (p=1.000 n=4+5)
StreamDecompress/blockSize_1000/level_3-8       1.86µs ± 1%    1.82µs ± 0%     -2.14%  (p=0.029 n=4+4)
StreamDecompress/blockSize_1000/level_5-8       1.86µs ± 0%    1.84µs ± 1%     -1.17%  (p=0.032 n=4+5)
StreamDecompress/blockSize_1000/level_10-8      1.84µs ± 1%    1.83µs ± 1%       ~     (p=0.286 n=4+5)
StreamDecompress/blockSize_10000/level_3-8      13.6µs ± 2%    13.7µs ± 3%       ~     (p=0.794 n=4+5)
StreamDecompress/blockSize_10000/level_5-8      15.6µs ± 1%    15.1µs ± 1%     -2.70%  (p=0.016 n=4+5)
StreamDecompress/blockSize_10000/level_10-8     13.4µs ± 0%    13.2µs ± 1%     -1.41%  (p=0.016 n=4+5)
StreamDecompress/blockSize_100000/level_3-8      144µs ± 1%     141µs ± 1%     -2.29%  (p=0.016 n=4+5)
StreamDecompress/blockSize_100000/level_5-8      157µs ± 1%     153µs ± 2%       ~     (p=0.063 n=4+5)
StreamDecompress/blockSize_100000/level_10-8     128µs ± 1%     126µs ± 1%     -1.72%  (p=0.016 n=4+5)
StreamDecompress/blockSize_300000/level_3-8      492µs ± 0%     482µs ± 1%     -1.98%  (p=0.016 n=4+5)
StreamDecompress/blockSize_300000/level_5-8      510µs ± 0%     503µs ± 1%     -1.37%  (p=0.016 n=4+5)
StreamDecompress/blockSize_300000/level_10-8     404µs ± 0%     398µs ± 1%     -1.45%  (p=0.016 n=4+5)
WriterDict/blockSize_1/level_3-8                 255ns ± 1%     264ns ± 1%     +3.72%  (p=0.016 n=4+5)
WriterDict/blockSize_1/level_5-8                 297ns ± 1%     309ns ± 2%     +4.03%  (p=0.016 n=4+5)
WriterDict/blockSize_1/level_10-8                302ns ± 1%     315ns ± 1%     +4.27%  (p=0.016 n=4+5)
WriterDict/blockSize_10/level_3-8                180ns ± 1%     191ns ± 0%     +5.99%  (p=0.016 n=4+5)
WriterDict/blockSize_10/level_5-8                326ns ± 1%     335ns ± 1%     +2.66%  (p=0.016 n=4+5)
WriterDict/blockSize_10/level_10-8               312ns ± 2%     317ns ± 1%       ~     (p=0.190 n=4+5)
WriterDict/blockSize_100/level_3-8               502ns ± 2%     511ns ± 1%       ~     (p=0.111 n=4+5)
WriterDict/blockSize_100/level_5-8              1.46µs ± 1%    1.44µs ± 1%     -1.16%  (p=0.032 n=4+5)
WriterDict/blockSize_100/level_10-8             2.59µs ± 1%    2.53µs ± 1%     -2.28%  (p=0.016 n=4+5)
WriterDict/blockSize_1000/level_3-8             4.59µs ± 1%    4.45µs ± 2%     -3.02%  (p=0.016 n=4+5)
WriterDict/blockSize_1000/level_5-8             11.6µs ± 2%    11.0µs ± 1%     -4.56%  (p=0.016 n=4+5)
WriterDict/blockSize_1000/level_10-8            27.3µs ± 1%    26.1µs ± 1%     -4.55%  (p=0.016 n=4+5)
WriterDict/blockSize_10000/level_3-8            39.8µs ± 0%    38.1µs ± 1%     -4.37%  (p=0.016 n=4+5)
WriterDict/blockSize_10000/level_5-8             116µs ± 2%     110µs ± 1%     -4.87%  (p=0.016 n=4+5)
WriterDict/blockSize_10000/level_10-8            244µs ± 2%     232µs ± 2%     -4.72%  (p=0.016 n=4+5)
WriterDict/blockSize_100000/level_3-8            409µs ± 2%     393µs ± 0%     -3.70%  (p=0.016 n=4+5)
WriterDict/blockSize_100000/level_5-8           1.06ms ± 2%    1.02ms ± 1%     -3.63%  (p=0.016 n=4+5)
WriterDict/blockSize_100000/level_10-8          2.26ms ± 3%    2.17ms ± 2%     -3.97%  (p=0.032 n=4+5)
WriterDict/blockSize_300000/level_3-8           1.26ms ± 1%    1.23ms ± 2%     -2.87%  (p=0.016 n=4+5)
WriterDict/blockSize_300000/level_5-8           3.45ms ± 4%    3.26ms ± 4%     -5.62%  (p=0.016 n=4+5)
WriterDict/blockSize_300000/level_10-8          7.34ms ± 8%    6.88ms ± 8%       ~     (p=0.190 n=4+5)
Writer/blockSize_1/level_3-8                    6.34µs ± 1%    6.30µs ± 0%       ~     (p=0.063 n=4+5)
Writer/blockSize_1/level_5-8                    48.8µs ± 2%    48.4µs ± 2%       ~     (p=0.730 n=4+5)
Writer/blockSize_1/level_10-8                    395µs ± 3%     386µs ± 1%       ~     (p=0.111 n=4+5)
Writer/blockSize_10/level_3-8                   6.23µs ± 1%    6.28µs ± 3%       ~     (p=0.730 n=4+5)
Writer/blockSize_10/level_5-8                   48.2µs ± 3%    48.9µs ± 3%       ~     (p=0.730 n=4+5)
Writer/blockSize_10/level_10-8                   386µs ± 1%     387µs ± 1%       ~     (p=0.730 n=4+5)
Writer/blockSize_100/level_3-8                  5.68µs ± 2%    5.63µs ± 2%       ~     (p=0.286 n=4+5)
Writer/blockSize_100/level_5-8                  46.3µs ± 2%    46.5µs ± 0%       ~     (p=0.286 n=4+5)
Writer/blockSize_100/level_10-8                  385µs ± 1%     383µs ± 1%       ~     (p=0.413 n=4+5)
Writer/blockSize_1000/level_3-8                 5.45µs ± 3%    5.36µs ± 2%       ~     (p=0.190 n=4+5)
Writer/blockSize_1000/level_5-8                 35.4µs ± 1%    37.0µs ± 3%     +4.53%  (p=0.016 n=4+5)
Writer/blockSize_1000/level_10-8                 397µs ± 1%     398µs ± 1%       ~     (p=0.730 n=4+5)
Writer/blockSize_10000/level_3-8                39.3µs ± 2%    38.4µs ± 2%       ~     (p=0.063 n=4+5)
Writer/blockSize_10000/level_5-8                 115µs ± 2%     112µs ± 4%       ~     (p=0.190 n=4+5)
Writer/blockSize_10000/level_10-8                718µs ± 0%     716µs ± 2%       ~     (p=0.286 n=4+5)
Writer/blockSize_100000/level_3-8                538µs ± 3%     526µs ± 1%       ~     (p=0.286 n=4+5)
Writer/blockSize_100000/level_5-8               1.46ms ± 2%    1.45ms ± 4%       ~     (p=0.413 n=4+5)
Writer/blockSize_100000/level_10-8              5.03ms ± 2%    5.11ms ± 3%       ~     (p=0.286 n=4+5)
Writer/blockSize_300000/level_3-8               1.92ms ± 2%    1.88ms ± 1%       ~     (p=0.111 n=4+5)
Writer/blockSize_300000/level_5-8               5.55ms ± 1%    5.50ms ± 2%       ~     (p=0.413 n=4+5)
Writer/blockSize_300000/level_10-8              18.2ms ± 2%    18.0ms ± 2%       ~     (p=0.111 n=4+5)
WriterResetAlloc-8                               142ns ± 6%     163ns ± 0%    +15.02%  (p=0.029 n=4+4)
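The throughput table that follows is derived from the time/op figures above: the numbers are consistent with each benchmark reporting its block size as bytes per operation (as `b.SetBytes(blockSize)` would), so MB/s = bytes/op ÷ ns/op × 1000. A small sketch of that conversion, plus the throughput change implied by a time/op delta; the function names are hypothetical:

```go
package main

import "fmt"

// mbPerSec converts bytes processed per op and time per op (nanoseconds)
// into MB/s, using 1 MB = 1e6 bytes.
func mbPerSec(bytesPerOp int64, nsPerOp float64) float64 {
	// bytes/ns * 1e9 ns/s / 1e6 B/MB = bytes/ns * 1e3
	return float64(bytesPerOp) / nsPerOp * 1e3
}

// speedDelta converts a relative change in time/op (e.g. 0.1957 for
// +19.57%) into the implied relative change in throughput.
func speedDelta(timeDelta float64) float64 {
	return 1/(1+timeDelta) - 1
}

func main() {
	// DecompressDict/blockSize_1 at 21.3ns/op.
	fmt.Printf("%.1f MB/s\n", mbPerSec(1, 21.3))
	// DecompressDict/blockSize_100 at 26.7ns/op (about 3.75 GB/s).
	fmt.Printf("%.0f MB/s\n", mbPerSec(100, 26.7))
	// Throughput change implied by a +19.57% time/op delta.
	fmt.Printf("%+.1f%%\n", speedDelta(0.1957)*100)
}
```

The computed -16.4% is close to, but not exactly, the -16.34% that benchstat prints for the same row; presumably benchstat averages the per-run MB/s values rather than inverting the mean time.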
Throughput
name                                          old speed      new speed      delta
DecompressDict/blockSize_1/level_3-8          46.9MB/s ± 2%  39.2MB/s ± 2%    -16.34%  (p=0.016 n=4+5)
DecompressDict/blockSize_1/level_5-8          47.6MB/s ± 2%  39.7MB/s ± 2%    -16.70%  (p=0.016 n=4+5)
DecompressDict/blockSize_1/level_10-8         47.0MB/s ± 5%  39.4MB/s ± 4%    -16.14%  (p=0.016 n=4+5)
DecompressDict/blockSize_10/level_3-8          480MB/s ± 4%   397MB/s ± 1%    -17.34%  (p=0.016 n=4+5)
DecompressDict/blockSize_10/level_5-8          479MB/s ± 3%   387MB/s ± 2%    -19.13%  (p=0.016 n=4+5)
DecompressDict/blockSize_10/level_10-8         375MB/s ± 2%   321MB/s ± 1%    -14.48%  (p=0.016 n=4+5)
DecompressDict/blockSize_100/level_3-8        3.75GB/s ± 0%  3.11GB/s ± 1%    -16.93%  (p=0.016 n=4+5)
DecompressDict/blockSize_100/level_5-8        4.05GB/s ± 0%  3.27GB/s ± 1%    -19.40%  (p=0.016 n=4+5)
DecompressDict/blockSize_100/level_10-8       4.26GB/s ± 0%  3.48GB/s ± 1%    -18.40%  (p=0.016 n=4+5)
DecompressDict/blockSize_1000/level_3-8       6.06GB/s ± 0%  5.72GB/s ± 0%     -5.66%  (p=0.016 n=4+5)
DecompressDict/blockSize_1000/level_5-8       2.41GB/s ± 1%  2.35GB/s ± 1%     -2.27%  (p=0.016 n=4+5)
DecompressDict/blockSize_1000/level_10-8      2.36GB/s ± 4%  2.37GB/s ± 1%       ~     (p=1.000 n=4+5)
DecompressDict/blockSize_10000/level_3-8      6.61GB/s ± 2%  6.62GB/s ± 1%       ~     (p=0.905 n=4+5)
DecompressDict/blockSize_10000/level_5-8      5.83GB/s ± 0%  5.80GB/s ± 1%     -0.63%  (p=0.032 n=4+5)
DecompressDict/blockSize_10000/level_10-8     6.21GB/s ± 0%  6.21GB/s ± 1%       ~     (p=0.905 n=4+5)
DecompressDict/blockSize_100000/level_3-8     7.27GB/s ± 1%  7.31GB/s ± 1%       ~     (p=0.286 n=4+5)
DecompressDict/blockSize_100000/level_5-8     7.39GB/s ± 1%  7.36GB/s ± 1%       ~     (p=1.000 n=4+5)
DecompressDict/blockSize_100000/level_10-8    8.01GB/s ± 1%  8.00GB/s ± 2%       ~     (p=0.730 n=4+5)
DecompressDict/blockSize_300000/level_3-8     7.94GB/s ± 1%  7.98GB/s ± 1%       ~     (p=0.413 n=4+5)
DecompressDict/blockSize_300000/level_5-8     6.79GB/s ± 1%  6.86GB/s ± 0%       ~     (p=0.111 n=4+5)
DecompressDict/blockSize_300000/level_10-8    8.24GB/s ± 1%  8.31GB/s ± 1%       ~     (p=0.190 n=4+5)
CompressDict/blockSize_1/level_3-8            25.7MB/s ± 0%  21.7MB/s ± 1%    -15.44%  (p=0.016 n=4+5)
CompressDict/blockSize_1/level_5-8            25.7MB/s ± 0%  21.8MB/s ± 3%    -15.21%  (p=0.016 n=4+5)
CompressDict/blockSize_1/level_10-8           25.3MB/s ± 1%  21.7MB/s ± 0%    -14.39%  (p=0.016 n=4+5)
CompressDict/blockSize_10/level_3-8           63.0MB/s ± 1%  59.8MB/s ± 2%     -5.09%  (p=0.016 n=4+5)
CompressDict/blockSize_10/level_5-8           59.0MB/s ± 1%  56.3MB/s ± 2%     -4.62%  (p=0.016 n=4+5)
CompressDict/blockSize_10/level_10-8          56.3MB/s ± 1%  54.0MB/s ± 1%     -4.11%  (p=0.016 n=4+5)
CompressDict/blockSize_100/level_3-8          1.24GB/s ± 0%  1.12GB/s ± 1%     -9.02%  (p=0.016 n=4+5)
CompressDict/blockSize_100/level_5-8           503MB/s ± 1%   486MB/s ± 1%     -3.38%  (p=0.016 n=4+5)
CompressDict/blockSize_100/level_10-8          538MB/s ± 1%   521MB/s ± 1%     -3.29%  (p=0.016 n=4+5)
CompressDict/blockSize_1000/level_3-8         2.50GB/s ± 1%  2.44GB/s ± 1%     -2.47%  (p=0.016 n=4+5)
CompressDict/blockSize_1000/level_5-8          791MB/s ± 1%   784MB/s ± 0%     -0.90%  (p=0.016 n=4+5)
CompressDict/blockSize_1000/level_10-8         429MB/s ± 2%   426MB/s ± 2%       ~     (p=0.905 n=4+5)
CompressDict/blockSize_10000/level_3-8        2.45GB/s ± 2%  2.44GB/s ± 1%       ~     (p=0.905 n=4+5)
CompressDict/blockSize_10000/level_5-8         989MB/s ± 0%   998MB/s ± 1%     +0.89%  (p=0.016 n=4+5)
CompressDict/blockSize_10000/level_10-8        410MB/s ± 1%   412MB/s ± 0%       ~     (p=0.190 n=4+5)
CompressDict/blockSize_100000/level_3-8       2.72GB/s ± 1%  2.73GB/s ± 1%       ~     (p=0.413 n=4+5)
CompressDict/blockSize_100000/level_5-8       1.06GB/s ± 1%  1.07GB/s ± 1%     +1.13%  (p=0.032 n=4+5)
CompressDict/blockSize_100000/level_10-8       628MB/s ± 1%   629MB/s ± 1%       ~     (p=0.730 n=4+5)
CompressDict/blockSize_300000/level_3-8       2.01GB/s ± 1%  2.01GB/s ± 1%       ~     (p=0.730 n=4+5)
CompressDict/blockSize_300000/level_5-8        867MB/s ± 1%   871MB/s ± 1%       ~     (p=0.413 n=4+5)
CompressDict/blockSize_300000/level_10-8       249MB/s ± 1%   253MB/s ± 1%     +1.47%  (p=0.016 n=4+5)
Compress/blockSize_1/level_3-8                40.0MB/s ± 0%  33.5MB/s ± 0%    -16.04%  (p=0.016 n=4+5)
Compress/blockSize_1/level_5-8                39.9MB/s ± 0%  33.5MB/s ± 0%    -15.92%  (p=0.016 n=4+5)
Compress/blockSize_1/level_10-8               39.3MB/s ± 0%  33.0MB/s ± 1%    -16.13%  (p=0.016 n=4+5)
Compress/blockSize_10/level_3-8                238MB/s ± 0%   208MB/s ± 0%    -12.53%  (p=0.016 n=4+5)
Compress/blockSize_10/level_5-8                232MB/s ± 1%   205MB/s ± 1%    -11.72%  (p=0.016 n=4+5)
Compress/blockSize_10/level_10-8               224MB/s ± 2%   199MB/s ± 1%    -10.91%  (p=0.016 n=4+5)
Compress/blockSize_100/level_3-8               582MB/s ± 1%   561MB/s ± 1%     -3.56%  (p=0.016 n=4+5)
Compress/blockSize_100/level_5-8               424MB/s ± 0%   413MB/s ± 1%     -2.61%  (p=0.016 n=4+5)
Compress/blockSize_100/level_10-8              380MB/s ± 1%   370MB/s ± 1%     -2.74%  (p=0.016 n=4+5)
Compress/blockSize_1000/level_3-8             1.22GB/s ± 1%  1.22GB/s ± 1%       ~     (p=0.730 n=4+5)
Compress/blockSize_1000/level_5-8              820MB/s ± 1%   833MB/s ± 1%     +1.68%  (p=0.016 n=4+5)
Compress/blockSize_1000/level_10-8             581MB/s ± 1%   576MB/s ± 1%       ~     (p=0.063 n=4+5)
Compress/blockSize_10000/level_3-8            2.28GB/s ± 1%  2.27GB/s ± 1%       ~     (p=0.413 n=4+5)
Compress/blockSize_10000/level_5-8            1.05GB/s ± 1%  1.10GB/s ± 1%     +4.91%  (p=0.016 n=4+5)
Compress/blockSize_10000/level_10-8            253MB/s ± 3%   255MB/s ± 1%       ~     (p=0.730 n=4+5)
Compress/blockSize_100000/level_3-8           2.81GB/s ± 1%  2.84GB/s ± 3%       ~     (p=0.556 n=4+5)
Compress/blockSize_100000/level_5-8           1.12GB/s ± 1%  1.16GB/s ± 1%     +3.83%  (p=0.016 n=4+5)
Compress/blockSize_100000/level_10-8           217MB/s ± 0%   218MB/s ± 1%       ~     (p=0.190 n=4+5)
Compress/blockSize_300000/level_3-8           2.43GB/s ± 1%  2.46GB/s ± 1%     +1.11%  (p=0.032 n=4+5)
Compress/blockSize_300000/level_5-8            940MB/s ± 1%   932MB/s ± 3%       ~     (p=0.556 n=4+5)
Compress/blockSize_300000/level_10-8           278MB/s ± 1%   276MB/s ± 3%       ~     (p=0.730 n=4+5)
Decompress/blockSize_1/level_3-8              48.5MB/s ± 3%  42.1MB/s ± 3%    -13.20%  (p=0.016 n=4+5)
Decompress/blockSize_1/level_5-8              48.3MB/s ± 2%  42.3MB/s ± 4%    -12.48%  (p=0.016 n=4+5)
Decompress/blockSize_1/level_10-8             48.8MB/s ± 1%  43.1MB/s ± 4%    -11.52%  (p=0.016 n=4+5)
Decompress/blockSize_10/level_3-8              473MB/s ± 3%   428MB/s ± 3%     -9.56%  (p=0.016 n=4+5)
Decompress/blockSize_10/level_5-8              471MB/s ± 2%   421MB/s ± 4%    -10.64%  (p=0.016 n=4+5)
Decompress/blockSize_10/level_10-8             484MB/s ± 3%   424MB/s ± 2%    -12.44%  (p=0.016 n=4+5)
Decompress/blockSize_100/level_3-8            2.84GB/s ± 0%  2.50GB/s ± 0%    -11.80%  (p=0.016 n=4+5)
Decompress/blockSize_100/level_5-8            2.81GB/s ± 0%  2.50GB/s ± 1%    -11.14%  (p=0.016 n=4+5)
Decompress/blockSize_100/level_10-8           2.79GB/s ± 1%  2.49GB/s ± 1%    -10.93%  (p=0.016 n=4+5)
Decompress/blockSize_1000/level_3-8           2.23GB/s ± 1%  2.20GB/s ± 0%     -1.25%  (p=0.016 n=4+5)
Decompress/blockSize_1000/level_5-8           2.23GB/s ± 0%  2.19GB/s ± 1%     -1.98%  (p=0.016 n=4+5)
Decompress/blockSize_1000/level_10-8          2.25GB/s ± 0%  2.13GB/s ± 3%     -5.54%  (p=0.016 n=4+5)
Decompress/blockSize_10000/level_3-8          5.73GB/s ± 2%  5.74GB/s ± 1%       ~     (p=1.000 n=4+5)
Decompress/blockSize_10000/level_5-8          5.75GB/s ± 1%  5.74GB/s ± 0%       ~     (p=1.000 n=4+5)
Decompress/blockSize_10000/level_10-8         5.84GB/s ± 1%  5.79GB/s ± 1%       ~     (p=0.063 n=4+5)
Decompress/blockSize_100000/level_3-8         7.80GB/s ± 1%  7.98GB/s ± 0%     +2.32%  (p=0.029 n=4+4)
Decompress/blockSize_100000/level_5-8         6.81GB/s ± 1%  7.00GB/s ± 1%     +2.85%  (p=0.016 n=4+5)
Decompress/blockSize_100000/level_10-8        7.85GB/s ± 0%  8.01GB/s ± 0%     +2.10%  (p=0.016 n=4+5)
Decompress/blockSize_300000/level_3-8         7.69GB/s ± 1%  7.94GB/s ± 2%     +3.25%  (p=0.016 n=4+5)
Decompress/blockSize_300000/level_5-8         6.63GB/s ± 1%  6.86GB/s ± 0%     +3.33%  (p=0.016 n=4+5)
Decompress/blockSize_300000/level_10-8        8.11GB/s ± 1%  8.19GB/s ± 2%       ~     (p=0.111 n=4+5)
ReaderDict/blockSize_1/level_3-8               155MB/s ± 1%   172MB/s ± 2%    +10.67%  (p=0.016 n=4+5)
ReaderDict/blockSize_1/level_5-8               150MB/s ± 2%   174MB/s ± 1%    +15.66%  (p=0.016 n=4+5)
ReaderDict/blockSize_1/level_10-8              142MB/s ± 0%   156MB/s ± 1%     +9.99%  (p=0.016 n=4+5)
ReaderDict/blockSize_10/level_3-8             1.36GB/s ± 1%  1.46GB/s ± 2%     +7.14%  (p=0.016 n=4+5)
ReaderDict/blockSize_10/level_5-8             1.39GB/s ± 0%  1.51GB/s ± 1%     +8.16%  (p=0.016 n=4+5)
ReaderDict/blockSize_10/level_10-8            1.40GB/s ± 3%  1.54GB/s ± 1%    +10.13%  (p=0.016 n=4+5)
ReaderDict/blockSize_100/level_3-8            4.58GB/s ± 1%  4.73GB/s ± 1%     +3.26%  (p=0.016 n=4+5)
ReaderDict/blockSize_100/level_5-8            2.16GB/s ± 1%  2.16GB/s ± 1%       ~     (p=0.556 n=4+5)
ReaderDict/blockSize_100/level_10-8           2.18GB/s ± 1%  2.19GB/s ± 1%       ~     (p=0.190 n=4+5)
ReaderDict/blockSize_1000/level_3-8           6.18GB/s ± 0%  6.31GB/s ± 1%     +2.23%  (p=0.016 n=4+5)
ReaderDict/blockSize_1000/level_5-8           5.42GB/s ± 1%  5.61GB/s ± 1%     +3.40%  (p=0.016 n=4+5)
ReaderDict/blockSize_1000/level_10-8          5.79GB/s ± 1%  5.85GB/s ± 7%       ~     (p=0.190 n=4+5)
ReaderDict/blockSize_10000/level_3-8          6.73GB/s ± 1%  6.91GB/s ± 2%     +2.71%  (p=0.016 n=4+5)
ReaderDict/blockSize_10000/level_5-8          6.77GB/s ± 1%  6.93GB/s ± 2%     +2.42%  (p=0.016 n=4+5)
ReaderDict/blockSize_10000/level_10-8         7.29GB/s ± 0%  7.56GB/s ± 2%     +3.76%  (p=0.016 n=4+5)
ReaderDict/blockSize_100000/level_3-8         6.73GB/s ± 2%  7.23GB/s ± 1%     +7.45%  (p=0.016 n=4+5)
ReaderDict/blockSize_100000/level_5-8         6.30GB/s ± 2%  6.69GB/s ± 1%     +6.08%  (p=0.016 n=4+5)
ReaderDict/blockSize_100000/level_10-8        7.66GB/s ± 1%  8.13GB/s ± 1%     +6.14%  (p=0.016 n=4+5)
ReaderDict/blockSize_300000/level_3-8         5.87GB/s ± 1%  6.26GB/s ± 2%     +6.72%  (p=0.016 n=4+5)
ReaderDict/blockSize_300000/level_5-8         5.65GB/s ± 1%  6.07GB/s ± 1%     +7.45%  (p=0.016 n=4+5)
ReaderDict/blockSize_300000/level_10-8        7.22GB/s ± 1%  7.57GB/s ± 1%     +4.79%  (p=0.016 n=4+5)
Reader/blockSize_1/level_3-8                   157MB/s ± 1%   177MB/s ± 2%    +12.33%  (p=0.016 n=4+5)
Reader/blockSize_1/level_5-8                   156MB/s ± 1%   175MB/s ± 1%    +12.39%  (p=0.016 n=4+5)
Reader/blockSize_1/level_10-8                  151MB/s ± 4%   174MB/s ± 1%    +15.04%  (p=0.016 n=4+5)
Reader/blockSize_10/level_3-8                 1.20GB/s ± 9%  1.33GB/s ± 1%    +10.71%  (p=0.016 n=4+5)
Reader/blockSize_10/level_5-8                 1.23GB/s ± 1%  1.32GB/s ± 3%     +7.33%  (p=0.016 n=4+5)
Reader/blockSize_10/level_10-8                1.20GB/s ± 4%  1.31GB/s ± 1%     +9.30%  (p=0.016 n=4+5)
Reader/blockSize_100/level_3-8                2.02GB/s ± 1%  2.03GB/s ± 1%       ~     (p=0.190 n=4+5)
Reader/blockSize_100/level_5-8                2.02GB/s ± 1%  2.04GB/s ± 1%       ~     (p=0.111 n=4+5)
Reader/blockSize_100/level_10-8               2.04GB/s ± 1%  2.05GB/s ± 0%     +0.71%  (p=0.016 n=4+5)
Reader/blockSize_1000/level_3-8               5.36GB/s ± 0%  5.49GB/s ± 1%     +2.54%  (p=0.016 n=4+5)
Reader/blockSize_1000/level_5-8               5.35GB/s ± 0%  5.47GB/s ± 0%     +2.25%  (p=0.016 n=4+5)
Reader/blockSize_1000/level_10-8              5.41GB/s ± 0%  5.56GB/s ± 1%     +2.70%  (p=0.016 n=4+5)
Reader/blockSize_10000/level_3-8              7.22GB/s ± 1%  7.35GB/s ± 2%       ~     (p=0.190 n=4+5)
Reader/blockSize_10000/level_5-8              6.30GB/s ± 2%  6.44GB/s ± 2%       ~     (p=0.190 n=4+5)
Reader/blockSize_10000/level_10-8             7.27GB/s ± 1%  7.40GB/s ± 2%       ~     (p=0.111 n=4+5)
Reader/blockSize_100000/level_3-8             6.73GB/s ± 1%  7.10GB/s ± 3%     +5.54%  (p=0.016 n=4+5)
Reader/blockSize_100000/level_5-8             6.18GB/s ± 1%  6.55GB/s ± 2%     +6.05%  (p=0.016 n=4+5)
Reader/blockSize_100000/level_10-8            7.56GB/s ± 2%  7.98GB/s ± 1%     +5.49%  (p=0.016 n=4+5)
Reader/blockSize_300000/level_3-8             5.84GB/s ± 1%  6.25GB/s ± 1%     +7.08%  (p=0.016 n=4+5)
Reader/blockSize_300000/level_5-8             5.60GB/s ± 2%  5.97GB/s ± 1%     +6.67%  (p=0.016 n=4+5)
Reader/blockSize_300000/level_10-8            7.12GB/s ± 1%  7.45GB/s ± 4%       ~     (p=0.063 n=4+5)
StreamCompress/blockSize_1/level_3-8          1.57MB/s ± 1%  1.59MB/s ± 0%       ~     (p=0.429 n=4+4)
StreamCompress/blockSize_1/level_5-8           210kB/s ± 0%   210kB/s ± 0%       ~     (all equal)
StreamCompress/blockSize_1/level_10-8         30.0kB/s ± 0%  30.0kB/s ± 0%       ~     (all equal)
StreamCompress/blockSize_10/level_3-8         16.2MB/s ± 0%  16.1MB/s ± 0%     -0.58%  (p=0.048 n=4+5)
StreamCompress/blockSize_10/level_5-8         2.10MB/s ± 0%  2.07MB/s ± 1%     -1.62%  (p=0.016 n=4+5)
StreamCompress/blockSize_10/level_10-8         268kB/s ± 3%   260kB/s ± 0%       ~     (p=0.238 n=4+5)
StreamCompress/blockSize_100/level_3-8         177MB/s ± 2%   177MB/s ± 1%       ~     (p=1.000 n=4+5)
StreamCompress/blockSize_100/level_5-8        21.5MB/s ± 2%  21.6MB/s ± 1%       ~     (p=0.778 n=4+5)
StreamCompress/blockSize_100/level_10-8       2.64MB/s ± 1%  2.64MB/s ± 1%       ~     (p=1.000 n=4+5)
StreamCompress/blockSize_1000/level_3-8       1.85GB/s ± 2%  1.85GB/s ± 1%       ~     (p=0.413 n=4+5)
StreamCompress/blockSize_1000/level_5-8        277MB/s ± 3%   273MB/s ± 4%       ~     (p=0.556 n=4+5)
StreamCompress/blockSize_1000/level_10-8      25.7MB/s ± 1%  24.7MB/s ± 3%     -4.07%  (p=0.016 n=4+5)
StreamCompress/blockSize_10000/level_3-8      2.57GB/s ± 2%  2.68GB/s ± 1%     +4.25%  (p=0.016 n=4+5)
StreamCompress/blockSize_10000/level_5-8       892MB/s ± 2%   902MB/s ± 3%       ~     (p=0.413 n=4+5)
StreamCompress/blockSize_10000/level_10-8      141MB/s ± 1%   141MB/s ± 2%       ~     (p=0.413 n=4+5)
StreamCompress/blockSize_100000/level_3-8     1.87GB/s ± 1%  1.95GB/s ± 2%     +4.53%  (p=0.016 n=4+5)
StreamCompress/blockSize_100000/level_5-8      685MB/s ± 2%   710MB/s ± 2%     +3.56%  (p=0.016 n=4+5)
StreamCompress/blockSize_100000/level_10-8     199MB/s ± 2%   199MB/s ± 1%       ~     (p=0.905 n=4+5)
StreamCompress/blockSize_300000/level_3-8     1.59GB/s ± 1%  1.65GB/s ± 1%     +3.24%  (p=0.016 n=4+5)
StreamCompress/blockSize_300000/level_5-8      533MB/s ± 3%   542MB/s ± 3%       ~     (p=0.111 n=4+5)
StreamCompress/blockSize_300000/level_10-8     164MB/s ± 9%   169MB/s ± 1%       ~     (p=0.730 n=4+5)
StreamDecompress/blockSize_1/level_3-8         146MB/s ± 3%   174MB/s ± 1%    +19.58%  (p=0.016 n=4+5)
StreamDecompress/blockSize_1/level_5-8         141MB/s ± 5%   165MB/s ± 8%    +17.09%  (p=0.016 n=4+5)
StreamDecompress/blockSize_1/level_10-8        140MB/s ± 6%   164MB/s ±12%    +17.43%  (p=0.016 n=4+5)
StreamDecompress/blockSize_10/level_3-8       1.15GB/s ± 5%  1.23GB/s ±10%       ~     (p=0.286 n=4+5)
StreamDecompress/blockSize_10/level_5-8       1.13GB/s ± 6%  1.30GB/s ± 1%    +15.44%  (p=0.029 n=4+4)
StreamDecompress/blockSize_10/level_10-8      1.14GB/s ± 4%  1.25GB/s ± 6%       ~     (p=0.063 n=4+5)
StreamDecompress/blockSize_100/level_3-8      2.00GB/s ± 1%  2.04GB/s ± 0%     +2.26%  (p=0.016 n=4+5)
StreamDecompress/blockSize_100/level_5-8      2.03GB/s ± 0%  2.04GB/s ± 1%       ~     (p=0.413 n=4+5)
StreamDecompress/blockSize_100/level_10-8     2.05GB/s ± 1%  2.05GB/s ± 1%       ~     (p=1.000 n=4+5)
StreamDecompress/blockSize_1000/level_3-8     5.38GB/s ± 1%  5.50GB/s ± 0%     +2.19%  (p=0.029 n=4+4)
StreamDecompress/blockSize_1000/level_5-8     5.38GB/s ± 0%  5.44GB/s ± 1%     +1.20%  (p=0.032 n=4+5)
StreamDecompress/blockSize_1000/level_10-8    5.44GB/s ± 1%  5.47GB/s ± 1%       ~     (p=0.286 n=4+5)
StreamDecompress/blockSize_10000/level_3-8    7.34GB/s ± 2%  7.32GB/s ± 3%       ~     (p=0.730 n=4+5)
StreamDecompress/blockSize_10000/level_5-8    6.43GB/s ± 1%  6.61GB/s ± 1%     +2.78%  (p=0.016 n=4+5)
StreamDecompress/blockSize_10000/level_10-8   7.48GB/s ± 0%  7.58GB/s ± 1%     +1.44%  (p=0.016 n=4+5)
StreamDecompress/blockSize_100000/level_3-8   6.95GB/s ± 1%  7.11GB/s ± 1%     +2.35%  (p=0.016 n=4+5)
StreamDecompress/blockSize_100000/level_5-8   6.38GB/s ± 1%  6.52GB/s ± 2%       ~     (p=0.063 n=4+5)
StreamDecompress/blockSize_100000/level_10-8  7.80GB/s ± 1%  7.94GB/s ± 1%     +1.75%  (p=0.016 n=4+5)
StreamDecompress/blockSize_300000/level_3-8   6.09GB/s ± 0%  6.22GB/s ± 1%     +2.02%  (p=0.016 n=4+5)
StreamDecompress/blockSize_300000/level_5-8   5.88GB/s ± 0%  5.96GB/s ± 1%     +1.39%  (p=0.016 n=4+5)
StreamDecompress/blockSize_300000/level_10-8  7.42GB/s ± 0%  7.53GB/s ± 1%     +1.47%  (p=0.016 n=4+5)
WriterDict/blockSize_1/level_3-8              39.2MB/s ± 1%  37.8MB/s ± 1%     -3.60%  (p=0.016 n=4+5)
WriterDict/blockSize_1/level_5-8              33.7MB/s ± 1%  32.4MB/s ± 2%     -3.87%  (p=0.016 n=4+5)
WriterDict/blockSize_1/level_10-8             33.1MB/s ± 1%  31.7MB/s ± 1%     -4.09%  (p=0.016 n=4+5)
WriterDict/blockSize_10/level_3-8              555MB/s ± 1%   524MB/s ± 0%     -5.64%  (p=0.016 n=4+5)
WriterDict/blockSize_10/level_5-8              307MB/s ± 1%   299MB/s ± 1%     -2.59%  (p=0.016 n=4+5)
WriterDict/blockSize_10/level_10-8             320MB/s ± 2%   315MB/s ± 1%       ~     (p=0.190 n=4+5)
WriterDict/blockSize_100/level_3-8            1.99GB/s ± 2%  1.96GB/s ± 1%       ~     (p=0.111 n=4+5)
WriterDict/blockSize_100/level_5-8             685MB/s ± 1%   693MB/s ± 1%     +1.19%  (p=0.032 n=4+5)
WriterDict/blockSize_100/level_10-8            386MB/s ± 1%   395MB/s ± 1%     +2.34%  (p=0.016 n=4+5)
WriterDict/blockSize_1000/level_3-8           2.18GB/s ± 1%  2.25GB/s ± 2%     +3.12%  (p=0.016 n=4+5)
WriterDict/blockSize_1000/level_5-8            864MB/s ± 2%   905MB/s ± 1%     +4.76%  (p=0.016 n=4+5)
WriterDict/blockSize_1000/level_10-8           366MB/s ± 1%   383MB/s ± 1%     +4.77%  (p=0.016 n=4+5)
WriterDict/blockSize_10000/level_3-8          2.51GB/s ± 0%  2.63GB/s ± 1%     +4.58%  (p=0.016 n=4+5)
WriterDict/blockSize_10000/level_5-8           864MB/s ± 2%   908MB/s ± 1%     +5.12%  (p=0.016 n=4+5)
WriterDict/blockSize_10000/level_10-8          410MB/s ± 2%   430MB/s ± 2%     +4.95%  (p=0.016 n=4+5)
WriterDict/blockSize_100000/level_3-8         2.45GB/s ± 2%  2.54GB/s ± 0%     +3.84%  (p=0.016 n=4+5)
WriterDict/blockSize_100000/level_5-8          946MB/s ± 2%   982MB/s ± 1%     +3.76%  (p=0.016 n=4+5)
WriterDict/blockSize_100000/level_10-8         443MB/s ± 3%   462MB/s ± 2%     +4.10%  (p=0.032 n=4+5)
WriterDict/blockSize_300000/level_3-8         2.38GB/s ± 1%  2.45GB/s ± 2%     +2.96%  (p=0.016 n=4+5)
WriterDict/blockSize_300000/level_5-8          870MB/s ± 3%   922MB/s ± 3%     +5.95%  (p=0.016 n=4+5)
WriterDict/blockSize_300000/level_10-8         410MB/s ± 7%   437MB/s ± 8%       ~     (p=0.190 n=4+5)
Writer/blockSize_1/level_3-8                  1.58MB/s ± 1%  1.59MB/s ± 0%       ~     (p=0.143 n=4+4)
Writer/blockSize_1/level_5-8                   202kB/s ± 4%   210kB/s ± 0%       ~     (p=0.143 n=4+4)
Writer/blockSize_1/level_10-8                 27.5kB/s ±27%  30.0kB/s ± 0%       ~     (p=0.889 n=4+5)
Writer/blockSize_10/level_3-8                 16.0MB/s ± 1%  15.9MB/s ± 3%       ~     (p=0.730 n=4+5)
Writer/blockSize_10/level_5-8                 2.08MB/s ± 3%  2.04MB/s ± 3%       ~     (p=0.730 n=4+5)
Writer/blockSize_10/level_10-8                 260kB/s ± 0%   260kB/s ± 0%       ~     (all equal)
Writer/blockSize_100/level_3-8                 176MB/s ± 2%   178MB/s ± 2%       ~     (p=0.286 n=4+5)
Writer/blockSize_100/level_5-8                21.6MB/s ± 2%  21.5MB/s ± 0%       ~     (p=0.254 n=4+5)
Writer/blockSize_100/level_10-8               2.60MB/s ± 1%  2.61MB/s ± 1%       ~     (p=0.603 n=4+5)
Writer/blockSize_1000/level_3-8               1.83GB/s ± 3%  1.86GB/s ± 2%       ~     (p=0.190 n=4+5)
Writer/blockSize_1000/level_5-8                283MB/s ± 1%   270MB/s ± 3%     -4.31%  (p=0.016 n=4+5)
Writer/blockSize_1000/level_10-8              25.2MB/s ± 1%  25.1MB/s ± 1%       ~     (p=0.730 n=4+5)
Writer/blockSize_10000/level_3-8              2.55GB/s ± 2%  2.61GB/s ± 2%       ~     (p=0.063 n=4+5)
Writer/blockSize_10000/level_5-8               872MB/s ± 2%   890MB/s ± 4%       ~     (p=0.190 n=4+5)
Writer/blockSize_10000/level_10-8              139MB/s ± 0%   140MB/s ± 2%       ~     (p=0.286 n=4+5)
Writer/blockSize_100000/level_3-8             1.86GB/s ± 3%  1.90GB/s ± 1%       ~     (p=0.286 n=4+5)
Writer/blockSize_100000/level_5-8              683MB/s ± 2%   689MB/s ± 4%       ~     (p=0.413 n=4+5)
Writer/blockSize_100000/level_10-8             199MB/s ± 2%   196MB/s ± 3%       ~     (p=0.286 n=4+5)
Writer/blockSize_300000/level_3-8             1.57GB/s ± 2%  1.59GB/s ± 1%       ~     (p=0.111 n=4+5)
Writer/blockSize_300000/level_5-8              541MB/s ± 1%   546MB/s ± 2%       ~     (p=0.413 n=4+5)
Writer/blockSize_300000/level_10-8             165MB/s ± 2%   166MB/s ± 2%       ~     (p=0.111 n=4+5)

You'll notice that many results (particularly those using tiny buffers) are up to 20% slower. It turns out this is because the cgo pointer checks now take a significant amount of time. Then again, we're talking about a few nanoseconds, which is negligible with larger buffers, so IMO this isn't that bad.
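To put "a few nanoseconds" in perspective: a fixed per-call cost only shows up as a large percentage when the call itself is short. This sketch assumes a roughly constant ~4 ns of extra work per call, independent of buffer size (about the gap in the tiny-block DecompressDict rows), and computes the relative slowdown for a few illustrative baseline call times. `relativeSlowdown` is a hypothetical helper, not part of gozstd.

```go
package main

import "fmt"

// relativeSlowdown returns the fractional slowdown caused by adding a
// fixed per-call cost (extraNs) to an operation that originally took baseNs.
func relativeSlowdown(baseNs, extraNs float64) float64 {
	return extraNs / baseNs
}

func main() {
	// Assumption: the cgo pointer checks add a constant ~4 ns per call.
	// The baseline call times below are illustrative, not measured.
	const extraNs = 4.0
	for _, baseNs := range []float64{20, 400, 40000} {
		fmt.Printf("base %8.0f ns -> %5.2f%% slower\n", baseNs, 100*relativeSlowdown(baseNs, extraNs))
	}
}
```

With these numbers a 20 ns call becomes 20% slower, while a 400 ns call is only 1% slower, which is the pattern the tables show as block size grows.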
I've also made the Reader write directly into the provided buffer (if it's large enough), and those benchmarks show the biggest gain: about 5% faster when using large buffers. The ability to use the Go slice directly could also be used in the Writer, but let's leave that for another PR.

I've also re-run the benchmarks with GODEBUG=cgocheck=0, and the results definitely look even better:

I had to use a gist because GitHub wouldn't accept a PR description this long: https://gist.github.com/mhr3/84f58f62353ef3b9db30288df00fa2b3


kamstrup left a comment


Looked through this and it looks good to me 👍 (although I'm no expert on gozstd)

mhr3 force-pushed the use-go-slices branch from 55095ce to 4edc66a on May 28, 2023 22:20

kamstrup left a comment


Looks great. Big simplification using Go buffers and pooled buffers 👍 💯
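For readers unfamiliar with the pooled-buffer pattern the review mentions, a common way to do it in Go is `sync.Pool`. This sketch assumes a pool of `*[]byte` with a 64 KiB starting capacity; the names, the capacity and the `withScratch` helper are illustrative, not taken from the PR.

```go
package main

import (
	"fmt"
	"sync"
)

// bufPool recycles byte slices between calls so hot paths do not allocate
// a fresh buffer every time. Storing *[]byte rather than []byte avoids an
// extra allocation when the value is boxed into the pool's interface.
var bufPool = sync.Pool{
	New: func() any {
		b := make([]byte, 0, 64*1024) // assumed starting capacity
		return &b
	},
}

func getBuf() *[]byte { return bufPool.Get().(*[]byte) }

func putBuf(b *[]byte) {
	*b = (*b)[:0] // reset the length, keep the capacity for reuse
	bufPool.Put(b)
}

// withScratch is a hypothetical helper showing the get/use/put pattern.
func withScratch(src []byte) int {
	bp := getBuf()
	defer putBuf(bp)
	*bp = append(*bp, src...)
	return len(*bp)
}

func main() {
	fmt.Println(withScratch([]byte("abc")))
}
```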


func TestDecompressTooLarge(t *testing.T) {
	src := []byte{40, 181, 47, 253, 228, 122, 118, 105, 67, 140, 234, 85, 20, 159, 67}
	_, err := Decompress(nil, src)

From the test name, I gather the error here is that the decompressed size is too large for the dst buf (nil)? It would be a bit easier to read if the dst buf were non-nil, maybe 1 byte or something.
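One way to check this guess is to decode the frame header of `src` by hand. Per the zstd frame format (RFC 8878), the first four bytes are the magic number 0xFD2FB528, and the descriptor byte 228 (0xE4) announces a single-segment frame with an 8-byte Frame_Content_Size field. The sketch parses that field; `frameContentSize` is a hypothetical helper, not a gozstd function, and it rejects skippable frames and ignores the reserved bit.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const zstdMagic = 0xFD2FB528 // RFC 8878 frame magic, stored little-endian

// frameContentSize parses the header of a single zstd frame and returns the
// declared decompressed size; ok is false when the header omits it.
func frameContentSize(src []byte) (size uint64, ok bool, err error) {
	if len(src) < 5 {
		return 0, false, errors.New("frame header too short")
	}
	if binary.LittleEndian.Uint32(src) != zstdMagic {
		return 0, false, errors.New("not a zstd frame")
	}
	fhd := src[4] // Frame_Header_Descriptor
	fcsFlag := fhd >> 6
	singleSegment := fhd&0x20 != 0
	didFlag := fhd & 0x03

	pos := 5
	if !singleSegment {
		pos++ // Window_Descriptor is present
	}
	pos += [4]int{0, 1, 2, 4}[didFlag] // Dictionary_ID field size

	fcsSize := [4]int{0, 2, 4, 8}[fcsFlag]
	if fcsFlag == 0 && singleSegment {
		fcsSize = 1
	}
	if fcsSize == 0 {
		return 0, false, nil // content size not recorded in the header
	}
	if len(src) < pos+fcsSize {
		return 0, false, errors.New("frame header truncated")
	}
	b := src[pos : pos+fcsSize]
	switch fcsSize {
	case 1:
		size = uint64(b[0])
	case 2:
		size = uint64(binary.LittleEndian.Uint16(b)) + 256 // 2-byte form is offset by 256
	case 4:
		size = uint64(binary.LittleEndian.Uint32(b))
	case 8:
		size = binary.LittleEndian.Uint64(b)
	}
	return size, true, nil
}

func main() {
	src := []byte{40, 181, 47, 253, 228, 122, 118, 105, 67, 140, 234, 85, 20, 159, 67}
	size, ok, err := frameContentSize(src)
	fmt.Printf("declared size: %d bytes (ok=%v, err=%v)\n", size, ok, err)
}
```

For this input the declared size is 0x1455EA8C4369767A bytes (more than 10^18), which suggests the test exercises an absurd declared content size rather than anything specific to `dst` being nil. That is an inference from the header bytes, not something the PR states.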

reader.go (outdated)
	zr.sizes.dstPos = 0

	inHdr := (*reflect.SliceHeader)(unsafe.Pointer(&zr.inBuf))
	outHdr := (*reflect.SliceHeader)(unsafe.Pointer(&dst))

You use the noescape() trick in compressInternal(), but not for dst here... I guess there's no chance stack-allocated buffers could be used in this context?

GrigoryEvko added a commit to GrigoryEvko/gozstd that referenced this pull request Aug 2, 2025
- Replace uintptr_t with void* in C wrapper functions
- Use reflect.SliceHeader to access Go slice data pointers directly
- Add zstdIsError helper function for cleaner error checking
- Remove unnecessary stdint.h include

These changes improve performance by 5-7% for large buffer operations
by avoiding pointer indirection and simplifying the CGO interface.

Based on valyala#49
GrigoryEvko added a commit to GrigoryEvko/gozstd that referenced this pull request Aug 2, 2025
…ng codebase

Integrated community contributions:
- PR valyala#49: CGO wrapper improvements for 5-7% performance gain on large buffers
  - Use void* instead of uintptr_t to avoid memory allocations
  - Direct Go slice usage via reflect.SliceHeader
- PR valyala#25: Advanced Compression API with checksum support
  - Added CCtx type for advanced compression contexts
  - Added SetParameter/GetParameter methods
  - Added Reset and Compress2 methods
  - Full support for all ZSTD compression parameters
- PR valyala#63: Exposed CompressDictLevel as public API
  - Allows fine-grained control over dictionary compression levels
- PR valyala#66: RISC-V 64-bit architecture support
  - Updated Zig builder to 0.13.0
  - Added linux_riscv64 target
- PR valyala#60: Memory-optimized dictionary functions
  - Added NewCDictByRef/NewDDictByRef to avoid data copying
  - Reduces memory usage for large dictionaries

Infrastructure improvements:
- Created modern Dockerfile with Alpine Linux and latest Zig
- Fixed build process issues with clean target
- Updated minimum Go version to 1.24

Code organization:
- Moved Docker configs to build/docker/
- Moved scripts to scripts/
- Moved upstream zstd to contrib/
- Moved test data to test/
- Created comprehensive examples in examples/
- Kept all Go source files in root for package compatibility

Testing enhancements:
- Added Silesia Corpus compression tests with speed measurements
- Created 33 aggressive fuzz tests targeting known vulnerabilities
- Added comprehensive tests for Advanced API
- Added benchmarks comparing raw zstd vs wrapper performance

The wrapper now shows 6-10% performance improvements for compression
while maintaining identical compression ratios.


Development

Successfully merging this pull request may close these issues.

panic while decompressing data

2 participants