Improve throughput performance at compression level 9 #1280
Open
uttampawar wants to merge 1 commit into google:master
…MMAP_THRESHOLD value. This patch specifically addresses an issue observed at level 9; it has no adverse impact at other levels. The following are the performance numbers for various inputs on the latest Xeon server.

Runtime: 10 seconds
Compression level: 9

| Input | Input-sz | compressed-sz | Default (MiB/sec) | Opt (MiB/sec) | Imp Ratio (Opt/Default) |
|---|---|---|---|---|---|
| x | 1 | 5 | 0.27 | 3.62 | 13.41 |
| xyzzy | 5 | 9 | 0.61 | 1.65 | 2.70 |
| xyzzy.compressed | 9 | 13 | 0.64 | 2.58 | 4.03 |
| 64x | 64 | 10 | 8.73 | 27.13 | 3.11 |
| alice29.txt | 152,089 | 51,054 | 11.36 | 27.73 | 2.44 |
| alice29.txt.compressed | 50,096 | 50,100 | 7.13 | 6.91 | 0.97 |
| asyoulik.txt | 125,179 | 46,694 | 9.85 | 27.16 | 2.76 |
| asyoulik.txt.compressed | 45,687 | 45,691 | 6.68 | 6.61 | 0.99 |
| backward65536 | 65,792 | 19 | 2,359.99 | 4371.97 | 1.85 |
| bb.binast | 12,356,697 | 5,412,654 | 5.89 | 5.89 | 1.00 |
| compressed_file | 50,096 | 50,100 | 7.13 | 7.01 | 0.98 |
| compressed_file.compressed | 50,100 | 50,104 | 7.13 | 7.09 | 0.99 |
| compressed_repeated | 144,224 | 50,443 | 15.90 | 168.83 | 10.62 |
| compressed_repeated.compressed | 50,299 | 50,303 | 6.90 | 6.73 | 0.98 |
| cp1251-utf16le | 1,554 | 660 | 1.32 | 38.11 | 28.87 |
| empty.compressed.17 | 65,538 | 17 | 2,794.64 | 4558.76 | 1.63 |
| empty.compressed.18 | 196,610 | 22 | 4,568.71 | 6190.96 | 1.36 |
| lcet10.txt | 426,754 | 127,437 | 17.17 | 26.17 | 1.52 |
| lcet10.txt.compressed | 124,719 | 124,724 | 14.65 | 14.37 | 0.98 |
| mapsdatazrh | 285,886 | 166,978 | 16.09 | 30.18 | 1.88 |
| mapsdatazrh.compressed | 161,743 | 161,748 | 18.12 | 18 | 0.99 |
| monkey | 843 | 423 | 1.50 | 32.68 | 21.79 |
| plrabn12.txt | 481,861 | 177,362 | 14.75 | 20.08 | 1.36 |
| plrabn12.txt.compressed | 174,771 | 174,776 | 19.93 | 19.78 | 0.99 |
| quickfox_repeated | 176,128 | 51 | 2,202.14 | 5649.04 | 2.57 |
| random_chunks | 2,704 | 1,906 | 2.25 | 44.59 | 19.82 |
| random_org_10k.bin | 10,000 | 10,004 | 3.25 | 125.47 | 38.61 |
| random_org_10k.bin.compressed | 10,004 | 10,008 | 3.20 | 119.01 | 37.19 |
| ukkonooa | 119 | 71 | 1.48 | 13.6 | 9.19 |
| index.html (from Cloudflare) | 29,329 | 7,476 | 4.76 | 45.24 | 9.50 |
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.
Collaborator
Wow! Nice investigation.
Author
Thanks @eustas. On the failing tests, should I put those changes under an "#if linux" macro so that all of them pass?
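The guard asked about above could look roughly like the following. This is only a sketch, not the code from this patch: it assumes the change adjusts glibc's `M_MMAP_THRESHOLD` through `mallopt`, and the helper name `TuneMmapThreshold` is hypothetical. It keys on `__GLIBC__` rather than on a Linux macro, on the assumption that the tunable belongs to the C library rather than to the kernel.

```c
#if defined(__GLIBC__)
#include <malloc.h> /* mallopt, M_MMAP_THRESHOLD (glibc-specific) */
#endif

/* Hypothetical helper, not taken from the patch.
 * Asks glibc to serve allocations smaller than `threshold_bytes` from the
 * heap instead of a fresh mmap() region.
 * Returns 1 if the threshold was applied, 0 if the platform does not
 * support the tunable or the call failed. */
int TuneMmapThreshold(int threshold_bytes) {
#if defined(__GLIBC__) && defined(M_MMAP_THRESHOLD)
  /* mallopt() returns 1 on success and 0 on error. */
  return mallopt(M_MMAP_THRESHOLD, threshold_bytes) != 0;
#else
  (void)threshold_bytes; /* No-op on other C libraries. */
  return 0;
#endif
}
```

With a guard of this shape, builds against other C libraries compile the helper as a no-op instead of failing on a missing header or symbol.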
Collaborator
I'm still thinking about how to make this:
Let's continue with this on Monday.
This patch specifically addresses an issue observed at level 9; it has no adverse impact at other levels.
The following are the performance numbers in terms of throughput (bytes processed per second) for various inputs on the latest Xeon server.
Runtime: 10 seconds
Compression level: 9
Background
Children Self Shared Object Command
The detailed stack trace shows the following:
Children Self Command Shared Object Symbol
This clearly indicated that most cycles were being spent on page faults. Collecting "perf stat" showed the following stats:
$ perf stat -- ./bench -q 9 -c 1 index.html
Tested file index.html; size: 29329
Threads: 1, alg: brotli, quality 9
Total times compressed: 1716; compressed size: 7476
Compression speed:4.80 MiB
Performance counter stats for './bench -q 9 -c 1 index.html':
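For readers who want to observe the same effect without perf, the sketch below counts minor page faults around a loop that allocates, touches, and frees a buffer. It is an illustration only and not part of the benchmark: the function names are hypothetical, and it assumes a POSIX system where `getrusage` reports `ru_minflt`.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>

/* Keeps the compiler from optimizing the buffer accesses away. */
static volatile unsigned char g_sink;

/* Returns the minor page faults charged to this process so far,
 * or -1 if getrusage() fails. */
long MinorFaults(void) {
  struct rusage usage;
  if (getrusage(RUSAGE_SELF, &usage) != 0) return -1;
  return usage.ru_minflt;
}

/* Allocates, touches, and frees a `size`-byte buffer `rounds` times and
 * returns the number of minor faults incurred, or -1 on failure. */
long FaultsForAllocLoop(size_t size, int rounds) {
  long before = MinorFaults();
  if (before < 0 || size == 0) return -1;
  for (int i = 0; i < rounds; ++i) {
    unsigned char* buf = malloc(size);
    if (buf == NULL) return -1;
    memset(buf, 1, size);   /* Touch every page of the buffer. */
    g_sink = buf[size - 1]; /* Force the writes to be observable. */
    free(buf);
  }
  return MinorFaults() - before;
}
```

The count depends on the allocator and its settings: when a request is served by a fresh mmap, every touched page faults again after the buffer is freed and reallocated, whereas memory reused from the heap is usually already mapped.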
With the suggested change, page faults dropped considerably, improving performance.
Tested file index.html; size: 29329
Threads: 1, alg: brotli, quality 9
Total times compressed: 16109; compressed size: 7476
Compression speed:45.06 MiB
Performance counter stats for './bench -q 9 -c 1 index.html':
The majority of cycles are now spent in the application rather than in the kernel managing memory (mmap/munmap).
Children Self Shared Object
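As a sanity check, the compression speeds reported by the two bench runs can be reproduced from the file size and iteration counts. The sketch below assumes that each run lasted exactly the stated 10 seconds and that MiB means 2^20 bytes; the function name is hypothetical.

```c
/* Throughput in MiB/s for `iterations` passes over an input of
 * `input_bytes` bytes completed in `seconds` seconds. */
double ThroughputMiBPerSec(double input_bytes, double iterations,
                           double seconds) {
  return input_bytes * iterations / seconds / (1024.0 * 1024.0);
}
```

Under those assumptions, `ThroughputMiBPerSec(29329, 1716, 10)` comes to about 4.80 and `ThroughputMiBPerSec(29329, 16109, 10)` to about 45.06, matching the speeds printed for index.html before and after the change.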
Environment:
OS: Ubuntu 24.04.2 LTS
Kernel: 6.8.0-58-generic
GCC: gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
Glibc: ldd (Ubuntu GLIBC 2.39-0ubuntu8.4) 2.39