Various query optimizations and parallel queries in one RD traversal #559

Merged: karasikov merged 106 commits into master from mk/brwt on Mar 17, 2026

karasikov commented on Nov 3, 2025

  • enabled parallel batch query of annotations with counts
  • refactoring and other improvements

Status of parallelization for different query setups with batch query:

Annotation type \ Query type matches counts coords
basic na na
with counts ❌ -> ✅ na
with coords ❌ -> ✅ - (always non-batch)

Query Performance Benchmarks

The following benchmarks compare several MetaGraph query implementations across
multiple datasets. Each result is the mean ± standard deviation across runs,
with the standard deviation also given as a percentage of the mean.
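
For reference, a summary in the "mean ± std (pct%)" form used in the tables can be produced as follows. This is a minimal sketch, not the script used for this PR: the function name `summarize` is hypothetical, the sample values are illustrative and not taken from the benchmark logs, and it assumes the sample standard deviation (the PR does not say which estimator was used).

```python
import statistics


def summarize(values, digits=3):
    """Format values as 'mean ± std (pct%)', with std as a percentage of the mean."""
    mean = statistics.mean(values)
    # Assumption: sample standard deviation (n - 1 in the denominator).
    std = statistics.stdev(values)
    pct = 100.0 * std / mean
    return f"{mean:.{digits}f} ± {std:.{digits}f} ({pct:.1f}%)"


# Illustrative timings in seconds (not from the benchmark logs).
print(summarize([1.0, 2.0, 3.0]))  # prints: 2.000 ± 1.000 (50.0%)
```

At least two runs are needed for the sample standard deviation, which is why single-run rows cannot report a spread.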

Code for the benchmark
wait_for_low_cpu() {
    while true; do
        cpu=$(mpstat 1 1 | awk '/Average/ && $2=="all" {print 100-$12}')
        cpu=${cpu%.*}   # drop decimals: bash (( )) arithmetic is integer-only

        if (( cpu < 5 )); then
            break
        fi

        echo "CPU load ${cpu}% > 5%, sleeping 30s..."
        sleep 30
    done
}

for M in {metagraph_DNA_master,metagraph_DNA_paral4,metagraph_DNA_newBRWT,metagraph_DNA_new10k,metagraph_DNA_new}; do
for x in {1..5}; do
    wait_for_low_cpu
    /usr/bin/time -v ~/metagraph/metagraph/build/$M query /data/random_stud/queries/100_studies_5k_short.fq -p 30 -v -i /scratch/nvme0/all_sra/data/superkingdom_bacteria/0052/graph.primary.small.dbg -a /scratch/nvme0/all_sra/data/superkingdom_bacteria/0052/annotation.clean.row_diff_brwt.annodbg -v --batch-size 1000000000  2>&1 1>/dev/null | grep -A 3 "Query annotation with" >>logs.txt
    echo -e "\n" >>logs.txt
done

for x in {1..5}; do
    wait_for_low_cpu
    /usr/bin/time -v ~/metagraph/metagraph/build/$M query --query-mode counts /data/random_stud/queries/100_studies_5k_short.fq -p 30 -v -i /scratch/nvme/all_sra/data/kingdom_fungi_RNA/0012/graph.primary.small.dbg -a /scratch/nvme/all_sra/data/kingdom_fungi_RNA/0012/annotation.clean.row_diff_int_brwt.annodbg -v --batch-size 1000000000  2>&1 1>/dev/null | grep -A 3 "Query annotation with" >>logs.txt
    echo -e "\n" >>logs.txt
done

for x in {1..5}; do
    wait_for_low_cpu
    python3 /data/scratch/drop_caches.py /scratch/nvme0/human/graph_merged_complete_k31.primary.small.indexed.dbg
    python3 /data/scratch/drop_caches.py /scratch/nvme7/human/annotation_10.row_diff_flat.annodbg
    /usr/bin/time -v ~/metagraph/metagraph/build/$M query --mmap /data/random_stud/queries/100_studies_50_short.fq -p 30 -v -i /scratch/nvme0/human/graph_merged_complete_k31.primary.small.indexed.dbg -a /scratch/nvme7/human/annotation_10.row_diff_flat.annodbg --min-kmers-fraction-label 0.0 --query-mode matches  2>&1 1>/dev/null | grep -A 3 "Query annotation with" >>logs.txt
    echo -e "\n" >>logs.txt
done

for x in {1..5}; do
    wait_for_low_cpu
    python3 /data/scratch/drop_caches.py /scratch/nvme4/plants/graph_primary.small.indexed.dbg
    python3 /data/scratch/drop_caches.py /scratch/nvme4/plants/*.row_diff_brwt.annodbg
    /usr/bin/time -v ~/metagraph/metagraph/build/$M query --mmap -i /scratch/nvme4/plants/*indexed.dbg -a /scratch/nvme4/plants/*.row_diff_brwt.annodbg /data/scratch/covid.fa -v -p 10 --min-kmers-fraction-label 0.0 --query-mode matches  2>&1 1>/dev/null | grep -A 3 "Query annotation with" >>logs.txt
    echo -e "\n" >>logs.txt
done

for x in {1..5}; do
    wait_for_low_cpu
    python3 /data/scratch/drop_caches.py /scratch/nvme4/plants/graph_primary.small.indexed.dbg
    python3 /data/scratch/drop_caches.py /scratch/nvme4/plants/*.row_diff_brwt.annodbg
    /usr/bin/time -v ~/metagraph/metagraph/build/$M query --mmap -i /scratch/nvme4/plants/*indexed.dbg -a /scratch/nvme4/plants/*.row_diff_brwt.annodbg /data/scratch/plants_query.fa -v -p 10 --min-kmers-fraction-label 0.0 --query-mode matches  2>&1 1>/dev/null | grep -A 3 "Query annotation with" >>logs.txt
    echo -e "\n" >>logs.txt
done
done
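
The `drop_caches.py` helper called before the mmap runs is not shown in this PR. A helper of that kind could look like the sketch below, which asks the kernel to evict a file's pages from the page cache so that each run starts cold. This is an assumption about what such a script might do, not the actual script; the function name `drop_file_cache` is hypothetical, and `os.posix_fadvise` is only available on Unix-like systems.

```python
import os


def drop_file_cache(path):
    """Ask the kernel to evict cached pages of `path` (advisory, Unix only)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # Dirty pages cannot be evicted, so flush them to disk first.
        os.fsync(fd)
        # offset=0, len=0 means "the whole file".
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
```

The advice is a hint: the kernel may keep pages that are still mapped or in use by other processes, so a cold start is likely but not guaranteed.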
Methods compared:
  • master -- the version before this PR (from master)
  • optimized -- the same algorithm (in chunks), with minor optimizations and optimized blocks
  • new -- one RD traversal for the entire query (parallel in groups), then query the diff rows (parallel in chunks) and reconstruct the rows (parallel in groups)
  • newBRWT -- one RD traversal, then the BRWT is queried in a single traversal (all parallelization inside)
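
The `new` strategy above can be illustrated with a toy model. This is only a sketch under stated assumptions, not MetaGraph code: it assumes the row-diff idea that each non-anchor row stores the symmetric difference between its annotation and that of a successor row, while anchor rows store their annotation in full. All names (`reconstruct_batch`, `succ`, `fetch_diffs`) are hypothetical, the chunks are processed sequentially here, and the parallelism described in the list is omitted.

```python
def reconstruct_batch(query_rows, succ, fetch_diffs, chunk_size=2):
    """Reconstruct full annotations for a batch of rows in a toy row-diff model.

    succ[r] is the successor row of r, or None if r is an anchor.
    fetch_diffs(rows) returns the stored (diff or full) row for each given row.
    """
    # Step 1: one traversal for the whole batch. Rows shared by several
    # paths are visited once; each path stops at an anchor or at a row
    # that an earlier path already covers.
    paths, needed, seen = [], [], set()
    for row in query_rows:
        path, cur = [], row
        while cur is not None and cur not in seen:
            seen.add(cur)
            needed.append(cur)
            path.append(cur)
            cur = succ[cur]
        paths.append((path, cur))  # cur is None or an already covered row

    # Step 2: fetch the stored rows for all needed rows, chunk by chunk.
    stored = {}
    for start in range(0, len(needed), chunk_size):
        chunk = needed[start:start + chunk_size]
        stored.update(zip(chunk, fetch_diffs(chunk)))

    # Step 3: rebuild annotations from the end of each path backwards.
    full = {}
    for path, tail in paths:
        acc = full[tail] if tail is not None else frozenset()
        for r in reversed(path):
            acc = acc ^ stored[r]  # symmetric difference undoes the diff
            full[r] = acc
    return [full[r] for r in query_rows]


# Toy data: rows 0 -> 1 -> 2 (anchor) and 3 -> 1.
succ = {0: 1, 1: 2, 2: None, 3: 1}
stored_rows = {0: frozenset("B"), 1: frozenset("A"),
               2: frozenset("B"), 3: frozenset("C")}
result = reconstruct_batch([0, 3], succ, lambda rows: [stored_rows[r] for r in rows])
print([sorted(labels) for labels in result])  # prints: [['A'], ['A', 'B', 'C']]
```

Rows 0 and 3 share the suffix 1 -> 2, so that suffix is traversed and fetched only once for the batch, which is the point of doing a single traversal per query.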

all_sra / superkingdom_bacteria: 100_studies_5k_short.fq, 122,725,698 bp, in RAM, 30 threads

| Method | Runs | Query annotation (s) | Batch query time (s) | Throughput (bp/s) |
|---|---|---|---|---|
| master | 10 | 6.830 ± 0.183 (2.7%) | 123.893 ± 4.241 (3.4%) | 987,955 ± 33,824 (3.4%) |
| optimized | 10 | 5.623 ± 0.237 (4.2%) | 109.837 ± 0.962 (0.9%) | 1,112,827 ± 9,683 (0.9%) |
| newBRWT | 10 | 7.391 ± 0.133 (1.8%) | 112.606 ± 1.089 (1.0%) | 1,085,264 ± 11,543 (1.1%) |
| new | 10 | 6.328 ± 0.136 (2.1%) | 114.307 ± 6.699 (5.9%) | 1,072,622 ± 59,818 (5.6%) |
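
The throughput column appears to be the query length divided by the batch query time. A quick check with the `master` row above, as a sketch (the helper name `throughput_bp_per_s` is hypothetical): the ratio of the means lands within about 0.3% of the reported value. A small gap like this would be expected if throughput was averaged per run rather than computed from the mean time, which is an assumption here.

```python
def throughput_bp_per_s(query_bp, batch_seconds):
    """Throughput in base pairs per second."""
    return query_bp / batch_seconds


QUERY_BP = 122_725_698     # query size from the table heading
MASTER_BATCH_S = 123.893   # mean batch query time for master
REPORTED = 987_955         # reported mean throughput for master

estimate = throughput_bp_per_s(QUERY_BP, MASTER_BATCH_S)
relative_gap = abs(estimate - REPORTED) / REPORTED
print(f"{estimate:,.0f} bp/s, {100 * relative_gap:.2f}% from the reported mean")
```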

all_sra / kingdom_fungi_RNA + counts: 100_studies_5k_short.fq, 122,725,698 bp, in RAM, 30 threads

| Method | Runs | Query annotation (s) | Batch query time (s) | Throughput (bp/s) |
|---|---|---|---|---|
| master | 10 | 120.464 ± 11.359 (9.4%) | 239.072 ± 18.391 (7.7%) | 514,390 ± 36,028 (7.0%) |
| optimized | 10 | 8.295 ± 0.116 (1.4%) | 123.481 ± 1.264 (1.0%) | 988,375 ± 10,358 (1.0%) |
| newBRWT | 10 | 9.733 ± 0.350 (3.6%) | 124.984 ± 0.847 (0.7%) | 976,594 ± 6,613 (0.7%) |
| new | 10 | 8.436 ± 0.133 (1.6%) | 123.267 ± 0.976 (0.8%) | 990,197 ± 7,797 (0.8%) |

human: 100_studies_50_short.fq, 1,226,462 bp, mmap, 30 threads

| Method | Runs | Query annotation (s) | Batch query time (s) | Throughput (bp/s) |
|---|---|---|---|---|
| master | 10 | 42.0 ± 0.8 (2.0%) | 63.9 ± 1.1 (1.8%) | 19,174 ± 333 (1.7%) |
| optimized | 10 | 41.4 ± 0.5 (1.1%) | 63.0 ± 0.9 (1.4%) | 19,458 ± 282 (1.5%) |
| newBRWT | 10 | 47.0 ± 5.7 (12.2%) | 68.8 ± 5.6 (8.1%) | 17,919 ± 1,345 (7.5%) |
| new | 10 | 43.5 ± 1.4 (3.1%) | 64.6 ± 1.5 (2.3%) | 18,977 ± 433 (2.3%) |

plants: covid.fa, 29,903 bp (only 16 k-mer matches), mmap, 10 threads

| Method | Runs | Query annotation (s) | Batch query time (s) | Throughput (bp/s) |
|---|---|---|---|---|
| master | 10 | 140.2 ± 2.2 (1.6%) | 217.6 ± 3.4 (1.5%) | 137.5 ± 2.1 (1.5%) |
| optimized | 10 | 43.7 ± 2.7 (6.2%) | 60.4 ± 2.8 (4.6%) | 496.1 ± 21.1 (4.3%) |
| newBRWT | 10 | 32.9 ± 4.8 (14.6%) | 49.5 ± 4.9 (9.8%) | 607.9 ± 49.0 (8.1%) |
| new | 10 | 40.6 ± 3.7 (9.2%) | 57.2 ± 3.7 (6.5%) | 524.2 ± 31.7 (6.0%) |

plants: plants_query.fa, 16,573 bp (16,408 k-mer matches), mmap, 10 threads

| Method | Runs | Query annotation (s) | Batch query time (s) | Throughput (bp/s) |
|---|---|---|---|---|
| master | 10 | 255.6 ± 5.0 (2.0%) | 317.0 ± 5.1 (1.6%) | 52.3 ± 0.9 (1.6%) |
| optimized | 10 | 150.7 ± 9.5 (6.3%) | 165.0 ± 9.5 (5.8%) | 100.7 ± 5.6 (5.5%) |
| newBRWT | 10 | 107.7 ± 4.4 (4.1%) | 122.1 ± 4.5 (3.6%) | 135.9 ± 4.7 (3.4%) |
| new | 10 | 104.2 ± 0.7 (0.7%) | 118.6 ± 0.8 (0.6%) | 139.8 ± 0.9 (0.7%) |

plants: 100_studies_20_short.fq, 491,594 bp (153,442 k-mer matches), mmap, 10 threads

| Method | Runs | Query annotation (s) | Batch query time (s) | Throughput (bp/s) |
|---|---|---|---|---|
| master | 0 | ... | ... | ... |
| optimized | 1 | 1,284.9 ± n/a | 1,339.1 ± n/a | 367 ± n/a |
| newBRWT | 1 | 332.8 ± n/a | 386.9 ± n/a | 1,270 ± n/a |
| new | 1 | 498.9 ± n/a | 553.7 ± n/a | 888 ± n/a |

plants: 100_studies_50_short.fq, 1,226,462 bp (366,859 k-mer matches), mmap, 10 threads

| Method | Runs | Query annotation (s) | Batch query time (s) | Throughput (bp/s) |
|---|---|---|---|---|
| master | 0 | ... | ... | ... |
| optimized | 1 | 2,363.7 ± n/a | 2,433.7 ± n/a | 504 ± n/a |
| newBRWT | 1 | 322.8 ± n/a | 395.1 ± n/a | 3,102 ± n/a |
| new | 1 | 459.7 ± n/a | 530.0 ± n/a | 2,313 ± n/a |

plants: 100_studies_100_short.fq, 2,454,716 bp (707,334 k-mer matches), mmap, 10 threads

| Method | Runs | Query annotation (s) | Batch query time (s) | Throughput (bp/s) |
|---|---|---|---|---|
| master | 0 | ... | ... | ... |
| optimized | 1 | 5,455.6 ± n/a | 5,545.4 ± n/a | 443 ± n/a |
| newBRWT | 1 | 318.8 ± n/a | 401.9 ± n/a | 6,101 ± n/a |
| new | 1 | 446.0 ± n/a | 551.0 ± n/a | 4,452 ± n/a |

plants: 100_studies_500_short.fq, 12,272,048 bp (3,128,611 k-mer matches), mmap, 10 threads

| Method | Runs | Query annotation (s) | Batch query time (s) | Throughput (bp/s) |
|---|---|---|---|---|
| master | 0 | ... | ... | ... |
| optimized | 0 | ... | ... | ... |
| newBRWT | 1 | 355.0 ± n/a | 518.5 ± n/a | 23,581 ± n/a |
| new | 1 | 547.7 ± n/a | 734.3 ± n/a | 16,673 ± n/a |

karasikov changed the title from "Parallel Multi-BRWT query with one traversal" to "Various query optimizations and parallel queries in one RD traversal" on Mar 17, 2026
karasikov (author) commented:

Choosing new for now, as it's a great trade-off and fast for both in-RAM and mmap queries.

For now, newBRWT (which is especially efficient for queries with memory mapping) will be in another branch.
It should be used to host the largest indexes like SRA-Plants (annotation size is ~1.4 TB).
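
The remark about newBRWT under memory mapping can be checked against the batch query times reported for the plants index earlier in this thread. The sketch below only divides those reported numbers; the names `PLANTS_BATCH_S` and `ratio` are hypothetical, and each of these rows comes from a single run, so the ratios are indicative rather than statistically solid.

```python
# (query file, batch time for new, batch time for newBRWT), in seconds,
# taken from the plants benchmarks (mmap, 10 threads, one run each).
PLANTS_BATCH_S = [
    ("100_studies_20_short.fq", 553.7, 386.9),
    ("100_studies_50_short.fq", 530.0, 395.1),
    ("100_studies_100_short.fq", 551.0, 401.9),
    ("100_studies_500_short.fq", 734.3, 518.5),
]


def ratio(new_s, new_brwt_s):
    """How many times longer `new` takes than `newBRWT`."""
    return new_s / new_brwt_s


for name, new_s, brwt_s in PLANTS_BATCH_S:
    print(f"{name}: new / newBRWT = {ratio(new_s, brwt_s):.2f}")
```

On these four queries the ratio stays between roughly 1.3 and 1.45, while for the in-RAM benchmarks the two methods are within a few percent of each other.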

@karasikov karasikov merged commit e69e128 into master Mar 17, 2026
52 checks passed
@karasikov karasikov deleted the mk/brwt branch March 17, 2026 15:49