Kun-peng is built around a simple idea: store massive reference libraries once, then classify samples by streaming only the hash shards that matter. Each read is reduced to spaced minimizers (long k-mers combined with shorter minimizer seeds and user-defined spacing) and matched against ordered hash shards, which default to ~4 GB and can be resized via `--hash-capacity`. Peak RAM therefore stays in the single-digit-GB range even when the database scales to multi-terabyte collections covering hundreds of thousands of genomes. The engine inherits Kraken’s minimizer taxonomy logic and then tightens disk layout, memory scheduling, and streaming behavior to make those ideas practical for much larger pan-domain databases.
- Block-ordered hash layout: every `hash_*.k2d` shard is ~4 GB and tracked by `hash_config.k2d`. Build steps can fan out across threads, while classification maps minimizers to specific shards and loads only what is needed.
- KLMT minimizer encoding: defaults such as `-k 35 -l 31 --minimizer-spaces 7` capture long-range context with far fewer entries than raw k-mers, keeping disk/RAM budgets predictable without losing specificity (see the example after this list).
- Composable subcommands: the build side (`merge-fna`/`add-library` → `estimate` → `chunk` → `build-db`) and the classify side (`splitr` → `annotate` → `resolve`) are exposed as standalone steps, simplifying debugging, benchmarking, or swapping components.
- Drop-in outputs: Kun-peng writes Kraken-compatible tables and `kreport2` summaries, so existing visualization or QC workflows require no changes.
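These defaults are ordinary flags on the `build` subcommand (full help further below), so they can be spelled out or overridden explicitly. A minimal sketch using the small demo database from the Quick Start:

```bash
# Same values as the documented defaults (-k 35 -l 31 --minimizer-spaces 7);
# change them only if you understand the sensitivity/size trade-off.
kun_peng build --download-dir data/ --db test_database \
  -k 35 -l 31 --minimizer-spaces 7 --hash-capacity 1G
```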
- Gather reference genomes – `merge-fna` or `add-library` normalizes NCBI downloads and user FASTA files into `library/*.fna`, appending taxids to `seqid2taxid.map`.
- Estimate capacity – `estimate` scans the KLMT minimizer stream to choose a safe hash-slot count and load factor, preventing oversized shards or hash collisions.
- Chunk the minimizer stream – `chunk` writes intermediate files sized for the target shard, keeping I/O parallel and deterministic.
- Emit reusable artifacts – `build-db` converts each chunk into `hash_*.k2d`, plus shared metadata (`hash_config.k2d`, `taxo.k2d`, `opts.k2d`). The resulting folder is a portable Kun-peng database (a quick sanity check is sketched below).
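A minimal sketch for verifying that a finished build contains the documented artifacts; the database path is just the demo name, adjust to yours:

```bash
# Check that the portable database folder has its metadata and hash shards.
DB=test_database
for f in hash_config.k2d taxo.k2d opts.k2d; do
  [ -f "$DB/$f" ] || echo "missing: $DB/$f"
done
ls -lh "$DB"/hash_*.k2d   # one file per shard (up to ~4 GiB each at the 1G default)
```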
- `splitr` – streaming: reads (FA/FASTQ, optionally `.gz`) are broken into manageable chunks, and their KLMT minimizers land in `temp_chunk/` so oversized datasets stay streamable.
- `annotate` – on-demand lookups: each chunk triggers loading only the hash shards referenced by its minimizers. Taxid hits are accumulated and optionally batched across threads to minimize disk churn.
- `resolve` – taxonomy reasoning: the hit vectors move through the taxonomy tree (`taxo.k2d`) to compute LCAs plus confidence filtering, respecting minimum hit groups when required (see the example invocation after this list).
- Report generation: Kun-peng prints Kraken-style `output_*.txt` files and `*.kreport2`, which can be fed directly into downstream comparison or visualization tools.
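The integrated `classify` command runs all three stages; the confidence filtering and hit-group behavior of the resolve stage are controlled by the flags documented in `kun_peng classify -h`. For example (values are illustrative):

```bash
# Classify with a stricter confidence threshold and hit-group requirement.
mkdir -p temp_chunk test_out
kun_peng classify --db test_database --chunk-dir temp_chunk --output-dir test_out \
  -T 0.1 -g 3 data/COVID_19.fa
```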
Pick the path that matches your situation. Each path links to a detailed demo.
- Option A — Build from downloads (one command), then classify
  - Build: `kun_peng build --download-dir data/ --db test_database --hash-capacity 1G`
  - Adjust `--hash-capacity` to set the hash shard size (defaults to `1G`, which creates ~4 GiB hash files).
  - Classify: `mkdir -p temp_chunk test_out && kun_peng classify --db test_database --chunk-dir temp_chunk --output-dir test_out data/COVID_19.fa`
  - Details: docs/build-db-demo.md and docs/classify-demo.md
- Option B — You already have or curate a library
  - Prepare library: `kun_peng merge-fna --download-dir data/ --db test_database` or `kun_peng add-library --db test_database -i /path/to/fastas`
  - Build-only: `kun_peng build-db --db test_database --hash-capacity 1G`
  - Use the same `--hash-capacity` guidance as Option A to control shard sizes.
  - Classify: `mkdir -p temp_chunk test_out && kun_peng classify --db test_database --chunk-dir temp_chunk --output-dir test_out <reads>`
  - Details: docs/build-db-demo.md and docs/classify-demo.md
- Option C — You have a Kraken 2 database
  - Convert: `kun_peng hashshard --db /path/to/kraken_db --hash-capacity 1G`
  - Classify: `mkdir -p temp_chunk test_out && kun_peng classify --db /path/to/kraken_db --chunk-dir temp_chunk --output-dir test_out <reads>` or direct mode: `kun_peng direct --db /path/to/kraken_db <reads>`
  - Details: docs/hashshard-demo.md and docs/classify-demo.md
For more step-by-step guidance, see:
- Detailed Database Build Demo: docs/build-db-demo.md
- Detailed Classification Demo: docs/classify-demo.md
- Kraken 2 Conversion Demo: docs/hashshard-demo.md
Follow these steps to install Kun-peng and run the examples.
If you only need commands to run today, start with Quick Start above. The sections below cover installation methods and reference help for each subcommand.
- Use a clean `--chunk-dir` for `classify`. The directory must not contain `sample_*.k2`, `sample_id*.map`, or `sample_*.bin`, otherwise the command will error.
- After adding FASTA with `add-library`, always run `build-db` to rebuild hash tables. Stale `hash_*.k2d` will yield incorrect results.
- Direct mode needs RAM ≥ sum of `hash_*.k2d`. Run `bash cal_memory.sh <db>` to estimate. If insufficient, use the integrated `classify` workflow instead.
- `hashshard` aborts if `hash_config.k2d` already exists in the target directory. Use a fresh directory or remove/backup the existing file.
- Choosing `--hash-capacity` (`hashshard`): shard file size ≈ capacity × 4 bytes. Example: `1G` capacity → ~4 GiB per shard. More, smaller shards can improve I/O parallelism with modest file count overhead (a worked sketch follows this list).
- Keep `--load-factor` reasonable (default 0.7). Very high values may hurt build success or classification speed; very low values waste disk/memory.
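A worked instance of the shard-size rule above, as plain shell arithmetic (illustrative only; `1G` capacity is interpreted as 2^30 slots, matching the `hash_capacity: 1073741824` value shown in the demo output later in this README):

```bash
# --hash-capacity 1G  ->  2^30 slots; each slot occupies 4 bytes on disk,
# so one shard file comes out to about 4 GiB.
CAPACITY=$((1024 * 1024 * 1024))
echo "per-shard file size ≈ $((CAPACITY * 4 / 1024 / 1024 / 1024)) GiB"
```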
If you prefer not to build from source, you can download the pre-built binaries for your platform from the GitHub releases page.
For Linux users (CentOS 7 compatible):
# Replace X.Y.Z with the latest version number
VERSION=vX.Y.Z
mkdir kun_peng_$VERSION
wget https://github.com/eric9n/Kun-peng/releases/download/$VERSION/kun_peng-$VERSION-centos7
# For linux x86_64
# wget https://github.com/eric9n/Kun-peng/releases/download/$VERSION/kun_peng-$VERSION-x86_64-unknown-linux-gnu
mv kun_peng-$VERSION-centos7 kun_peng_$VERSION/kun_peng
chmod +x kun_peng_$VERSION/kun_peng
# Add to PATH
echo "export PATH=\$PATH:$PWD/kun_peng_$VERSION" >> ~/.bashrc
source ~/.bashrc

For macOS users:
brew install eric9n/tap/kun_peng

Or download a pre-built binary manually:

# Replace X.Y.Z with the latest version number
VERSION=vX.Y.Z
mkdir kun_peng_$VERSION
# For Intel Macs
wget https://github.com/eric9n/Kun-peng/releases/download/$VERSION/kun_peng-$VERSION-x86_64-apple-darwin
mv kun_peng-$VERSION-x86_64-apple-darwin kun_peng_$VERSION/kun_peng
# For Apple Silicon Macs
# wget https://github.com/eric9n/Kun-peng/releases/download/$VERSION/kun_peng-$VERSION-aarch64-apple-darwin
# mv kun_peng-$VERSION-aarch64-apple-darwin kun_peng_$VERSION/kun_peng
chmod +x kun_peng_$VERSION/kun_peng
# Add to PATH
echo "export PATH=\$PATH:$PWD/kun_peng_$VERSION" >> ~/.zshrc # or ~/.bash_profile for Bash
source ~/.zshrc # or source ~/.bash_profile for Bash

For Windows users:
# Replace X.Y.Z with the latest version number
$VERSION = "vX.Y.Z"
New-Item -ItemType Directory -Force -Path kun_peng_$VERSION
Invoke-WebRequest -Uri "https://github.com/eric9n/Kun-peng/releases/download/$VERSION/kun_peng-$VERSION-x86_64-pc-windows-msvc.exe" -OutFile "kun_peng_$VERSION\kun_peng.exe"
# Add to PATH
$env:Path += ";$PWD\kun_peng_$VERSION"
[Environment]::SetEnvironmentVariable("Path", $env:Path, [EnvironmentVariableTarget]::User)

After installation, you can verify the installation by running:
kun_peng --version

We will use the very small virus database included in the GitHub repository as an example:
- clone the repository
git clone https://github.com/eric9n/Kun-peng.git
cd Kun-peng

- build database
kun_peng build --download-dir data/ --db test_database --hash-capacity 1G

The `--hash-capacity` flag controls the number of hash slots (and therefore shard sizes); leave it at `1G` for ~4 GiB hash files or pick a higher/lower value to fit your deployment.
merge fna start...
merge fna took: 29.998258ms
estimate start...
estimate count: 14080, required capacity: 31818.0, Estimated hash table requirement: 124.29KB
convert fna file "test_database/library.fna"
process chunk file 1/1: duration: 29.326627ms
build k2 db took: 30.847894ms
- classify
# temp_chunk is used to store intermediate files
mkdir temp_chunk
# test_out is used to store output files
mkdir test_out
kun_peng classify --db test_database --chunk-dir temp_chunk --output-dir test_out data/COVID_19.fa

hash_config HashConfig { value_mask: 31, value_bits: 5, capacity: 31818, size: 13051, hash_capacity: 1073741824 }
splitr start...
splitr took: 18.212452ms
annotate start...
chunk_file "temp_chunk/sample_1.k2"
load table took: 548.911µs
annotate took: 12.006329ms
resolve start...
resolve took: 39.571515ms
Classify took: 92.519365ms

Set up the repository's Git hooks so pushing requires an up-to-date Cargo.lock:

git config core.hooksPath .githooks

The provided pre-push hook runs `cargo check --locked` and aborts the push if the lock file is out of sync with Cargo.toml.
- Rust: This project requires the Rust programming environment if you plan to build from source.
First, clone this repository to your local machine:
git clone https://github.com/eric9n/Kun-peng.git
cd Kun-peng

Ensure that both projects are built. You can do this by running the following command from the root of the workspace:
cargo build --release

This will build the kr2r and ncbi projects in release mode.
Next, run the example script that demonstrates how to use the kun_peng binary. Execute the following command from the root of the workspace:
cargo run --release --example build_and_classify

This will run the build_and_classify.rs example located in the kr2r project's examples directory.
Example Output

You should see output similar to the following:
Executing command: /path/to/workspace/target/release/kun_peng build --download-dir data/ --db test_database --hash-capacity 1G
kun_peng build output: [build output here]
kun_peng build error: [any build errors here]
Executing command: /path/to/workspace/target/release/kun_peng direct --db test_database data/COVID_19.fa
kun_peng direct output: [direct output here]
kun_peng direct error: [any direct errors here]

This output confirms that the kun_peng commands were executed successfully and the files were processed as expected.
For detailed information and usage instructions for the ncbi_dl tool, please refer to the ncbi_dl repository.
The ncbi_dl tool is used to download resources from the NCBI website, including taxonomy files and genome data. It provides a convenient way to obtain the necessary data for building Kun-peng databases.
To download genome databases using ncbi_dl, you can use the genomes (or gen) command. Here's a basic example:
ncbi_dl -d /path/to/download/directory gen -g bacteria

This command will download bacterial genomes to the specified directory. You can replace bacteria with other genome groups like archaea, fungi, protozoa, or viral depending on your needs.
Some key options for the genomes command include:
- `-g, --groups <GROUPS>`: Specify which genome groups to download (e.g., bacteria, archaea, viral)
- `-f, --file-types <FILE_TYPES>`: Choose which file types to download (default is genomic.fna.gz)
- `-l, --assembly-level <ASSEMBLY_LEVEL>`: Set the assembly level (e.g., complete, chromosome, scaffold, contig)
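As an illustration, the options above can be combined in a single invocation; this is a sketch, so verify the exact value syntax against the ncbi_dl repository documentation:

```bash
# Sketch: download complete viral assemblies into data/
# (genomic.fna.gz is the default file type; group and assembly-level
# values as listed above).
ncbi_dl -d data/ gen -g viral -l complete
```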
For a full list of options and more detailed usage instructions, please refer to the ncbi_dl repository documentation.
For installation, additional usage examples, and more detailed documentation, please visit the ncbi_dl repository linked above.
Usage: kun_peng <COMMAND>
Commands:
estimate estimate capacity
build build `k2d` files
build-db Run the final database construction steps (estimate, chunk, build)
add-library Add new FASTA files to an existing Kun-Peng database library
hashshard Convert Kraken2 database files to Kun-peng database format for efficient processing and analysis.
splitr Split fast(q/a) file into ranges
annotate annotate a set of sequences
resolve resolve taxonomy tree
classify Integrates 'splitr', 'annotate', and 'resolve' into a unified workflow for sequence classification. classify a set of sequences
direct Directly load all hash tables for classification annotation
merge-fna A tool for processing genomic files
help Print this message or the help of the given subcommand(s)
Options:
-h, --help Print help
-V, --version Print version

Build the kun_peng database like Kraken2, specifying the directory for the data files downloaded from NCBI, as well as the database directory.
./target/release/kun_peng build -h
build database
Usage: kun_peng build [OPTIONS] --download-dir <DOWNLOAD_DIR> --db <DATABASE>
Options:
-d, --download-dir <DOWNLOAD_DIR>
Directory to store downloaded files
--db <DATABASE>
ncbi library fna database directory
-k, --k-mer <K_MER>
Set length of k-mers, k must be positive integer, k=35, k cannot be less than l [default: 35]
-l, --l-mer <L_MER>
Set length of minimizers, 1 <= l <= 31 [default: 31]
--minimizer-spaces <MINIMIZER_SPACES>
Number of characters in minimizer that are ignored in comparisons [default: 7]
-T, --toggle-mask <TOGGLE_MASK>
Minimizer ordering toggle mask [default: 16392584516609989165]
--min-clear-hash-value <MIN_CLEAR_HASH_VALUE>
-r, --requested-bits-for-taxid <REQUESTED_BITS_FOR_TAXID>
Bit storage requested for taxid 0 <= r < 31 [default: 0]
-p, --threads <THREADS>
Number of threads [default: 10]
--cache
estimate capacity from cache if exists
--max-n <MAX_N>
Set maximum qualifying hash code [default: 4]
--load-factor <LOAD_FACTOR>
Proportion of the hash table to be populated (build task only; def: 0.7, must be between 0 and 1) [default: 0.7]
--hash-capacity <HASH_CAPACITY>
Specifies the hash file capacity.
Acceptable formats include numeric values followed by 'K', 'M', or 'G' (e.g., '1.5G', '250M', '1024K').
Note: The specified capacity affects the index size, with a factor of 4 applied.
For example, specifying '1G' results in an index size of '4G'.
Default: 1G (capacity 1G = file size 4G) [default: 1G]
-h, --help
Print help
-V, --version
Print version

Build from an existing library/ directory. This runs the final steps only: estimate capacity (unless -c is provided), chunk, and build hash tables.
./target/release/kun_peng build-db -h
Run the final database construction steps (estimate, chunk, build)
Usage: kun_peng build-db [OPTIONS] --db <DATABASE>
Options:
--db <DATABASE> ncbi library fna database directory
-k, --k-mer <K_MER> Set length of k-mers, k must be positive integer, k cannot be less than l [default: 35]
-l, --l-mer <L_MER> Set length of minimizers, 1 <= l <= 31 [default: 31]
--minimizer-spaces <MINIMIZER_SPACES> Number of characters in minimizer that are ignored in comparisons [default: 7]
-T, --toggle-mask <TOGGLE_MASK> Minimizer ordering toggle mask [default: 16392584516609989165]
--min-clear-hash-value <MIN_CLEAR_HASH_VALUE>
-r, --requested-bits-for-taxid <REQUESTED_BITS_FOR_TAXID>
Bit storage requested for taxid 0 <= r < 31 [default: 0]
-p, --threads <THREADS> Number of threads [default: 8]
-c, --required-capacity <EXACT_SLOT_COUNT> Manually set the precise hash table capacity (number of slots)
--cache Estimate capacity from cache if exists
--max-n <MAX_N> Set maximum qualifying hash code [default: 4]
--load-factor <LOAD_FACTOR> Proportion of the hash table to be populated [default: 0.7]
--hash-capacity <HASH_CAPACITY> Specifies the hash file capacity.
Acceptable formats include numeric values followed by 'K', 'M', or 'G' (e.g., '1.5G', '250M', '1024K').
Note: The specified capacity affects the index size, with a factor of 4 applied.
For example, specifying '1G' results in an index size of '4G'.
Default: 1G (capacity 1G = file size 4G) [default: 1G]
-h, --help Print help
-V, --version Print version

Note: If you already have a populated library/ under your database directory (e.g., after running merge-fna or manually preparing it), prefer:
kun_peng build-db --db test_database --hash-capacity 1G

Example: Prepare library, then build-db
# 1) Merge downloaded genomes into library files
kun_peng merge-fna --download-dir data/ --db test_database --max-file-size 2G
# 2) Build only the final database artifacts (estimate, chunk, build)
kun_peng build-db --db test_database --hash-capacity 1G

Add new FASTA files (or directories of FASTA/FASTA.GZ) into a database directory (empty or existing). It will create/extend the library/*.fna shards and append entries to seqid2taxid.map. After populating the library, run build-db to (re)generate the hash tables.
./target/release/kun_peng add-library -h
Add new FASTA files to an existing Kun-Peng database library
Usage: kun_peng add-library [OPTIONS] --db <DATABASE> --input-library <INPUT_LIBRARY>...
Options:
--db <DATABASE> Main database directory (must contain existing library/ and taxonomy/ dirs)
-i, --input-library <INPUT_LIBRARY>... Input files or directories (containing .fa, .fna, .fasta, .fsa, *.gz files)
--max-file-size <MAX_FILE_SIZE> library fna temp file max size [default: 2G]
-h, --help Print help
-V, --version Print version

Quick example:
# Add a folder of FASTA files into an existing database
kun_peng add-library --db test_database -i /path/to/new_fastas/
# Rebuild index after adding
kun_peng build-db --db test_database --hash-capacity 1G

Converts an existing Kraken 2 database (containing hash.k2d, opts.k2d, and taxo.k2d) into Kun-peng’s sharded hash format. This enables Kun-peng’s memory- and I/O-efficient classification workflows without rebuilding from source FASTA.
See step-by-step demo: docs/hashshard-demo.md
./target/release/kun_peng hashshard -h
Convert Kraken2 database files to Kun-peng database format for efficient processing and analysis.
Usage: kun_peng hashshard [OPTIONS] --db <DATABASE>
Options:
--db <DATABASE> The database directory for the Kraken 2 index. contains index files(hash.k2d opts.k2d taxo.k2d)
--hash-capacity <HASH_CAPACITY> Specifies the hash file capacity.
Acceptable formats include numeric values followed by 'K', 'M', or 'G' (e.g., '1.5G', '250M', '1024K').
Note: The specified capacity affects the index size, with a factor of 4 applied.
For example, specifying '1G' results in an index size of '4G'.
Default: 1G (capacity 1G = file size 4G) [default: 1G]
-h, --help Print help
-V, --version Print version
The classification process is divided into three modes:
- Direct Processing Mode:
- Description: In this mode, all database files are loaded simultaneously, which requires a significant amount of memory. Before running this mode, you need to check the total size of hash_*.k2d files in the database directory using the provided script. Ensure that your available memory meets or exceeds this size.
bash cal_memory.sh $database_dir

- Characteristics:
- High memory requirements
- Fast performance
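A minimal sketch of that pre-flight check, assuming a Linux host with GNU coreutils; `cal_memory.sh` in the repository remains the supported helper:

```bash
# Compare the total size of the hash shards against available RAM
# before choosing direct mode. The database path is an example.
DB=test_database
du -ch "$DB"/hash_*.k2d | tail -n 1   # total size of all shards
free -h                               # available memory
```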
Command Help
./target/release/kun_peng direct -h
Directly load all hash tables for classification annotation
Usage: kun_peng direct [OPTIONS] --db <DATABASE> [INPUT_FILES]...
Arguments:
[INPUT_FILES]... A list of input file paths (FASTA/FASTQ) to be processed by the classify program. Supports fasta or fastq format files (e.g., .fasta, .fastq) and gzip compressed files (e.g., .fasta.gz, .fastq.gz)
Options:
--db <DATABASE>
database hash chunk directory and other files
-P, --paired-end-processing
Enable paired-end processing
-Q, --minimum-quality-score <MINIMUM_QUALITY_SCORE>
Minimum quality score for FASTQ data [default: 0]
-T, --confidence-threshold <CONFIDENCE_THRESHOLD>
Confidence score threshold [default: 0]
-K, --report-kmer-data
In comb. w/ -R, provide minimizer information in report
-z, --report-zero-counts
In comb. w/ -R, report taxa w/ 0 count
-g, --minimum-hit-groups <MINIMUM_HIT_GROUPS>
The minimum number of hit groups needed for a call [default: 2]
-p, --num-threads <NUM_THREADS>
The number of threads to use [default: 10]
--output-dir <KRAKEN_OUTPUT_DIR>
File path for outputting normal Kraken output
-h, --help
Print help (see more with '--help')
-V, --version
Print version

- Chunk Processing Mode:
- Description: This mode processes the sample data in chunks, loading only a small portion of the database files at a time. This reduces the memory requirements, needing a minimum of 4GB of memory plus the size of one pair of sample files.
- Characteristics:
- Low memory consumption
- Slower performance compared to Direct Processing Mode
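For tight memory budgets, the chunk-mode buffers can be tuned with the flags documented in `kun_peng classify -h`; a sketch with illustrative values:

```bash
# Smaller result batches and I/O buffers can lower the memory ceiling
# (defaults are --batch-size 16 and --buffer-size 16777216).
mkdir -p temp_chunk test_out
kun_peng classify --db test_database --chunk-dir temp_chunk --output-dir test_out \
  --batch-size 8 --buffer-size 8388608 data/COVID_19.fa
```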
Command Help
./target/release/kun_peng classify -h
Integrates 'splitr', 'annotate', and 'resolve' into a unified workflow for sequence classification. classify a set of sequences
Usage: kun_peng classify [OPTIONS] --db <DATABASE> --chunk-dir <CHUNK_DIR> [INPUT_FILES]...
Arguments:
[INPUT_FILES]... A list of input file paths (FASTA/FASTQ) to be processed by the classify program. Supports fasta or fastq format files (e.g., .fasta, .fastq) and gzip compressed files (e.g., .fasta.gz, .fastq.gz).
Can also be a single .txt file containing a list of input file paths, one per line.
Options:
--db <DATABASE>
--chunk-dir <CHUNK_DIR>
chunk directory
--output-dir <KRAKEN_OUTPUT_DIR>
File path for outputting normal Kraken output
-P, --paired-end-processing
Enable paired-end processing
-Q, --minimum-quality-score <MINIMUM_QUALITY_SCORE>
Minimum quality score for FASTQ data [default: 0]
-p, --num-threads <NUM_THREADS>
The number of threads to use [default: 10]
--buffer-size <BUFFER_SIZE>
[default: 16777216]
--batch-size <BATCH_SIZE>
The size of each batch for processing taxid match results, used to control memory usage
[default: 16]
-T, --confidence-threshold <CONFIDENCE_THRESHOLD>
Confidence score threshold [default: 0]
-g, --minimum-hit-groups <MINIMUM_HIT_GROUPS>
The minimum number of hit groups needed for a call [default: 2]
--kraken-db-type
Enables use of a Kraken 2 compatible shared database
-K, --report-kmer-data
In comb. w/ -R, provide minimizer information in report
-z, --report-zero-counts
In comb. w/ -R, report taxa w/ 0 count
-h, --help
Print help (see more with '--help')
-V, --version
Print version

- Step-by-Step Processing Mode:
- Description: This mode breaks down the chunk processing mode into individual steps, providing greater flexibility in managing the entire classification process.
- Characteristics:
- Flexible processing steps
- Similar memory consumption to Chunk Processing Mode
- Performance varies based on execution steps
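The individual steps are the `splitr`, `annotate`, and `resolve` subcommands listed earlier; their per-step flags are not reproduced in this README, so consult each subcommand's own help before wiring them together manually:

```bash
# Inspect the options of each stage before running them separately.
kun_peng splitr -h
kun_peng annotate -h
kun_peng resolve -h
```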
- test_out/output_1.txt:
Standard Kraken Output Format:
- "C"/"U": a one letter code indicating that the sequence was either classified or unclassified.
- The sequence ID, obtained from the FASTA/FASTQ header.
- The taxonomy ID Kraken 2 used to label the sequence; this is 0 if the sequence is unclassified.
- The length of the sequence in bp. In the case of paired read data, this will be a string containing the lengths of the two sequences in bp, separated by a pipe character, e.g. "98|94".
- A space-delimited list indicating the LCA mapping of each k-mer in the sequence(s). For example, "562:13 561:4 A:31 0:1 562:3" would indicate that:
- the first 13 k-mers mapped to taxonomy ID #562
- the next 4 k-mers mapped to taxonomy ID #561
- the next 31 k-mers contained an ambiguous nucleotide
- the next k-mer was not in the database
- the last 3 k-mers mapped to taxonomy ID #562
Note that paired read data will contain a "|:|" token in this list to indicate the end of one read and the beginning of another.
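Because the columns follow the standard Kraken output format described above, quick summaries are easy to script; a small sketch (the file path comes from the demo run):

```bash
# Count classified ("C") versus unclassified ("U") reads.
awk '{counts[$1]++} END {for (code in counts) print code, counts[code]}' test_out/output_1.txt
```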
- test_out/output_1.kreport2:
100.00 1 0 R 1 root
100.00 1 0 D 10239 Viruses
100.00 1 0 D1 2559587 Riboviria
100.00 1 0 O 76804 Nidovirales
100.00 1 0 O1 2499399 Cornidovirineae
100.00 1 0 F 11118 Coronaviridae
100.00 1 0 F1 2501931 Orthocoronavirinae
100.00 1 0 G 694002 Betacoronavirus
100.00 1 0 G1 2509511 Sarbecovirus
100.00 1 0 S 694009 Severe acute respiratory syndrome-related coronavirus
100.00 1 1 S1 2697049 Severe acute respiratory syndrome coronavirus 2
Sample Report Output Formats:
- Percentage of fragments covered by the clade rooted at this taxon
- Number of fragments covered by the clade rooted at this taxon
- Number of fragments assigned directly to this taxon
- A rank code, indicating (U)nclassified, (R)oot, (D)omain, (K)ingdom, (P)hylum, (C)lass, (O)rder, (F)amily, (G)enus, or (S)pecies. Taxa that are not at any of these 10 ranks have a rank code that is formed by using the rank code of the closest ancestor rank with a number indicating the distance from that rank. E.g., "G2" is a rank code indicating a taxon is between genus and species and the grandparent taxon is at the genus rank.
- NCBI taxonomic ID number
- Indented scientific name
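Since the fourth column carries the rank code described above, per-rank slices of the report are one-liners; a sketch against the demo report (assumes the usual tab-delimited kreport layout):

```bash
# Print species-level rows: percentage, taxid, and scientific name.
awk -F '\t' '$4 == "S" {print $1, $5, $6}' test_out/output_1.kreport2
```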
@article{Chen2024KunPeng,
author = {Chen, Qiong and Zhang, Boliang and Peng, Chen and Huang, Jiajun and Shen, Xiaotao and Jiang, Chao},
title = {Kun-peng: an ultra-memory-efficient, fast, and accurate pan-domain taxonomic classifier for all},
journal = {bioRxiv},
year = {2024},
doi = {10.1101/2024.12.19.629356},
url = {https://www.biorxiv.org/content/10.1101/2024.12.19.629356v1},
note = {preprint}
}