Skip to content

First pass at adding bamslice#62

Open
mattsoup wants to merge 5 commits into
masterfrom
bamslice
Open

First pass at adding bamslice#62
mattsoup wants to merge 5 commits into
masterfrom
bamslice

Conversation

@mattsoup

Copy link
Copy Markdown
Contributor

No description provided.

Copilot AI review requested due to automatic review settings June 30, 2026 17:53

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces uBAM byte-range chunking via bamslice to parallelize trimming, and then merges per-chunk fastp JSON reports back into a single per-library fastp.json for downstream aggregation. It updates the workflow wiring and nf-test expectations to reflect chunk-named outputs.

Changes:

  • Replace fastp --split_by_lines chunking with bamslice-based uBAM byte-range chunking (params.bamslice_chunk_size) and run fastp per slice.
  • Add a new mergeFastpJson module to merge per-slice fastp JSON outputs into ${library}.fastp.json.
  • Update nf-test assertions and snapshots for new chunk-derived filenames and metrics.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
main.nf Builds uBAM byte-range chunks, runs fastp per chunk, merges fastp JSON, and feeds merged JSON into aggregation.
modules/fastp.nf Switches stdin source from samtools fastq to bamslice with explicit start/end offsets; emits per-chunk JSON/FASTQs.
modules/merge_fastp_json.nf New process to merge per-chunk fastp JSONs into a single per-library JSON for reporting/aggregation.
nextflow.config Replaces fastq_split_lines with bamslice_chunk_size default and comment describing the new behavior.
tests/main.nf.test Updates expected output filenames to match offset-based chunk naming and configures bamslice_chunk_size for tests.
tests/main.nf.test.snap Updates stored snapshots to match new task graph and output content/metrics.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread main.nf
Comment on lines +87 to 97
// Split each uBAM into byte-range chunks so trimming runs in parallel per chunk.
chunk_size = params.bamslice_chunk_size as long
bam_chunks = passed_bams.flatMap { library, bam ->
def file_size = bam.size()
def chunks = []
for (long start = 0; start < file_size; start += chunk_size) {
long end = Math.min(start + chunk_size, file_size)
chunks << tuple(library, start, end)
}
chunks
}
Comment thread tests/main.nf.test
Comment on lines 20 to +33
@@ -30,7 +30,7 @@ nextflow_pipeline {
def alignment_metrics = path("${launchDir}/test_output/stats/picard_alignment_metrics/emseq-test1.alignment_summary_metrics.txt").text.tokenize('\n')[5..8]
def methyldackel_extract = path("${launchDir}/test_output/methylDackelExtracts/emseq-test1_CpG.methylKit.gz").md5
def mbias = path("${launchDir}/test_output/methylDackelExtracts/mbias/emseq-test1.combined_mbias.tsv").md5
def nonconverted = path("${launchDir}/test_output/bwameth_align/0001.emseq-test1.nonconverted_counts.tsv").text.tokenize('\n')
def nonconverted = path("${launchDir}/test_output/bwameth_align/emseq-test1_0_250000.nonconverted_counts.tsv").text.tokenize('\n')
Comment thread tests/main.nf.test
Comment on lines 87 to +99
@@ -96,7 +96,7 @@ nextflow_pipeline {
def alignment_metrics = path("${launchDir}/test_output/stats/picard_alignment_metrics/emseq-test1.alignment_summary_metrics.txt").text.tokenize('\n')[5..8]
def methyldackel_extract = path("${launchDir}/test_output/methylDackelExtracts/emseq-test1_CpG.methylKit.gz").md5
def mbias = path("${launchDir}/test_output/methylDackelExtracts/mbias/emseq-test1.combined_mbias.tsv").md5
def nonconverted = path("${launchDir}/test_output/bwameth_align/0001.emseq-test1.nonconverted_counts.tsv").text.tokenize('\n')
def nonconverted = path("${launchDir}/test_output/bwameth_align/emseq-test1_0_250000.nonconverted_counts.tsv").text.tokenize('\n')
@mattsoup mattsoup requested review from bwlang and lnblum July 2, 2026 17:06

@lnblum lnblum left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me. I wonder if it's worth giving fastp more threads? If I recall, 1 thread was required to avoid a bug for the --fastq_split_lines, which is not going to be used.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants