Pull request overview
This PR introduces uBAM byte-range chunking via bamslice to parallelize trimming, and then merges per-chunk fastp JSON reports back into a single per-library fastp.json for downstream aggregation. It updates the workflow wiring and nf-test expectations to reflect chunk-named outputs.
Changes:
- Replace `fastp --split_by_lines` chunking with `bamslice`-based uBAM byte-range chunking (`params.bamslice_chunk_size`) and run `fastp` per slice.
- Add a new `mergeFastpJson` module to merge per-slice fastp JSON outputs into `${library}.fastp.json`.
- Update nf-test assertions and snapshots for new chunk-derived filenames and metrics.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `main.nf` | Builds uBAM byte-range chunks, runs fastp per chunk, merges fastp JSON, and feeds merged JSON into aggregation. |
| `modules/fastp.nf` | Switches stdin source from `samtools fastq` to `bamslice` with explicit start/end offsets; emits per-chunk JSON/FASTQs. |
| `modules/merge_fastp_json.nf` | New process to merge per-chunk fastp JSONs into a single per-library JSON for reporting/aggregation. |
| `nextflow.config` | Replaces `fastq_split_lines` with a `bamslice_chunk_size` default and a comment describing the new behavior. |
| `tests/main.nf.test` | Updates expected output filenames to match offset-based chunk naming and configures `bamslice_chunk_size` for tests. |
| `tests/main.nf.test.snap` | Updates stored snapshots to match the new task graph and output content/metrics. |
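
The merge step described for `modules/merge_fastp_json.nf` is not shown in this review, so the following is only a minimal Python sketch of how per-chunk fastp reports could be combined. It assumes the usual fastp report layout (`summary.before_filtering`, `summary.after_filtering`, `filtering_result`) and sums only the count fields, recomputing the Q20/Q30 rates from the summed totals rather than averaging them. The function names `merge_fastp_reports` and `merge_fastp_files` are hypothetical, and the PR's actual module may merge more fields or work differently.

```python
import json
from pathlib import Path

# Count fields assumed to be additive across chunks (assumption based on
# the fastp report layout; the PR's real merge logic is not shown).
COUNT_KEYS = ("total_reads", "total_bases", "q20_bases", "q30_bases")


def merge_fastp_reports(reports):
    """Sum per-chunk fastp counts into one per-library report dict."""
    merged = {"summary": {}, "filtering_result": {}}

    for stage in ("before_filtering", "after_filtering"):
        totals = {key: 0 for key in COUNT_KEYS}
        for report in reports:
            stats = report.get("summary", {}).get(stage, {})
            for key in COUNT_KEYS:
                totals[key] += stats.get(key, 0)
        # Rates are ratios, so recompute them from the summed counts
        # instead of averaging the per-chunk rates.
        bases = totals["total_bases"]
        totals["q20_rate"] = totals["q20_bases"] / bases if bases else 0.0
        totals["q30_rate"] = totals["q30_bases"] / bases if bases else 0.0
        merged["summary"][stage] = totals

    # filtering_result holds plain read counts, so add them key by key.
    for report in reports:
        for key, value in report.get("filtering_result", {}).items():
            merged["filtering_result"][key] = (
                merged["filtering_result"].get(key, 0) + value
            )
    return merged


def merge_fastp_files(paths, out_path):
    """Hypothetical file-level wrapper: read chunk JSONs, write the merged one."""
    reports = [json.loads(Path(p).read_text()) for p in paths]
    Path(out_path).write_text(json.dumps(merge_fastp_reports(reports), indent=2))
```

Recomputing the rates from summed bases keeps the merged values correct when chunks contain different numbers of reads, which byte-range slicing does not prevent.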
Comment on lines +87 to 97

    // Split each uBAM into byte-range chunks so trimming runs in parallel per chunk.
    chunk_size = params.bamslice_chunk_size as long
    bam_chunks = passed_bams.flatMap { library, bam ->
        def file_size = bam.size()
        def chunks = []
        for (long start = 0; start < file_size; start += chunk_size) {
            long end = Math.min(start + chunk_size, file_size)
            chunks << tuple(library, start, end)
        }
        chunks
    }
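
The offset arithmetic in the loop above can be checked in isolation. The sketch below mirrors it in Python, as an illustration only and not as part of the pipeline. `byte_range_chunks` and `chunk_name` are hypothetical helpers; the `library_start_end` naming is inferred from the `emseq-test1_0_250000` filename in the updated tests, and how `bamslice` interprets the end offset is not stated in this review.

```python
def byte_range_chunks(file_size, chunk_size):
    """Return (start, end) byte offsets covering a file of file_size bytes.

    Mirrors the Groovy loop: start advances by chunk_size and the last
    chunk's end is clamped to file_size.
    """
    if chunk_size <= 0:
        # A non-positive step would never advance, so reject it up front.
        raise ValueError("chunk_size must be positive")
    return [
        (start, min(start + chunk_size, file_size))
        for start in range(0, file_size, chunk_size)
    ]


def chunk_name(library, start, end):
    # Hypothetical naming helper, inferred from the test filenames.
    return f"{library}_{start}_{end}"


if __name__ == "__main__":
    # A 600,000-byte file with 250,000-byte chunks yields three ranges,
    # the last one shorter than the others.
    for start, end in byte_range_chunks(600_000, 250_000):
        print(chunk_name("emseq-test1", start, end))
```

An empty file produces no chunks at all, and a `chunk_size` of zero or less would make the original `for` loop spin forever, so a guard like the one above may be worth adding next to `params.bamslice_chunk_size`.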
Comment on lines 20 to +33

    @@ -30,7 +30,7 @@ nextflow_pipeline {
     def alignment_metrics = path("${launchDir}/test_output/stats/picard_alignment_metrics/emseq-test1.alignment_summary_metrics.txt").text.tokenize('\n')[5..8]
     def methyldackel_extract = path("${launchDir}/test_output/methylDackelExtracts/emseq-test1_CpG.methylKit.gz").md5
     def mbias = path("${launchDir}/test_output/methylDackelExtracts/mbias/emseq-test1.combined_mbias.tsv").md5
    -def nonconverted = path("${launchDir}/test_output/bwameth_align/0001.emseq-test1.nonconverted_counts.tsv").text.tokenize('\n')
    +def nonconverted = path("${launchDir}/test_output/bwameth_align/emseq-test1_0_250000.nonconverted_counts.tsv").text.tokenize('\n')
Comment on lines 87 to +99

    @@ -96,7 +96,7 @@ nextflow_pipeline {
     def alignment_metrics = path("${launchDir}/test_output/stats/picard_alignment_metrics/emseq-test1.alignment_summary_metrics.txt").text.tokenize('\n')[5..8]
     def methyldackel_extract = path("${launchDir}/test_output/methylDackelExtracts/emseq-test1_CpG.methylKit.gz").md5
     def mbias = path("${launchDir}/test_output/methylDackelExtracts/mbias/emseq-test1.combined_mbias.tsv").md5
    -def nonconverted = path("${launchDir}/test_output/bwameth_align/0001.emseq-test1.nonconverted_counts.tsv").text.tokenize('\n')
    +def nonconverted = path("${launchDir}/test_output/bwameth_align/emseq-test1_0_250000.nonconverted_counts.tsv").text.tokenize('\n')
lnblum approved these changes Jul 2, 2026
lnblum (Contributor) left a comment:
Looks good to me. I wonder whether it's worth giving fastp more threads. If I recall correctly, a single thread was required to avoid a bug with `--fastq_split_lines`, which is no longer used.