nf-core/rnavar is a bioinformatics pipeline for RNA variant calling analysis following GATK4 best practices.
- Merge re-sequenced FastQ files (cat)
- Read QC (FastQC)
- (Optionally) Extract UMIs from FASTQ reads (UMI-tools)
- (Optionally) HLA typing from FASTQ reads (Seq2HLA)
- Align reads to reference genome (STAR)
- Sort and index alignments (SAMtools)
- Duplicate read marking (Picard MarkDuplicates)
- Scatter one interval-list into many interval-files (GATK4 IntervalListTools)
- Split reads that contain Ns in their CIGAR string (GATK4 SplitNCigarReads)
- Estimate and correct systematic bias using base quality score recalibration (GATK4 BaseRecalibrator, GATK4 ApplyBQSR)
- Convert a BED file to a Picard Interval List (GATK4 BedToIntervalList)
- Call SNPs and indels (GATK4 HaplotypeCaller)
- Merge multiple VCF files into one VCF (GATK4 MergeVCFs)
- Index the VCF (Tabix)
- Filter variant calls based on certain criteria (GATK4 VariantFiltration)
- Annotate variants (BCFtools Annotate, snpEff, Ensembl VEP)
- Present QC for raw read, alignment, gene biotype, sample similarity, and strand-specificity checks (MultiQC, R)

| Tool | Version |
|---|---|
| BCFtools | 1.21 |
| BEDtools | 2.31.1 |
| Ensembl VEP | 114.2 |
| FastQC | 0.12.1 |
| GATK | 4.6.1.0 |
| mosdepth | 0.3.10 |
| MultiQC | 1.29 |
| Picard | 3.3.0 |
| Samtools | 1.21 |
| Seq2HLA | 2.3 |
| SnpEff | 5.1 |
| STAR | 2.7.11b |
| Tabix | 1.20 |
| UMI-tools | 1.1.5 |
Note
If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.
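
As a rough preflight before the test run, the sketch below reports which executables are on your PATH. The helper name `check_tools` is hypothetical, and the list of tools is an assumption based on the profiles this pipeline accepts; adjust it to the engine you actually plan to use.

```shell
# Hypothetical helper: report which of the given executables are on PATH.
check_tools() {
  local tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool"
    fi
  done
}

# Nextflow itself plus a few engines matching the available profiles.
# Only one engine is needed; "missing" for the others is not an error.
check_tools nextflow docker singularity podman conda
```

Once Nextflow and one engine are found, a test run in the usual nf-core form would look like `nextflow run nf-core/rnavar -profile test,docker --outdir <OUTDIR>`, with `docker` swapped for your engine.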
First, prepare a samplesheet with your input data that looks as follows:
samplesheet.csv:
sample,fastq_1,fastq_2
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
Each row represents a FASTQ file (single-end) or a pair of FASTQ files (paired-end).
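
A minimal sketch of creating this samplesheet from the shell and sanity-checking it before a run. The helper `check_samplesheet` is hypothetical and not part of the pipeline; it only checks the header and that each row has a sample name and a first FASTQ file. It assumes a single-end row leaves the `fastq_2` column empty.

```shell
# Write a samplesheet with the header and example row shown above.
cat > samplesheet.csv <<'EOF'
sample,fastq_1,fastq_2
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
EOF

# Hypothetical helper: succeed only if the header matches, there is at least
# one data row, and every data row has a sample name and a fastq_1 entry.
# fastq_2 may be empty (assumed here to mean single-end).
check_samplesheet() {
  awk -F',' '
    NR == 1 { if ($0 != "sample,fastq_1,fastq_2") bad = 1; next }
    $1 == "" || $2 == "" { bad = 1 }
    END { if (bad || NR < 2) exit 1 }
  ' "$1"
}

check_samplesheet samplesheet.csv && echo "samplesheet looks OK"
```

This does not verify that the FASTQ files exist or are readable; the pipeline performs its own validation of the samplesheet.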
Now, you can run the pipeline using:
nextflow run nf-core/rnavar -profile <docker/singularity/podman/shifter/charliecloud/conda/institute> --input samplesheet.csv --outdir <OUTDIR> --genome GRCh38
Warning
Please provide pipeline parameters via the CLI or Nextflow -params-file option. Custom config files including those provided by the -c Nextflow option can be used to provide any configuration except for parameters; see docs.
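
As an illustration of the `-params-file` route, the sketch below writes the three parameters from the command above into a YAML file. The file name `params.yaml` and the output directory `results` are placeholders chosen for this example, not values prescribed by the pipeline.

```shell
# Write the parameters from the run command into a YAML params file.
# "params.yaml" and "results" are placeholder names for this example.
cat > params.yaml <<'EOF'
input: "samplesheet.csv"
outdir: "results"
genome: "GRCh38"
EOF

# Show what would be passed to the pipeline.
cat params.yaml
```

The file could then be supplied in place of the individual `--input`, `--outdir`, and `--genome` options, for example `nextflow run nf-core/rnavar -profile docker -params-file params.yaml`.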
For more details and further functionality, please refer to the usage documentation and the parameter documentation.
To see the results of an example test run with a full size dataset refer to the results tab on the nf-core website pipeline page. For more details about the output files and reports, please refer to the output documentation.
rnavar was originally written by Praveen Raj and Maxime U Garcia at The Swedish Childhood Tumor Biobank (Barntumörbanken), Karolinska Institutet. Nicolas Vannieuwkerke at CMGG later joined and helped with further development (1.1.0 and forward).
Maintenance is now led by Maxime U Garcia (previously at Seqera, now at NGI).
Main developers:
We thank the following people for their extensive assistance in the development of this pipeline:
If you would like to contribute to this pipeline, please see the contributing guidelines.
For further information or help, don't hesitate to get in touch on the Slack #rnavar channel (you can join with this invite).
If you use nf-core/rnavar for your analysis, please cite it using the following doi: 10.5281/zenodo.6669636
An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.
You can cite the nf-core publication as follows:
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.