SCARLET provides a NEXTFLOW version of the R.O.B.I.N. 'live' tumour classification tool.
To generate sequence data suitable for SCARLET analysis, we recommend either running the R.O.B.I.N. 'live' tool, or using Readfish with the file of targets at bin/NPHD_panel_hg38_clean.bed. Either option will produce a data set suitable for analysis.
You need to provide:
- a sorted BAM file with methylation probabilities that has been aligned to GRCh38.
- the associated .bai index.
- the GRCh38 genome reference sequence and annotation set (GTF).
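The input requirements above can be sanity-checked before a run. The sketch below is a minimal example, not part of SCARLET: the two helper functions are hypothetical, and it assumes `samtools` is installed (it is not in the dependency list) and that methylation calls are stored in the standard `MM`/`ML` tags (or the older `Mm`/`Ml` spelling).

```shell
# Hypothetical helper: read a SAM header on stdin and succeed
# if the @HD line declares coordinate sorting.
is_coordinate_sorted() {
    grep -q '^@HD.*SO:coordinate'
}

# Hypothetical helper: read SAM records on stdin and succeed
# if any record carries a modified-base tag (MM or the older Mm).
has_methylation_tags() {
    awk -F '\t' '
        /^@/ { next }
        { for (i = 12; i <= NF; i++) if ($i ~ /^(MM|Mm):Z:/) { found = 1; exit } }
        END { exit !found }
    '
}

# Example usage (requires samtools; my_data.bam is a placeholder):
#   samtools view -H my_data.bam | is_coordinate_sorted || echo "BAM is not coordinate-sorted"
#   samtools view my_data.bam | head -n 1000 | has_methylation_tags || echo "no MM tags in first 1000 reads"
```

Only the first 1000 reads are inspected in the usage example, so a negative result there is a hint rather than proof that the file lacks methylation calls.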
git
docker
nextflow
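A quick way to confirm that the three dependencies listed above are on your `PATH` is sketched below. The `missing_tools` function is a hypothetical helper written for this example, not something shipped with SCARLET.

```shell
# Hypothetical helper: print the name of every tool that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing=$(missing_tools git docker nextflow)
if [ -n "$missing" ]; then
    echo "Please install the following before continuing:" >&2
    echo "$missing" >&2
fi
```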
git clone https://github.com/graemefox/SCARLET.git
wget https://gitlab.com/euskirchen-lab/crossNN/-/raw/master/models/Capper_et_al_NN.pkl?inline=false -O SCARLET/bin/Capper_et_al_NN.pkl
docker pull graefox/scarlet:latest
nextflow pull epi2me-labs/wf-human-variation
## define sample name, ID and output directory, input BAM and reference genome:
SAMPLE=sample_01
OUTDIR=${SAMPLE}_output
BAM=my_data.bam
REFERENCE=my_reference.fa.gz
ANNOTATIONS=my_annotation_set.gtf
## run the pipeline
nextflow run SCARLET/main.nf \
-with-docker graefox/scarlet:latest \
--sample $SAMPLE \
--bam $BAM \
--outdir $OUTDIR \
--reference $REFERENCE \
--annotations $ANNOTATIONS \
--nanoplot \
--sturgeon --rapidcns2 --nanodx
These options have default values specified in the nextflow.config file, but you may override them on the CLI.
--threads 16 (CPUs to use [default: 64])
--bam_min_coverage (minimum coverage required to run the epi2me-labs/wf-human-variation stages [default: 5])
--minimum_mgmt_cov (minimum average coverage at the MGMT promoter. Coverage must be greater than this to run the analysis of MGMT methylation)
--rapidcns2 (the workflow will run the rapidCNS2 (https://github.com/areebapatel/Rapid-CNS2) classifier if the --rapidcns2 flag is passed [default behaviour is to NOT run rapidCNS2])
--sturgeon (the workflow will run the sturgeon (https://github.com/marcpaga/sturgeon) classifier if the --sturgeon flag is passed [default behaviour is to NOT run sturgeon])
--nanodx (the workflow will run the nanoDx (https://gitlab.com/pesk/nanoDx) classifier if the --nanodx flag is passed [default behaviour is to NOT run nanoDx])
--nanoplot (the workflow will ALSO run NanoPlot to generate a QC report [default behaviour is to NOT run NanoPlot])
Add -process.executor='slurm' to your nextflow command, then run as normal. You do not need to submit a script with sbatch; just run the nextflow command as normal and Nextflow will submit each process to SLURM.
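As an illustration, the example run command with the SLURM executor option added might look like the sketch below. The `run_scarlet_slurm` wrapper and the `NEXTFLOW` variable are hypothetical conveniences for this example (setting `NEXTFLOW=echo` prints the arguments instead of launching anything); the sample name and file paths are the placeholders from the example run.

```shell
# Placeholder inputs, as in the example run; override via the environment.
SAMPLE=${SAMPLE:-sample_01}
BAM=${BAM:-my_data.bam}
REFERENCE=${REFERENCE:-my_reference.fa.gz}
ANNOTATIONS=${ANNOTATIONS:-my_annotation_set.gtf}

# Hypothetical wrapper: the usual command plus -process.executor='slurm'.
run_scarlet_slurm() {
    "${NEXTFLOW:-nextflow}" run SCARLET/main.nf \
        -process.executor='slurm' \
        -with-docker graefox/scarlet:latest \
        --sample "$SAMPLE" \
        --bam "$BAM" \
        --outdir "${SAMPLE}_output" \
        --reference "$REFERENCE" \
        --annotations "$ANNOTATIONS"
}

# Launch from a login node (no sbatch script needed):
#   run_scarlet_slurm
```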
If the run seems to hang forever at the cnvpytor step, you may not have indexed your input BAM. Note that this step also takes a long time even when everything is working.
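One way to rule out a missing index before launching is sketched below. The `has_bam_index` function is a hypothetical helper; it assumes the index sits next to the BAM as either `my_data.bam.bai` or `my_data.bai`, and that `samtools` is available if you need to create one.

```shell
# Hypothetical helper: succeed if a non-empty .bai index sits next to the BAM.
has_bam_index() {
    bam="$1"
    [ -s "${bam}.bai" ] || [ -s "${bam%.bam}.bai" ]
}

BAM=${BAM:-my_data.bam}
if ! has_bam_index "$BAM"; then
    echo "No index found for $BAM; create one with: samtools index $BAM" >&2
fi
```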
If you get the Docker error "docker: permission denied while trying to connect to the docker daemon socket" on Ubuntu-based systems, you need to add your user to the docker group. Follow the instructions here: (https://www.digitalocean.com/community/questions/how-to-fix-docker-got-permission-denied-while-trying-to-connect-to-the-docker-daemon-socket)
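A minimal sketch for checking group membership is shown below. The `user_in_group` function is a hypothetical helper for this example; the suggested `usermod` command is the usual approach on Ubuntu, but check it against your own system's policy before running it.

```shell
# Hypothetical helper: succeed if user $1 is a member of group $2.
user_in_group() {
    id -nG "$1" | tr ' ' '\n' | grep -qxF "$2"
}

me=$(id -un)
if ! user_in_group "$me" docker; then
    echo "User $me is not in the docker group. On Ubuntu, try:" >&2
    echo "  sudo usermod -aG docker $me" >&2
    echo "then log out and back in for the change to take effect." >&2
fi
```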
This workflow uses many third-party tools to function and relies on the hard work and expertise of their respective authors. This list includes (but may not be limited to...):
SCARLET is distributed under a CC BY-NC 4.0 license. See LICENSE for more information. This license does not override any licenses that may be present in the third-party tools used by SCARLET.
