Tutorial
Here we provide an example run using data from SRA and OICR.
FASTQ files are provided by various international institutes and are publicly available in the Sequence Read Archive (SRA). Three randomly chosen samples sequenced on the Illumina platform were downloaded from SRA to demonstrate the output generated by ncov-tools.
The following samples were downloaded from SRA:
VA-DCLS-1905
SRX9446939: Amplicon-based sequencing of SARS-CoV-2: VA-DCLS-1905
1 ILLUMINA (Illumina MiSeq) run: 190,731 spots, 18.6M bases, 7.8Mb downloads
ARTIC PCR-tiling of viral cDNA (V3), sequenced by Illumina MiSeq with DNA Flex library prep-kit. Only reads aligned to SARS-CoV-2 reference (NC_045512.2) retained.
VA-DCLS-1863
SRX9446956: Amplicon-based sequencing of SARS-CoV-2: VA-DCLS-1863
1 ILLUMINA (Illumina MiSeq) run: 244,177 spots, 23.9M bases, 10.1Mb downloads
ARTIC PCR-tiling of viral cDNA (V3), sequenced by Illumina MiSeq with DNA Flex library prep-kit. Only reads aligned to SARS-CoV-2 reference (NC_045512.2) retained.
VA-DCLS-1856
SRX9446952: Amplicon-based sequencing of SARS-CoV-2: VA-DCLS-1856
1 ILLUMINA (Illumina MiSeq) run: 229,677 spots, 22.4M bases, 9.5Mb downloads
ARTIC PCR-tiling of viral cDNA (V3), sequenced by Illumina MiSeq with DNA Flex library prep-kit. Only reads aligned to SARS-CoV-2 reference (NC_045512.2) retained.
Note that OICR has provided negative control FASTQ files.
The Connor Lab has built a Nextflow pipeline, focused on COVID-19, that runs alignment and variant-calling tools and generates output for downstream analysis. Review the documentation for Nextflow and the ncov2019-artic-nf pipeline for instructions on installing and running the pipeline.
Nextflow v20.10.0 build 5430
https://github.com/connor-lab/ncov2019-artic-nf
- Create the following directory structure:
run_name
├── data
└── qc
    └── data
- Transfer the FASTQ files from the SRA samples into the run_name/data directory.
- Clone the negative controls repository and copy the FASTQ files into the run_name/data directory.
- Clone the Connor Lab Nextflow pipeline repository and the ncov primer schemes into run_name:
cd run_name
git clone git@github.com:connor-lab/ncov2019-artic-nf.git
git clone git@github.com:artic-network/artic-ncov2019.git
- Run the Nextflow pipeline inside the run_name directory:
nextflow run ncov2019-artic-nf/main.nf --schemeVersion V3 --directory data --illumina --prefix run_name
- Link all .bam, .consensus.fa, and .variants.tsv files into the qc/data directory:
cd qc/data
ln -s ../../results/ncovIllumina_sequenceAnalysis_trimPrimerSequences/<sample>.mapped.bam
ln -s ../../results/ncovIllumina_sequenceAnalysis_trimPrimerSequences/<sample>.mapped.primertrimmed.sorted.bam
ln -s ../../results/ncovIllumina_sequenceAnalysis_readMapping/<sample>.sorted.bam
ln -s ../../results/ncovIllumina_sequenceAnalysis_makeConsensus/<sample>.primertrimmed.consensus.fa
ln -s ../../results/ncovIllumina_sequenceAnalysis_callVariants/<sample>.variants.tsv
- If you haven’t installed the ncov-tools package, follow the installation documentation.
- Run the ncov-tools pipeline:
snakemake -s /path/to/Snakefile --cores <number of cores> all_qc_sequencing
snakemake -s /path/to/Snakefile --cores <number of cores> all_qc_analysis
snakemake -s /path/to/Snakefile --cores <number of cores> all_qc_reports
- Review the plots (in plots/) and the generated reports (in qc_reports/).
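The directory setup and linking steps above can be scripted. The following is a minimal sketch, assuming a POSIX shell and the result directory names shown in the linking step; the function names `make_layout` and `link_results` are illustrative and are not part of ncov-tools or the Nextflow pipeline.

```shell
# Sketch only: automates the directory-layout and symlink steps listed above.
# Function names are hypothetical helpers, not part of any published tool.
set -eu

# Create <run_dir>/data and <run_dir>/qc/data.
make_layout() {
    mkdir -p "$1/data" "$1/qc/data"
}

# Link the pipeline outputs for every sample into <run_dir>/qc/data.
# The body runs in a subshell so the caller's working directory is unchanged.
link_results() (
    cd "$1/qc/data"
    for f in \
        ../../results/ncovIllumina_sequenceAnalysis_trimPrimerSequences/*.mapped.bam \
        ../../results/ncovIllumina_sequenceAnalysis_trimPrimerSequences/*.mapped.primertrimmed.sorted.bam \
        ../../results/ncovIllumina_sequenceAnalysis_readMapping/*.sorted.bam \
        ../../results/ncovIllumina_sequenceAnalysis_makeConsensus/*.primertrimmed.consensus.fa \
        ../../results/ncovIllumina_sequenceAnalysis_callVariants/*.variants.tsv
    do
        # An unmatched glob stays literal; skip it instead of creating a dangling link.
        [ -e "$f" ] || continue
        ln -sf "$f" .
    done
)
```

Call `make_layout run_name` before transferring the FASTQ files and `link_results run_name` once the Nextflow run has finished. The globs link every sample at once, rather than repeating the `ln -s` commands for each `<sample>`.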
The _summary_qc.tsv table shows metadata summarizing each sample and a final classification in the qc_pass column, which can be used to determine whether the sample passes or fails. In this instance, the negative control fails due to a lack of viral template (classified as INCOMPLETE_GENOME) while all other samples pass all criteria.
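To scan the classifications without opening the whole table, the qc_pass column can be pulled out on the command line. This is a minimal sketch, assuming the file is tab-separated with a header row and that the first column identifies the sample; `show_qc_pass` is a hypothetical helper name, not an ncov-tools command.

```shell
# Sketch only: print the first column (assumed to be the sample name) next to qc_pass.
# The qc_pass column is located by its header name rather than a fixed position.
show_qc_pass() {
    awk -F'\t' '
        NR == 1 {
            for (i = 1; i <= NF; i++) if ($i == "qc_pass") col = i
            if (!col) { print "no qc_pass column found" > "/dev/stderr"; exit 1 }
            next
        }
        { print $1 "\t" $col }
    ' "$1"
}
```

Pass the path to the `_summary_qc.tsv` file in qc_reports/ as the only argument. Looking the column up by name keeps the sketch working if the column order differs between versions.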
The _negative_control_report.tsv shows the Neg1 sample has passed. There were 0 amplicons detected in the control and only 6 bases covered from the alignment.
The _mixture_report.tsv shows no evidence of cross-contamination between samples.
The _ambiguous_position_report.tsv does not identify any ambiguous bases shared by 2 or more samples.
The _amplicon_coverage_heatmap.pdf plot shows Neg1 having 0 coverage across amplicons, while the remaining 3 samples show significant coverage across all amplicons. Note that amplicon 64 is a commonly identified low coverage amplicon.
The _amplicon_covered_fraction.pdf plot shows sample Neg1 having a low covered fraction across all amplicons. All other samples show 100% coverage across all amplicons.
The _depth_by_position.pdf plot shows sample Neg1 with coverage at scattered positions across the genome, with the majority of positions having 0 coverage. All other samples show consistent coverage across all genomic positions.
The _tree_snps.pdf plot includes all 3 positive samples and the reference genome, labelled MN908947.3. Note the absence of Neg1 from the plot, which only includes samples with at least 75% genome completeness. From here we can identify mutations shared between samples and their phylogenetic profile.