Tutorial

Richard de Borja edited this page Nov 10, 2020 · 3 revisions

Here we provide an example run using data from SRA and OICR.

Datasets

Example dataset from SRA

FASTQ files submitted by institutes around the world are publicly available in the Sequence Read Archive (SRA). Three randomly chosen samples sequenced on the Illumina platform were downloaded from SRA to demonstrate the output generated by ncov-tools.

The following samples were downloaded from SRA:

VA-DCLS-1905
SRX9446939: Amplicon-based sequencing of SARS-CoV-2: VA-DCLS-1905
1 ILLUMINA (Illumina MiSeq) run: 190,731 spots, 18.6M bases, 7.8Mb downloads

ARTIC PCR-tiling of viral cDNA (V3), sequenced by Illumina MiSeq with DNA Flex library prep-kit. Only reads aligned to SARS-CoV-2 reference (NC_045512.2) retained.



VA-DCLS-1863
SRX9446956: Amplicon-based sequencing of SARS-CoV-2: VA-DCLS-1863
1 ILLUMINA (Illumina MiSeq) run: 244,177 spots, 23.9M bases, 10.1Mb downloads

ARTIC PCR-tiling of viral cDNA (V3), sequenced by Illumina MiSeq with DNA Flex library prep-kit. Only reads aligned to SARS-CoV-2 reference (NC_045512.2) retained.


VA-DCLS-1856
SRX9446952: Amplicon-based sequencing of SARS-CoV-2: VA-DCLS-1856
1 ILLUMINA (Illumina MiSeq) run: 229,677 spots, 22.4M bases, 9.5Mb downloads

ARTIC PCR-tiling of viral cDNA (V3), sequenced by Illumina MiSeq with DNA Flex library prep-kit. Only reads aligned to SARS-CoV-2 reference (NC_045512.2) retained.

Note that OICR has provided negative control FASTQ files.

nCoV Nextflow Pipeline

The Connor Lab has built a Nextflow pipeline focused on COVID-19 that runs alignment and variant-calling tools and generates output for downstream analysis. Review the documentation for Nextflow and the ncov2019-artic-nf pipeline for instructions on installing and running the pipeline.

Nextflow v20.10.0 build 5430

https://github.com/connor-lab/ncov2019-artic-nf

  1. Create the following directory structure:
     run_name
     ├── data
     └── qc
         └── data
  2. Transfer the FASTQ files from the SRA samples into the run_name/data directory.
  3. Clone the negative controls repository and copy the FASTQ files into the run_name/data directory.
  4. Clone the Connor Lab Nextflow pipeline repository and the ncov primer schemes into run_name:
     cd run_name
     git clone git@github.com:connor-lab/ncov2019-artic-nf.git
     git clone git@github.com:artic-network/artic-ncov2019.git
  5. Run the Nextflow pipeline inside the run_name directory:
     nextflow run ncov2019-artic-nf/main.nf --schemeVersion V3 --directory data --illumina --prefix run_name
  6. For each sample, link the .bam, .consensus.fa, and .variants.tsv files into the qc/data directory:
     cd qc/data
     ln -s ../../results/ncovIllumina_sequenceAnalysis_trimPrimerSequences/<sample>.mapped.bam
     ln -s ../../results/ncovIllumina_sequenceAnalysis_trimPrimerSequences/<sample>.mapped.primertrimmed.sorted.bam
     ln -s ../../results/ncovIllumina_sequenceAnalysis_readMapping/<sample>.sorted.bam
     ln -s ../../results/ncovIllumina_sequenceAnalysis_makeConsensus/<sample>.primertrimmed.consensus.fa
     ln -s ../../results/ncovIllumina_sequenceAnalysis_callVariants/<sample>.variants.tsv
  7. If you haven’t installed the ncov-tools package, follow the installation documentation.
  8. Run the ncov-tools pipeline:
     snakemake -s /path/to/Snakefile --cores <number of cores> all_qc_sequencing
     snakemake -s /path/to/Snakefile --cores <number of cores> all_qc_analysis
     snakemake -s /path/to/Snakefile --cores <number of cores> all_qc_reports
  9. Review the plots (in plots/) and the generated reports (in qc_reports/).
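
The directory setup and the per-sample linking in the steps above can be scripted rather than typed once per sample. The following is a minimal sketch, not part of ncov-tools or the Nextflow pipeline: the function names make_run_dirs and link_results are hypothetical, and it assumes the results directory layout shown in the link step, with sample names taken from the .sorted.bam files in the readMapping directory.

```shell
# Hypothetical helpers; not part of ncov-tools or ncov2019-artic-nf.

# Create <run_dir>/data and <run_dir>/qc/data.
make_run_dirs() {
    mkdir -p "$1/data" "$1/qc/data"
}

# Symlink the per-sample pipeline outputs into the QC data directory.
# Usage: link_results <results_dir> <qc_data_dir>
# Assumes the results layout used in the link step above. The body runs
# in a subshell, so its variables do not leak into the caller.
link_results() (
    results_dir="$(cd "$1" && pwd)" || exit 1   # absolute path, so links resolve from anywhere
    qc_data_dir="$2"
    prefix="ncovIllumina_sequenceAnalysis"
    for bam in "$results_dir/${prefix}_readMapping"/*.sorted.bam; do
        [ -e "$bam" ] || continue               # the glob matched nothing
        sample="$(basename "$bam" .sorted.bam)"
        for f in \
            "${prefix}_trimPrimerSequences/$sample.mapped.bam" \
            "${prefix}_trimPrimerSequences/$sample.mapped.primertrimmed.sorted.bam" \
            "${prefix}_readMapping/$sample.sorted.bam" \
            "${prefix}_makeConsensus/$sample.primertrimmed.consensus.fa" \
            "${prefix}_callVariants/$sample.variants.tsv"; do
            if [ -e "$results_dir/$f" ]; then
                ln -sf "$results_dir/$f" "$qc_data_dir/"
            else
                echo "warning: missing $results_dir/$f" >&2
            fi
        done
    done
)
```

From inside run_name, this would be called as link_results results qc/data once the Nextflow run has finished. Unlike the manual commands, the sketch creates absolute links, so they break if the run directory is later moved.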

Interpretation of results

_summary_qc.tsv

The _summary_qc.tsv table shows metadata summarizing each sample and a final classification in the qc_pass column which can be used to determine whether the sample passes or fails. In this instance, the negative control fails due to a lack of viral template (classified as INCOMPLETE_GENOME) while all other samples pass all criteria.
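
To pull the classification out of the table on the command line, a sketch along the following lines may help. It looks columns up by header name instead of position. The qc_pass column name comes from the report itself; the sample column name, the function name list_qc_pass, and the file path in the usage note are assumptions to check against your own output.

```shell
# Hypothetical helper: print "<sample><TAB><qc_pass>" for every data row
# of a tab-separated summary table. "qc_pass" is the column described
# above; "sample" is an assumed column name.
list_qc_pass() {
    awk -F '\t' '
        NR == 1 {
            # Map each header name to its column index.
            for (i = 1; i <= NF; i++) col[$i] = i
            if (!("sample" in col) || !("qc_pass" in col)) {
                print "missing sample or qc_pass column" > "/dev/stderr"
                exit 1
            }
            next
        }
        { print $(col["sample"]) "\t" $(col["qc_pass"]) }
    ' "$1"
}
```

A call such as list_qc_pass qc_reports/run_name_summary_qc.tsv would then list one sample per line next to its classification, which makes a failing negative control easy to spot.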

_negative_control_report.tsv

The _negative_control_report.tsv shows the Neg1 sample has passed. There were 0 amplicons detected in the control and only 6 bases covered from the alignment.

_mixture_report.tsv

The _mixture_report.tsv shows no evidence of cross-contamination between samples.

_ambiguous_position_report.tsv

The _ambiguous_position_report.tsv does not identify any ambiguous positions shared by two or more samples.

_amplicon_coverage_heatmap.pdf

The _amplicon_coverage_heatmap.pdf plot shows Neg1 having 0 coverage across amplicons, while the remaining 3 samples show significant coverage across all amplicons. Note that amplicon 64 is commonly identified as a low-coverage amplicon.

_amplicon_covered_fraction.pdf

The _amplicon_covered_fraction.pdf plot shows sample Neg1 having a low covered fraction for every amplicon. All other samples show 100% coverage across all amplicons.

_depth_by_position.pdf

The _depth_by_position.pdf plot shows sample Neg1 with coverage at scattered positions across the genome and 0 coverage at most positions. All other samples show consistent coverage across all genomic positions.

_tree_snps.pdf

The _tree_snps.pdf plot includes all 3 positive samples and the reference genome, labelled MN908947.3. Note the absence of Neg1, because the plot only includes samples with at least 75% genome completeness. From this plot we can identify mutations shared between samples and their phylogenetic relationships.
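
When deciding which samples to expect in the tree, the completeness of a consensus sequence can be approximated by hand. The sketch below assumes completeness means the fraction of consensus bases that are not N, which may differ from the exact definition ncov-tools uses; the function name genome_completeness is hypothetical.

```shell
# Hypothetical helper: fraction of non-N bases in a consensus FASTA.
# Assumption: completeness = non-N bases / total bases. IUPAC ambiguity
# codes other than N count as called bases here.
genome_completeness() {
    awk '
        /^>/ { next }                   # skip FASTA header lines
        {
            seq = toupper($0)
            gsub(/[^A-Z]/, "", seq)     # drop whitespace, carriage returns, gaps
            total += length(seq)
            n += gsub(/N/, "", seq)     # gsub returns how many Ns it removed
        }
        END {
            if (total == 0) { print "0.0000"; exit }
            printf "%.4f\n", (total - n) / total
        }
    ' "$1"
}
```

For example, genome_completeness qc/data/<sample>.primertrimmed.consensus.fa prints a value between 0 and 1; under the stated assumption, a sample below 0.75 would not be expected in the tree.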