bugseq-pipeline

BugSeq automatically analyzes clinical microbiology nanopore sequencing data from start to finish. This includes taxonomic classification of reads, antimicrobial resistance prediction and detailed subtyping for public health purposes. It was created during hackseq 2019.

Rationale

Modern clinical microbiology techniques take a day to grow and identify an organism, and another day to determine antimicrobial susceptibilities. Yet, clinical trials show that patients with septic shock have a ~7% increase in mortality for every hour delay in appropriate antimicrobial therapy. Furthermore, patients with rare or novel infections may never have the etiology of their illness diagnosed as traditional techniques can not pick up their identify their infection. Metagenomic nanopore sequencing has the potential to drastically speed up the diagnosis and characterization of infections, potentially including novel pathogens, enabling better patient outcomes. Recovering pathogen genomes provides a vast amount of clinically useful information, such as whether a patient's E. coli is susceptible to ceftriaxone, whether a patient's V. cholerae is toxigenic, or whether the M. tuberculosis between two patients are likely to be epidemiologically linked.

Quick start

git clone https://github.com/schorlton/bugseq-pipeline.git
cd bugseq-pipeline
nextflow main.nf --fastq in.fq --outdir output_dir

Requirements

Nextflow
Conda
A powerful computer (eg. 128GB RAM or more)

Input

A nanopore basecalled fastq. Can be any library version (R7-10). Can be barcoded or not. Can be amplicon data (eg. 16S/ITS), isolate data (eg. a colony of Staphylococcus aureus), or clinical metagenomic data (eg. the sputum of a patient).

Output

An interactive html file and a static pdf summary file. These files will show a taxonomic classification of the percentage makeup of organisms in each patient sample. This could include viral, bacterial, fungal and protozoal organisms. For each of these organisms, the presence of antimicrobial resistance genes will be visualized, along with the predicted phenotype for the antimicrobial drug associated with these genes.

Example here

Options

usage: nextflow main.nf --fastq 'in.fq' --outdir 'out_dir' [options...]
options:
  # Input options
  --fastq
  --control_fastq            Control sample fastqs for complex subtraction from cases. Can use regex patterns or specify multiple file separated by commas.
  
  # Output options
  --outdir
  
  # Pipeline options
  --isolate                 Input file(s) are from isolate sequencing [default: automatic detection]
  --metagenome              Input file(s) are metagenomic samples [default: automatic detection]
  --16S                     Input file(s) are 16S amplicon sequencing data [default: automatic detection]
  --ITS                     Input file(s) are ITS amplicon sequencing data [default: automatic detection]
  --skipQC                  Skip quality assessment and read trimming
  --skipTyping              Skip public health typing analysis
  --skipAMR                 Skip AMR prediction step
  --meanQ [7]               Reads with mean quality below this value will be filtered from analysis
  --minLength [250]         Reads with length below this threshold will be filtered from analysis

Pipeline overview

User inputs basecalled nanopore fastq reads
BugSeq validates the fastq file (fqtools) and determines if it's truly nanopore data
Next, the fastq undergoes quality assessment with FastQC and results combined with multiqc
Reads are adapter trimmed and demultiplexed (qcat)
Reads are quality and length filtered
Trimmed and demultiplexed reads undergo experiment type detection to determine if this is amplicon data (eg. 16S/ITS), cultured isolate data or metagenomic data (magic..., including sourmash)

Isolate data

Genome assembly (Flye)
Taxonomic classification of assembly (minimap2 + Pathoscope ID)

Metagenomic data

Read-level taxonomic classification to species level
Correction for control samples to identify significant pathogens in the cases only
Metagenome assembly (metaFlye)
Taxonomic binning of species within metagenome

Pathogen specific analyses

Phenotypic antimicrobial resistance prediction
MLST (when public scheme avaiable)
cgMLST/wgMLST (when public scheme available)
Serotyping (when applicable)
Other old-school typing (eg. Spoligotyping for M. tuberculosis)
Toxin detection (when clinically relevant)
Phylogenetic tree building (when multiple isolates inputted)

Changelog

0.0.1

Name		Name	Last commit message	Last commit date
Latest commit History 20 Commits
aux_scripts		aux_scripts
conda_cofigs		conda_cofigs
refs		refs
README.md		README.md
main.nf		main.nf

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

bugseq-pipeline

Rationale

Quick start

Outline

Requirements

Input

Output

Options

Pipeline overview

Isolate data

Metagenomic data

Pathogen specific analyses

Changelog

About

Releases

Packages

Languages

awilson0/bugseq-pipeline

Folders and files

Latest commit

History

Repository files navigation

bugseq-pipeline

Rationale

Quick start

Outline

Requirements

Input

Output

Options

Pipeline overview

Isolate data

Metagenomic data

Pathogen specific analyses

Changelog

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages