9.3 more info on apos
apos takes as input MOB-suite and/or geNomad results directories (with plasmid predictions) for a single sample together with a prepTG (target genomes) database to determine how conserved the sample's plasmid-ome is across the genomes in the database. This could give insight into the conservation of specific plasmids in the focal sample's genome across its species or genus.
The specific cutoffs used in fai for gene cluster detection in target genomes can be adapted as needed. Alternatively, a simple BLASTp search can be performed instead to determine all homologs of proteins for each plasmid from the focal sample in target genomes, regardless of whether they are similarly co-located or not. The default parameters for fai-based detection of plasmids are: 50% of plasmid genes need to be identified, in whole or fragmented along scaffold edges, via DIAMOND BLASTp at an E-value threshold of 1e-10. Because plasmids are highly dynamic, the syntenic similarity requirement is turned off.
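As a rough sketch of adapting the fai cutoffs, the snippet below passes a modified option string through -fo. The -fo flag and the default string are taken from the apos help message; reading -m as the minimum fraction of plasmid genes (the 50% above) is an inference from the description and should be checked against fai's own help. The value 0.7, the CPU count, and the output directory name are arbitrary placeholders.

```shell
# Default fai options as reported in the apos help message.
default_fai_opts="-e 1e-10 -m 0.5 -dm -sct 0.0"

# Assumption: -m is the minimum fraction of plasmid genes that must be found.
# Raise it from 0.5 to 0.7 for a stricter search (0.7 is an arbitrary example).
strict_fai_opts=$(printf '%s' "$default_fai_opts" | sed 's/-m 0\.5/-m 0.7/')

# Run apos only if it is installed and the tutorial input exists;
# otherwise just show the option string that would be passed.
if command -v apos >/dev/null 2>&1 && [ -f Enterococcus_faecalis_V583.fna ]; then
    apos -i Enterococcus_faecalis_V583.fna -tg Enterococcus_Reps_prepTG_Database/ \
         -ms MOBsuite_Results/ -gn geNomad_Results/ \
         -o apos_strict_Results/ -c 4 -fo "$strict_fai_opts"
else
    echo "would run: apos ... -fo \"$strict_fai_opts\""
fi
```

The option string is quoted so that it reaches apos as a single argument, as the help message requires.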
Because plasmids are highly dynamic, we recommend using the simple BLASTp search mode instead of the default fai mode. fai requires genes to be co-located, whereas plasmid parts can be exchanged with other plasmids and the chromosome. Simple BLASTp searching can be requested with the -s argument.
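A minimal sketch of the simple BLASTp mode with its cutoffs spelled out explicitly, so they are easy to adjust. All flags and default values come from the apos usage message; the paths, CPU count, and output directory name are placeholders borrowed from the tutorial below, and the command only runs if apos and its inputs are present.

```shell
# Placeholder inputs (assumptions; substitute your own paths).
sample="Enterococcus_faecalis_V583.fna"
db="Enterococcus_Reps_prepTG_Database/"

# Cutoffs for simple BLASTp mode, set here to the documented defaults.
identity=40.0
coverage=70.0
evalue=1e-10

run_apos_simple() {
    apos -i "$sample" -tg "$db" \
         -ms MOBsuite_Results/ -gn geNomad_Results/ \
         -o apos_simple_Results/ -c 4 \
         -s -si "$identity" -sc "$coverage" -se "$evalue"
}

# Only run when apos is installed and the inputs exist.
if command -v apos >/dev/null 2>&1 && [ -f "$sample" ] && [ -d "$db" ]; then
    run_apos_simple
else
    echo "apos or its inputs not found; skipping run" >&2
fi
```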
If fai is used for searching (the default), check out the individual fai results (in the subdirectory fai_or_blast_Results/) for each plasmid to see details on the conservation of individual genes. Further follow-up analysis can be performed using zol per plasmid to summarize the conservation of distinct ortholog groups, evolutionary stats, and functional info.
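To see which per-plasmid results are available, something like the helper below could be used. It is a sketch that assumes fai_or_blast_Results/ sits directly inside the apos output directory and holds one entry per predicted plasmid; the function name list_plasmid_results is hypothetical, and the actual layout should be checked against a real run.

```shell
# List entries inside <apos outdir>/fai_or_blast_Results/.
# Assumption: each entry corresponds to one predicted plasmid.
list_plasmid_results() {
    results_dir="$1/fai_or_blast_Results"
    if [ ! -d "$results_dir" ]; then
        echo "no such directory: $results_dir" >&2
        return 1
    fi
    for entry in "$results_dir"/*; do
        [ -e "$entry" ] || continue   # skip the unexpanded glob if the directory is empty
        basename "$entry"
    done
}

# Usage with the tutorial's output directory, if it exists:
if [ -d apos_Results ]; then
    list_plasmid_results apos_Results
fi
```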
The following is a mini-tutorial on using apos to investigate the novelty of the plasmid-ome of Enterococcus faecalis strain V583 relative to representative Enterococcus genomes, which we made available in a precompiled prepTG database.
First, let's download the query genome of interest:
# Download genome from NCBI
wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/007/785/GCF_000007785.1_ASM778v1/GCF_000007785.1_ASM778v1_genomic.fna.gz
# Uncompress it & rename it
gunzip GCF_000007785.1_ASM778v1_genomic.fna.gz
mv GCF_000007785.1_ASM778v1_genomic.fna Enterococcus_faecalis_V583.fna
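Before predicting plasmids, it can be worth confirming that the download and decompression worked. The sketch below counts the sequence records in the FASTA file; count_fasta_records is a hypothetical helper name, not part of zol.

```shell
# Count sequence records (header lines starting with '>') in a FASTA file.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Report the count for the tutorial genome if it is present.
fasta="Enterococcus_faecalis_V583.fna"
if [ -f "$fasta" ]; then
    echo "$fasta contains $(count_fasta_records "$fasta") sequence records"
fi
```

A count of zero, or a missing file, suggests the download or gunzip step failed.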
Next, we can run MOB-suite and geNomad to identify plasmids in the focal genome:
# in some conda environment or setting with MOB-suite available
mob_recon --infile Enterococcus_faecalis_V583.fna --outdir MOBsuite_Results/
# in some conda environment or setting with geNomad available
genomad end-to-end Enterococcus_faecalis_V583.fna geNomad_Results/ /path/to/genomad_dbs/
Next, we can set up the precompiled database of Enterococcus representative genomes using prepTG:
# in zol's conda environment or via the Docker wrapper:
prepTG -d Enterococcus -o Enterococcus_Reps_prepTG_Database/
Now we are ready to run apos!
# Note: as per our recommendation above, we run apos with the simple BLASTp search method via the -s argument.
apos -i Enterococcus_faecalis_V583.fna -tg Enterococcus_Reps_prepTG_Database/ -ms MOBsuite_Results/ -gn geNomad_Results/ -o apos_Results/ -c 20 -s
Note, this can take a while as it will involve running fai X times (where X is the number of plasmid predictions across all methods in the focal sample of interest).
Similar to the major results of fai and zol, apos primarily produces an XLSX spreadsheet. The first tab of this spreadsheet gives an overview of the focal sample's plasmid predictions from the different software:
Then on the second tab, the coverage of the focal sample's plasmid-ome across the genomes in the target genomes database is shown:
usage: apos [-h] -i SAMPLE_GENOME [-ms MOBSUITE_RESULTS] [-gn GENOMAD_RESULTS] -tg TARGET_GENOMES_DB [-up] [-fo FAI_OPTIONS] [-s]
[-si SIMPLE_BLASTP_IDENTITY_CUTOFF] [-sc SIMPLE_BLASTP_COVERAGE_CUTOFF] [-se SIMPLE_BLASTP_EVALUE_CUTOFF]
[-sm SIMPLE_BLASTP_SENSITIVITY_MODE] -o OUTDIR [-c CPUS]
Program: apos
Author: Rauf Salamzade
Affiliation: Kalan Lab, UW Madison, Department of Medical Microbiology and Immunology
apos - Assess Plasmid-Ome Similarity
apos wraps fai to assess the conservation of a sample's plasmid-ome
relative to a set of target genomes (e.g. genomes belonging to the same genus). Alternatively,
it can run a simple DIAMOND BLASTp analysis to just assess the presence of plasmid genes
individually - without the requirement they are co-located in one scaffold like in the focal sample.
options:
-h, --help show this help message and exit
-i SAMPLE_GENOME, --sample_genome SAMPLE_GENOME
Path to sample genome in GenBank or FASTA format.
-ms MOBSUITE_RESULTS, --mobsuite_results MOBSUITE_RESULTS
Path to MOB-suite (mob_recon) results directory for a single sample/genome.
-gn GENOMAD_RESULTS, --genomad_results GENOMAD_RESULTS
Path to GeNomad results directory for a single sample/genome.
-tg TARGET_GENOMES_DB, --target_genomes_db TARGET_GENOMES_DB
prepTG database directory for target genomes of interest.
-fo FAI_OPTIONS, --fai_options FAI_OPTIONS
Provide fai options to run. Should be surrounded by quotes. [Default is "-e 1e-10 -m 0.5 -dm -sct 0.0"]
-s, --use_simple_blastp
Use a simple DIAMOND BLASTp search with no requirement for co-localization of hits.
-si SIMPLE_BLASTP_IDENTITY_CUTOFF, --simple_blastp_identity_cutoff SIMPLE_BLASTP_IDENTITY_CUTOFF
If simple BLASTp mode requested : cutoff for identity between query proteins and matches in target genomes [Default is 40.0].
-sc SIMPLE_BLASTP_COVERAGE_CUTOFF, --simple_blastp_coverage_cutoff SIMPLE_BLASTP_COVERAGE_CUTOFF
If simple BLASTp mode requested : cutoff for coverage between query proteins and matches in target genomes [Default is 70.0].
-se SIMPLE_BLASTP_EVALUE_CUTOFF, --simple_blastp_evalue_cutoff SIMPLE_BLASTP_EVALUE_CUTOFF
If simple BLASTp mode requested : cutoff for E-value between query proteins and matches in target genomes [Default is 1e-10].
-sm SIMPLE_BLASTP_SENSITIVITY_MODE, --simple_blastp_sensitivity_mode SIMPLE_BLASTP_SENSITIVITY_MODE
Sensitivity mode for DIAMOND BLASTp. [Default is "very-sensitive"].
-o OUTDIR, --outdir OUTDIR
Output directory.
-c CPUS, --cpus CPUS The number of CPUs to use.