9.3 more info on apos

Rauf Salamzade edited this page Mar 24, 2024 · 6 revisions

apos - Assess Plasmid-Ome Similarity

apos takes as input MOB-suite and/or geNomad results directories (with plasmid predictions) for a single sample together with a prepTG (target genomes) database to determine how conserved the sample's plasmid-ome is across the genomes in the database. This could give insight into the conservation of specific plasmids in the focal sample's genome across its species or genus.

The specific cutoffs used in fai for gene cluster detection in target genomes can be adapted as needed. Alternatively, a simple BLASTp search can be performed instead to find all homologs of the proteins for each plasmid from the focal sample in target genomes, regardless of whether they are similarly co-located. Default parameters for fai-based detection of plasmids are: 50% of plasmid genes need to be identified, in whole or fragmented along scaffold edges, via DIAMOND BLASTp at an E-value threshold of 1e-10. Because plasmids are highly dynamic, the syntenic similarity requirement is turned off.
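As a rough sketch of how the fai cutoffs could be adapted, the `-fo` argument takes a quoted string of fai options. The example below starts from the default string listed in the usage section and raises `-m` from 0.5 to 0.75. This assumes `-m` is the fraction of plasmid genes that must be detected, as the defaults described above suggest; the 0.75 value and the output directory name are purely illustrative, and the other paths are those used in the tutorial on this page.

```shell
# Default fai options used by apos, copied from its help text.
DEFAULT_FAI_OPTS="-e 1e-10 -m 0.5 -dm -sct 0.0"
# Illustrative stricter variant (assumption: -m 0.75 requires 75% of plasmid genes).
STRICT_FAI_OPTS="-e 1e-10 -m 0.75 -dm -sct 0.0"

if command -v apos >/dev/null 2>&1; then
  # The fai options must be passed as one quoted string.
  apos -i Enterococcus_faecalis_V583.fna -tg Enterococcus_Reps_prepTG_Database/ \
       -ms MOBsuite_Results/ -gn geNomad_Results/ \
       -o apos_fai_strict_Results/ -c 20 -fo "$STRICT_FAI_OPTS"
else
  echo "apos not on PATH; fai options would be: $STRICT_FAI_OPTS"
fi
```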

Because plasmids are highly dynamic, we recommend using the simple BLASTp search mode instead of the default fai mode. fai requires genes to be co-located, whereas plasmid parts can be exchanged with other plasmids and the chromosome. Simple BLASTp searching can be requested with the -s argument.
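The simple BLASTp mode has its own identity, coverage, and E-value cutoffs (`-si`, `-sc`, `-se` in the usage section). A minimal sketch of overriding them follows; the specific values and the output directory name are illustrative assumptions rather than recommendations, and the input paths are those from the tutorial on this page.

```shell
# Illustrative cutoffs, stricter than the documented defaults
# (identity 40.0, coverage 70.0, E-value 1e-10).
MIN_IDENTITY=60.0
MIN_COVERAGE=80.0
MAX_EVALUE=1e-20
BLASTP_OPTS="-s -si $MIN_IDENTITY -sc $MIN_COVERAGE -se $MAX_EVALUE"

if command -v apos >/dev/null 2>&1; then
  # $BLASTP_OPTS is left unquoted on purpose so it splits into separate arguments.
  apos -i Enterococcus_faecalis_V583.fna -tg Enterococcus_Reps_prepTG_Database/ \
       -ms MOBsuite_Results/ -gn geNomad_Results/ \
       -o apos_blastp_strict_Results/ -c 20 $BLASTP_OPTS
else
  echo "apos not on PATH; simple BLASTp options would be: $BLASTP_OPTS"
fi
```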

If fai is used for searching (the default), check out the individual fai results (in the subdirectory fai_or_blast_Results/) for each plasmid to see details on the conservation of individual genes. Further, follow-up analysis can be performed using zol per plasmid to summarize the conservation of distinct ortholog groups, evolutionary stats, and functional info.

Conservation of Enterococcus faecalis V583 plasmids across the Enterococcus genus

The following is a mini-tutorial on using apos to investigate the novelty of the plasmid-ome of Enterococcus faecalis strain V583 relative to representative Enterococcus genomes, which we made available in a precompiled prepTG database.

First, let's download the query genome of interest:

# Download genome from NCBI
wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/007/785/GCF_000007785.1_ASM778v1/GCF_000007785.1_ASM778v1_genomic.fna.gz

# Uncompress it & rename it
gunzip GCF_000007785.1_ASM778v1_genomic.fna.gz
mv GCF_000007785.1_ASM778v1_genomic.fna Enterococcus_faecalis_V583.fna
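Before predicting plasmids, it can help to confirm that the download worked. The sketch below counts the sequence records in the renamed FASTA file and prints their headers; `count_fasta_records` is a hypothetical helper defined here, not part of zol or apos.

```shell
# Count sequence records (header lines start with '>') in a FASTA file.
count_fasta_records() {
  grep -c '^>' "$1"
}

GENOME=Enterococcus_faecalis_V583.fna
if [ -s "$GENOME" ]; then
  echo "$GENOME contains $(count_fasta_records "$GENOME") sequence records"
  # Print the headers to see which records are annotated as plasmids.
  grep '^>' "$GENOME"
else
  echo "$GENOME is missing or empty; re-run the download step" >&2
fi
```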

Next, we can run MOB-suite and geNomad to identify plasmids in the focal genome:

# in some conda environment or setting with MOB-suite available 
mob_recon  --infile Enterococcus_faecalis_V583.fna --outdir MOBsuite_Results/

# in some conda environment or setting with geNomad available
genomad end-to-end Enterococcus_faecalis_V583.fna geNomad_Results/ /path/to/genomad_dbs/
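A quick check that both tools produced plasmid predictions can save a failed apos run later. The sketch below assumes the usual output layouts: `contig_report.txt` in the mob_recon output directory, and `<prefix>_summary/<prefix>_plasmid_summary.tsv` in the geNomad output directory, where the prefix is the input file name without its extension. These paths may differ between tool versions, and `check_output` is a hypothetical helper.

```shell
# Report whether an expected output file exists and is non-empty.
check_output() {
  if [ -s "$1" ]; then
    echo "found: $1"
  else
    echo "missing or empty: $1" >&2
    return 1
  fi
}

PREFIX=Enterococcus_faecalis_V583
# Assumed file locations; adjust if your tool versions lay out results differently.
check_output "MOBsuite_Results/contig_report.txt" || true
check_output "geNomad_Results/${PREFIX}_summary/${PREFIX}_plasmid_summary.tsv" || true
```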

Next, we can set up the precompiled database of Enterococcus representative genomes using prepTG:

# in zol's conda environment or via the Docker wrapper:
prepTG -d Enterococcus -o Enterococcus_Reps_prepTG_Database/

Now we are ready to run apos!

# Note, as per our recommendation above, we run apos with the simple blast search method via the -s argument.
apos -i Enterococcus_faecalis_V583.fna -tg Enterococcus_Reps_prepTG_Database/ -ms MOBsuite_Results/ -gn geNomad_Results/ -o apos_Results/ -c 20 -s

Note, in the default fai mode this can take a while, as it involves running fai X times (where X is the number of plasmid predictions across all methods in the focal sample of interest).
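To get a rough idea of X before starting, the plasmid predictions can be counted directly from the two results directories. This sketch assumes geNomad lists one plasmid per row in `<prefix>_summary/<prefix>_plasmid_summary.tsv` and that mob_recon writes one `plasmid_*.fasta` file per reconstructed plasmid; how apos itself counts predictions may differ, so treat the numbers as an estimate. Both helper functions are hypothetical.

```shell
# Count data rows (non-empty lines after the header) in a TSV file.
count_tsv_rows() {
  awk 'NR > 1 && NF > 0 { n++ } END { print n + 0 }' "$1"
}

# Count files directly inside a directory whose names match a glob pattern.
count_matching_files() {
  find "$1" -maxdepth 1 -type f -name "$2" 2>/dev/null | wc -l | tr -d ' '
}

PREFIX=Enterococcus_faecalis_V583
GN_TSV="geNomad_Results/${PREFIX}_summary/${PREFIX}_plasmid_summary.tsv"
if [ -f "$GN_TSV" ]; then
  echo "geNomad plasmid predictions: $(count_tsv_rows "$GN_TSV")"
fi
if [ -d MOBsuite_Results ]; then
  echo "MOB-suite plasmid predictions: $(count_matching_files MOBsuite_Results 'plasmid_*.fasta')"
fi
```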

The result!

Similar to fai and zol's major results, apos also primarily produces an XLSX spreadsheet. The first tab of the spreadsheet gives an overview of the focal sample's plasmid predictions from the different software:

image

Then on the second tab, the coverage of the focal sample's plasmid-ome across the genomes in the target genomes database is shown:

image
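The exact file name of the spreadsheet is not given here, so one way to locate it is to search the output directory. The sketch below assumes the `apos_Results/` output directory from the command above and that the per-plasmid results sit in `fai_or_blast_Results/` directly beneath it; `find_spreadsheets` is a hypothetical helper.

```shell
# List all XLSX files found anywhere under a directory.
find_spreadsheets() {
  find "$1" -type f -name '*.xlsx' 2>/dev/null
}

OUTDIR=apos_Results
if [ -d "$OUTDIR" ]; then
  find_spreadsheets "$OUTDIR"
  # Per-plasmid search results (assumed to be directly under the output directory).
  ls "$OUTDIR/fai_or_blast_Results/" 2>/dev/null || echo "no per-plasmid results found"
fi
```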

Usage

usage: apos [-h] -i SAMPLE_GENOME [-ms MOBSUITE_RESULTS] [-gn GENOMAD_RESULTS] -tg TARGET_GENOMES_DB [-up] [-fo FAI_OPTIONS] [-s]
            [-si SIMPLE_BLASTP_IDENTITY_CUTOFF] [-sc SIMPLE_BLASTP_COVERAGE_CUTOFF] [-se SIMPLE_BLASTP_EVALUE_CUTOFF]
            [-sm SIMPLE_BLASTP_SENSITIVITY_MODE] -o OUTDIR [-c CPUS]

        Program: apos
        Author: Rauf Salamzade
        Affiliation: Kalan Lab, UW Madison, Department of Medical Microbiology and Immunology

        apos - Assess Plasmid-Ome Similarity

        apos wraps fai to assess the conservation of a sample's plasmid-ome
        relative to a set of target genomes (e.g. genomes belonging to the same genus). Alternatively,
        it can run a simple DIAMOND BLASTp analysis to just assess the presence of plasmid genes
        individually - without the requirement they are co-located in one scaffold like in the focal sample.


options:
  -h, --help            show this help message and exit
  -i SAMPLE_GENOME, --sample_genome SAMPLE_GENOME
                        Path to sample genome in GenBank or FASTA format.
  -ms MOBSUITE_RESULTS, --mobsuite_results MOBSUITE_RESULTS
                        Path to MOB-suite (mob_recon) results directory for a single sample/genome.
  -gn GENOMAD_RESULTS, --genomad_results GENOMAD_RESULTS
                        Path to GeNomad results directory for a single sample/genome.
  -tg TARGET_GENOMES_DB, --target_genomes_db TARGET_GENOMES_DB
                        prepTG database directory for target genomes of interest.
  -fo FAI_OPTIONS, --fai_options FAI_OPTIONS
                        Provide fai options to run. Should be surrounded by quotes. [Default is "-e 1e-10 -m 0.5 -dm -sct 0.0"]
  -s, --use_simple_blastp
                        Use a simple DIAMOND BLASTp search with no requirement for co-localization of hits.
  -si SIMPLE_BLASTP_IDENTITY_CUTOFF, --simple_blastp_identity_cutoff SIMPLE_BLASTP_IDENTITY_CUTOFF
                        If simple BLASTp mode requested : cutoff for identity between query proteins and matches in target genomes [Default is 40.0].
  -sc SIMPLE_BLASTP_COVERAGE_CUTOFF, --simple_blastp_coverage_cutoff SIMPLE_BLASTP_COVERAGE_CUTOFF
                        If simple BLASTp mode requested : cutoff for coverage between query proteins and matches in target genomes [Default is 70.0].
  -se SIMPLE_BLASTP_EVALUE_CUTOFF, --simple_blastp_evalue_cutoff SIMPLE_BLASTP_EVALUE_CUTOFF
                        If simple BLASTp mode requested : cutoff for E-value between query proteins and matches in target genomes [Default is 1e-10].
  -sm SIMPLE_BLASTP_SENSITIVITY_MODE, --simple_blastp_sensitivity_mode SIMPLE_BLASTP_SENSITIVITY_MODE
                        Sensitivity mode for DIAMOND BLASTp. [Default is "very-sensitive"].
  -o OUTDIR, --outdir OUTDIR
                        Output directory.
  -c CPUS, --cpus CPUS  The number of CPUs to use.