5.1 tutorial for using zol with output from fast.genomics and CAGECAT
While prepTG and fai offer convenient options for finding homologous/orthologous instances of a gene cluster in a target database of genomes, searching large databases of genomes requires a server or access to a considerable amount of disk space.
CAGECAT and fast.genomics are great server-based alternatives for determining sets of homologous gene clusters/neighborhoods which can then be investigated in more depth using zol.
fast.genomics (https://fast.genomics.lbl.gov/cgi/search.cgi) is a great web application by Price and Arkin for finding highly divergent homologs and gaining a phylogenetic perspective on their conservation.
Step 1: Go to the fast.genomics web application and click the "Gene neighborhoods" link to go to an example neighborhood:
Step 2: Click on the "table_of_genes" link to download information for the genomes/proteins in the gene neighborhood.
You could also use the gene neighborhood GenBank files as input to create manual visualizations via clinker or pyGenomeViz.
# with zol conda environment activated
# browse_ING2E5A_RS06865.tsv is the file downloaded in Step 2.
fastgenomicsNeighborhoodToGenBanks.py -i browse_ING2E5A_RS06865.tsv -o fast.genomics_neighborhood_output/
# run zol with 4 threads
zol -i fast.genomics_neighborhood_output/Gene_Cluster_GenBank_Files/ -o zol_Results/ -c 4
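Before running zol, it can help to confirm that the conversion step actually wrote GenBank files into the directory passed with -i. The following is a minimal sketch, not part of the zol suite: count_genbanks is a hypothetical helper, and the file extensions it looks for (.gbk, .gb, .genbank) are an assumption rather than something documented here.

```shell
# Hypothetical helper (not part of zol): count GenBank files directly inside a directory.
# Assumption: GenBank files end in .gbk, .gb, or .genbank.
count_genbanks() {
    find "$1" -maxdepth 1 -type f \
        \( -name '*.gbk' -o -name '*.gb' -o -name '*.genbank' \) | wc -l | tr -d ' '
}

# Directory produced by fastgenomicsNeighborhoodToGenBanks.py in the commands above.
gbk_dir="fast.genomics_neighborhood_output/Gene_Cluster_GenBank_Files/"
if [ -d "$gbk_dir" ]; then
    echo "GenBank files found: $(count_genbanks "$gbk_dir")"
else
    echo "Directory not found: $gbk_dir"
fi
```

If the count is zero, the directory name or the assumed extensions are the first things to check before passing the directory to zol.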
This section will be expanded in the near future. For now, users can download the gene clusters identified by CAGECAT in GenBank format and either:
- run zol directly on them by providing the uncompressed directory as input with the -r option to rename locus tags (because CAGECAT gene cluster GenBank files feature protein_id identifiers instead of locus_tag identifiers), or
- use the cagecatProcess.py script to automatically recreate the GenBank files with the value of protein_id qualifiers copied over and assigned as values to locus_tag qualifiers
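To make the second option concrete, the sketch below shows what copying protein_id values into locus_tag qualifiers looks like on a GenBank flat file. It is an illustration only and is not the implementation of cagecatProcess.py: add_locus_tags is a hypothetical helper, and it assumes each protein_id qualifier sits on a single line and that no locus_tag qualifier is already present.

```shell
# Illustration only: NOT how cagecatProcess.py is implemented.
# For every /protein_id="..." qualifier line, print an extra /locus_tag="..." line
# with the same value and the same indentation.
add_locus_tags() {
    awk '{
        print
        if (match($0, /\/protein_id="[^"]*"/)) {
            indent = substr($0, 1, RSTART - 1)
            qual = substr($0, RSTART, RLENGTH)
            sub(/^\/protein_id/, "/locus_tag", qual)
            print indent qual
        }
    }' "$1"
}

# Hypothetical usage (file names are placeholders):
#   add_locus_tags cluster.gbk > cluster.with_locus_tags.gbk
# The first option in the list above would instead add -r to the zol call:
#   zol -i <uncompressed CAGECAT directory> -o <output directory> -r
```

For real analyses, the provided cagecatProcess.py script or the -r option remains the supported route; the sketch only shows the kind of change being made to the files.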
Note: zol uses codon alignments for some statistics, and CDS features in gene cluster GenBank files exported from CAGECAT do not include exon coordinates. As a result, going from CAGECAT to zol is not currently viable for fungal/eukaryotic investigations. The zol suite itself does support eukaryotic investigations; if you are interested in this, please see the prepTG and fai pages for further information.