0. overview of major result files
cgcg converts zol results into an interactive pan-gene-cluster plot. Like cgc, it allows for scalable visualization of information across hundreds to thousands of gene clusters. The result is a simple HTML file that is easy to port and view on various devices. Each node corresponds to an ortholog group, and edges show the gene order determined by zol's algorithm for inferring consensus order. Nodes can be colored by various quantitative evolutionary statistics reported by zol. Final tailoring touches and export to SVG/PNG can be done in the HTML itself!
cgc lets you create custom visualizations of hundreds to thousands of gene clusters. The visualizations are easy to interpret because information on conservation and evolutionary stats is collapsed into bar plots across a gene schematic of the focal gene cluster in consensus order/direction. The only input it needs is a directory of zol results.
The two major results of fai are:
- A directory of gene clusters (in GenBank format) from target genomes that are homologous or orthologous to the query gene cluster provided as input to the program. These can be found in the directory
fai_Results/Final_Results/Homologous_Gene_Cluster_GenBanks/
- An XLSX spreadsheet which allows users to assess homologous hits to the query gene cluster at scale. Most columns feature automatic conditional formatting to ease user assessment of quantitative fields.
In addition, certain plots are generated to help visually validate that gene clusters detected in target genomes are truly homologous or orthologous to the query gene cluster. Check out the "more info on fai" page for details on the plots that fai can generate.
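As a quick sanity check after a run, the GenBank directory listed above can be enumerated to see how many gene cluster instances were reported. The snippet below is a minimal sketch using only the Python standard library. The function name `list_cluster_genbanks` is hypothetical, and the set of file extensions is an assumption, since the extension fai uses is not stated on this page.

```python
from pathlib import Path

# Assumed GenBank file extensions; adjust to whatever fai actually writes.
GENBANK_SUFFIXES = {".gbk", ".gb", ".genbank"}


def list_cluster_genbanks(results_dir):
    """Return sorted paths of GenBank files under a fai results directory.

    Hypothetical helper: it only looks in the sub-directory named on this
    page (Final_Results/Homologous_Gene_Cluster_GenBanks/).
    """
    gbk_dir = Path(results_dir) / "Final_Results" / "Homologous_Gene_Cluster_GenBanks"
    if not gbk_dir.is_dir():
        # Nothing to list if the directory does not exist.
        return []
    return sorted(
        p for p in gbk_dir.iterdir()
        if p.is_file() and p.suffix.lower() in GENBANK_SUFFIXES
    )


if __name__ == "__main__":
    files = list_cluster_genbanks("fai_Results")
    print(f"{len(files)} gene cluster GenBank file(s) found")
```

The resulting list of paths can then be handed to whatever downstream step is planned, for example a zol run on the same set of gene clusters.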
zol takes as input a set of related gene clusters in GenBank format and produces one primary result file: an XLSX spreadsheet with ortholog groups as rows and, as columns, details on conservation, various evolutionary stats, and annotation info from multiple databases. Quantitative columns are automatically color-formatted. By default, rows are sorted in the consensus order in which genes occur within the gene clusters.
zol can also be used to "dereplicate" gene clusters, which can then be used for clinker analysis. For details on how to do that, check out the tutorial wiki page.
salt also creates an XLSX spreadsheet as its primary output (in addition to a couple of plots). It provides information on individual gene cluster instances identified by fai as homologous to the query gene cluster of interest. Specifically, salt reports information that might suggest a gene cluster instance has experienced lateral/horizontal transfer.
The highlighted row corresponds to a gene cluster instance of the crt operon detected in a Staphylococcus pasteuri genome. Why is this interesting?
- Low codoff empirical P-value (Column H): This tells us that the codon usage of the crt operon differs from the codon usage of the rest of the genome.
- Short distance from a transposon (Column L): This tells us that the crt operon is not far from a suspected IS element or transposon.
- Homologs of plasmid-associated proteins found on the same scaffold (Column O): This suggests the crt operon instance might be on a plasmid. We can look up the scaffold ID on NCBI and, indeed, it is marked as a plasmid in the name.
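The three signals above can be combined into a simple filter for shortlisting gene cluster instances worth a closer look. The following is a minimal sketch, not salt's own logic: it assumes the spreadsheet rows have already been loaded into dictionaries, and the key names (`codoff_p`, `transposon_dist`, `plasmid_homologs`), the assumption that Column O can be reduced to a count, and both default thresholds are hypothetical and chosen only for illustration.

```python
def flag_possible_transfer(rows, max_codoff_p=0.05, max_transposon_dist=10000):
    """Return the rows that show all three signals described above.

    Each row is a dict with hypothetical keys (not salt's real headers):
      'codoff_p'         : codoff empirical P-value (Column H)
      'transposon_dist'  : distance to the nearest suspected IS element or
                           transposon (Column L), or None if none was found
      'plasmid_homologs' : assumed count of plasmid-associated protein
                           homologs on the same scaffold (Column O)
    Both thresholds are illustrative defaults, not recommended values.
    """
    flagged = []
    for row in rows:
        dist = row.get("transposon_dist")
        low_p = row["codoff_p"] <= max_codoff_p
        near_transposon = dist is not None and dist <= max_transposon_dist
        on_plasmid_like_scaffold = row.get("plasmid_homologs", 0) > 0
        if low_p and near_transposon and on_plasmid_like_scaffold:
            flagged.append(row)
    return flagged


# Toy rows with made-up values, only to show the expected input shape.
example_rows = [
    {"id": "instance_A", "codoff_p": 0.01, "transposon_dist": 2500, "plasmid_homologs": 3},
    {"id": "instance_B", "codoff_p": 0.60, "transposon_dist": 2500, "plasmid_homologs": 3},
    {"id": "instance_C", "codoff_p": 0.01, "transposon_dist": None, "plasmid_homologs": 0},
]

if __name__ == "__main__":
    for row in flag_possible_transfer(example_rows):
        print(row["id"])
```

Requiring all three signals at once is a deliberately strict choice. Any single signal can arise without horizontal transfer, so a looser filter (for example, any two of the three) may be more appropriate when screening many genomes.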
Like fai and zol, the programs abon, atpoc, and apos primarily produce XLSX spreadsheets. The first tab of the resulting spreadsheet gives an overview of the focal sample's BGC, phage, or plasmid predictions:
The second tab shows the coverage of the focal sample's BGC-ome, phage-ome, or plasmid-ome across the genomes in the target genomes database: