A pipeline for running a complete phylogenomic analysis from a set of FASTA
files, one for each shared ortholog group. It is based on containers and Snakemake.
In short, it performs the following steps:
- Multiple sequence alignment with MAFFT
- Extraction of conserved blocks with GBlocks
- Generation of a phylogenetic tree for each ortholog group with RAxML
- Generation of a species tree with ASTRAL
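The four steps above can be sketched as a sequence of command lines. This is only an illustration, not the content of the Snakefile: the file names, the protein settings (`-t=p`, `PROTGAMMAAUTO`), the random seed and the `astral.jar` path are all assumptions, and the commands are built as lists rather than executed.

```python
from typing import List, Optional, Tuple

# (argv, file that receives stdout, or None when the tool writes its own output)
Command = Tuple[List[str], Optional[str]]


def gene_tree_commands(group: str) -> List[Command]:
    """Commands that would build one gene tree from <group>.fasta (hypothetical names)."""
    aligned = f"{group}.aln.fasta"
    return [
        # 1. Align the ortholog sequences; MAFFT prints the alignment to stdout.
        (["mafft", "--auto", f"{group}.fasta"], aligned),
        # 2. Keep conserved blocks; -t=p assumes protein sequences.
        #    Gblocks writes its result next to the input as <input>-gb.
        (["Gblocks", aligned, "-t=p"], None),
        # 3. Infer the gene tree; RAxML names its result RAxML_bestTree.<group>.
        (["raxmlHPC", "-s", f"{aligned}-gb", "-n", group,
          "-m", "PROTGAMMAAUTO", "-p", "12345"], None),
    ]


def species_tree_command(gene_trees: str, output: str,
                         jar: str = "astral.jar") -> Command:
    """ASTRAL takes a single file holding every gene tree in Newick format."""
    return (["java", "-jar", jar, "-i", gene_trees, "-o", output], None)


if __name__ == "__main__":
    for argv, stdout_file in gene_tree_commands("OG0001"):
        suffix = f" > {stdout_file}" if stdout_file else ""
        print(" ".join(argv) + suffix)
    print(" ".join(species_tree_command("all_gene_trees.nwk", "species.nwk")[0]))
```

In the actual pipeline each of these commands would run inside the container and be driven by Snakemake rules, one gene tree per ortholog group followed by a single ASTRAL run.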
- Snakemake
- Singularity (or Docker)
- Clone this repository
git clone https://github.com/fgajardoe/PhylogenomicPipeline.git
- Get the container
cd PhylogenomicPipeline
singularity pull docker://fgajardoe/phylogenomic-analysis-container:latest
Note: Although the image is hosted on Docker Hub, the whole pipeline runs it through Singularity. Adapting it to Docker should be straightforward: you would only need to modify the Snakefile so that it runs Docker commands instead of Singularity ones.
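As a rough sketch of what that adaptation involves, the helper below wraps a tool command for either engine. It is not taken from the Snakefile: the function name, the `.sif` file name (the default name `singularity pull` is assumed to produce for this image) and the `/data` mount point are assumptions made for illustration.

```python
import os
from typing import List

IMAGE = "fgajardoe/phylogenomic-analysis-container:latest"
# Assumed file name written by `singularity pull docker://<IMAGE>`.
SIF = "phylogenomic-analysis-container_latest.sif"


def container_command(tool_argv: List[str], engine: str = "singularity",
                      host_dir: str = ".", workdir: str = "/data") -> List[str]:
    """Prefix a tool command so that it runs inside the container."""
    if engine == "singularity":
        # Singularity normally makes the current directory visible inside
        # the container, so no explicit mount is added here.
        return ["singularity", "exec", SIF, *tool_argv]
    if engine == "docker":
        # Docker needs the working directory mounted explicitly.
        mount = f"{os.path.abspath(host_dir)}:{workdir}"
        return ["docker", "run", "--rm", "-v", mount, "-w", workdir,
                IMAGE, *tool_argv]
    raise ValueError(f"unknown engine: {engine}")


if __name__ == "__main__":
    tool = ["mafft", "--auto", "OG0001.fasta"]
    print(" ".join(container_command(tool)))
    print(" ".join(container_command(tool, engine="docker")))
```

If the image defines an entrypoint, the Docker form may need an extra `--entrypoint` option; that has not been checked against this image.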
- Build a configfile
It's a Snakemake configfile, which is a YAML-formatted text file. Use the example provided here as a reference.
Remember to update the beginning of the Snakefile to match your configfile.
The configfile must associate wildcards with their corresponding FASTA files, each one containing the ortholog sequences for every species in your analysis. In other words, you need one FASTA file per ortholog group, and that file must contain the sequence of that ortholog in each species. You can run BUSCO and extract from its results all the orthologs shared by your panel of species.
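If you have many ortholog groups, writing the configfile by hand gets tedious. The sketch below generates one from a directory of FASTA files, using each file name (without extension) as the wildcard. The top-level key `orthologs` and the `.fasta` extension are assumptions; check the example configfile and the beginning of the Snakefile for the names the pipeline really expects.

```python
from pathlib import Path


def build_config(fasta_dir: str, key: str = "orthologs") -> str:
    """Return YAML text mapping each ortholog group name to its FASTA file.

    `key` is a hypothetical top-level name; adjust it to match the Snakefile.
    """
    lines = [f"{key}:"]
    # Sorted so that the generated file is stable between runs.
    for fasta in sorted(Path(fasta_dir).glob("*.fasta")):
        # File name without extension becomes the wildcard value.
        lines.append(f'  {fasta.stem}: "{fasta.as_posix()}"')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Assumes a directory called orthologs/ with one FASTA per ortholog group.
    Path("config.yaml").write_text(build_config("orthologs"))
```

Each FASTA file listed this way should still contain exactly one sequence per species, as described above; the script does not verify that.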
- Run the pipeline
conda activate snakemake
snakemake -p -j1
Good luck!