EvolDeeds

This repository contains Python and JavaScript implementations of various algorithms for computing the likelihoods of phylogenetic alignments of proteins, as well as the beginnings of a gamification framework for crowdsourced phylogenetics.

The underlying probabilistic models for sequence evolution are continuous-time Markov chains for substitutions, hidden Markov models for indels, Potts models for interactions between amino acids, and continuous-time Bayes networks (CTBNs) for covariant substitution processes.

JavaScript

The JavaScript code is in the js/ subdirectory. It includes implementations of

  • Felsenstein's algorithm for computing the likelihood of the substitutions in the alignment
  • HMM-based algorithms for computing the likelihood of the indels in the alignment

Two different models (H20 and KM03) are implemented for calculating the HMM transition probabilities in terms of the parameters of the underlying indel model.
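
As an illustration of the first item above, Felsenstein's pruning algorithm can be sketched in a few lines of plain Python. This is a minimal sketch, not the code in js/: the nested-dict tree layout (`name`, `branch_length`, `children`), the function names, and the use of an equal-rates substitution model with a uniform root distribution are all assumptions made to keep the example self-contained. Gap characters are treated as missing data.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 amino acids
N = len(ALPHABET)


def equal_rates_prob(a, b, t):
    """P(b | a, t) under an equal-rates model (assumed here for simplicity).

    t is measured in expected substitutions per site.
    """
    decay = math.exp(-N * t / (N - 1))
    if a == b:
        return 1.0 / N + (N - 1) / N * decay
    return 1.0 / N - decay / N


def partial_likelihoods(node, column):
    """Return [P(leaves below node | node is in state x) for x in ALPHABET]."""
    if "children" not in node:
        observed = column[node["name"]]
        if observed == "-":  # gap: treat as missing data
            return [1.0] * N
        return [1.0 if x == observed else 0.0 for x in ALPHABET]
    result = [1.0] * N
    for child in node["children"]:
        below = partial_likelihoods(child, column)
        t = child["branch_length"]
        for i, x in enumerate(ALPHABET):
            # Sum over the child's state y, weighted by the transition probability.
            result[i] *= sum(
                equal_rates_prob(x, y, t) * below[j]
                for j, y in enumerate(ALPHABET)
            )
    return result


def column_likelihood(tree, column):
    """Likelihood of one alignment column, with a uniform prior at the root."""
    return sum(p / N for p in partial_likelihoods(tree, column))


def alignment_log_likelihood(tree, alignment):
    """Sum of per-column log-likelihoods; alignment maps leaf name -> gapped row."""
    length = len(next(iter(alignment.values())))
    return sum(
        math.log(column_likelihood(tree, {k: row[i] for k, row in alignment.items()}))
        for i in range(length)
    )
```

A tree for this sketch is a dict such as `{"children": [{"name": "x", "branch_length": 0.1}, {"name": "y", "branch_length": 0.2}]}`, and an alignment is a dict from leaf names to equal-length rows.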

The JavaScript code also includes a JSON data structure (Cigar Tree) that compactly represents a phylogenetic tree, multiple sequence alignment (MSA), and ancestral sequence reconstruction, using a CIGAR-like format.
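
To give a feel for what a CIGAR-like encoding buys, the sketch below expands a CIGAR string into a pairwise ancestor/descendant alignment. It is only an illustration under stated assumptions: the actual Cigar Tree schema in js/ may differ, the function names are hypothetical, and the operation meanings follow the usual CIGAR convention (`M` consumes both sequences, `D` consumes only the ancestor, `I` consumes only the descendant).

```python
import re

CIGAR_TOKEN = re.compile(r"(\d+)([MID])")


def parse_cigar(cigar):
    """Split a string such as '2M1D2M1I' into [(2, 'M'), (1, 'D'), ...]."""
    ops = [(int(count), op) for count, op in CIGAR_TOKEN.findall(cigar)]
    # Reject anything the token pattern did not fully account for.
    if "".join(f"{count}{op}" for count, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return ops


def aligned_rows(cigar, ancestor, descendant):
    """Expand a CIGAR string into two gapped rows of equal length."""
    anc_row, desc_row = [], []
    i = j = 0  # positions consumed in ancestor and descendant
    for count, op in parse_cigar(cigar):
        if op == "M":  # residues present in both sequences
            anc_row.append(ancestor[i:i + count])
            desc_row.append(descendant[j:j + count])
            i += count
            j += count
        elif op == "D":  # residues deleted on the way to the descendant
            anc_row.append(ancestor[i:i + count])
            desc_row.append("-" * count)
            i += count
        else:  # "I": residues inserted in the descendant
            anc_row.append("-" * count)
            desc_row.append(descendant[j:j + count])
            j += count
    if i != len(ancestor) or j != len(descendant):
        raise ValueError("CIGAR string does not match the sequence lengths")
    return "".join(anc_row), "".join(desc_row)
```

Storing one such string per branch, rather than a full gapped row per node, is what makes this kind of representation compact.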

Python

The python/ subdirectory of the repo contains a considerably larger set of algorithms, though the core models and data structures should be compatible with the JavaScript code described above.

The Python code is implemented using JAX, making it suitable for model fitting (which should be accelerated when run on GPUs).

In addition to the models and data structures described above in the JavaScript section, the Python codebase includes

  • an implementation of the CherryML approach to fitting substitution rate matrices
  • several variations of CherryML and combinations with EM-like algorithms for fitting mixtures of substitution models
  • an implementation of a variational algorithm for CTBN Potts models, for computing alignment substitution likelihoods when there are interactions between amino acids (e.g. because they are in physical contact in the folded 3D structure)
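
The cherry-based idea behind the first item in the list above can be illustrated with a small composite-likelihood sketch. This is a deliberately simplified stand-in, not the CherryML implementation in python/: it assumes an equal-rates substitution model, fits only a single rate multiplier by grid search rather than a full rate matrix, and uses a hypothetical nested-dict tree layout (`name`, `branch_length`, `children`). A cherry here is a pair of sibling leaves, and the composite likelihood multiplies the pairwise likelihoods of all cherries as if they were independent.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 amino acids
N = len(ALPHABET)


def equal_rates_prob(a, b, t):
    """P(b | a, t) under an equal-rates model (an assumption of this sketch)."""
    decay = math.exp(-N * t / (N - 1))
    if a == b:
        return 1.0 / N + (N - 1) / N * decay
    return 1.0 / N - decay / N


def find_cherries(node):
    """Return (leaf_a, leaf_b, path_length) for every pair of sibling leaves."""
    children = node.get("children", [])
    cherries = []
    if len(children) == 2 and all("children" not in c for c in children):
        a, b = children
        cherries.append((a["name"], b["name"], a["branch_length"] + b["branch_length"]))
    for child in children:
        cherries.extend(find_cherries(child))
    return cherries


def composite_log_likelihood(cherries, alignment, rate):
    """Sum of pairwise log-likelihoods over all cherries and ungapped columns.

    rate must be positive so that every transition probability is nonzero.
    """
    total = 0.0
    for name_a, name_b, distance in cherries:
        for a, b in zip(alignment[name_a], alignment[name_b]):
            if a in ALPHABET and b in ALPHABET:  # skip columns with a gap
                # Stationary probability 1/N times the transition probability.
                total += math.log(equal_rates_prob(a, b, rate * distance) / N)
    return total


def fit_rate(cherries, alignment, grid):
    """Pick the rate multiplier on the grid with the highest composite likelihood."""
    return max(grid, key=lambda r: composite_log_likelihood(cherries, alignment, r))
```

Because each cherry involves only two sequences, the objective avoids a full pruning pass over the tree, which is the main appeal of this family of estimators.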

Data

The data/ subdirectory contains a few test alignments and parameters.

AWS Lambda code

The aws/ subdirectory contains code implementing a REST API using serverless Amazon Web Services. An admin can set up a sequence dataset, and users can then post their solutions to the problem of reconstructing the most likely evolutionary history explaining that dataset. Solutions are scored using the probabilistic models described above.

Front-end client

The frontend-client/ subdirectory contains a stub for a React/Vite application that will eventually allow users to submit their own evolutionary histories using the REST API defined in aws/.