IgFold

Official repository for IgFold: Fast, accurate antibody structure prediction from deep learning on massive set of natural antibodies.

The code and pre-trained models from this work are made available for non-commercial use (including at commercial entities) under the terms of the JHU Academic Software License Agreement. For commercial inquiries, please contact jruffolo[at]jhu.edu.

Install

For easiest use, install IgFold via PyPI:

$ pip install igfold

To access the latest version of the code, clone and install the repository:

$ git clone [email protected]:Graylab/IgFold.git 
$ pip install IgFold

Two refinement methods are supported for IgFold predictions. To follow the manuscript, PyRosetta should be installed following the instructions here. If PyRosetta is not installed, refinement with OpenMM will be attempted. For this option, OpenMM must be installed and configured before running IgFold as follows:

$ conda install -c conda-forge openmm pdbfixer

Usage

Note: The first time IgFoldRunner is initialized, it will download the pre-trained weights. This may take a few minutes and will require a network connection.

Antibody structure prediction from sequence

Paired antibody sequences can be provided as a dictionary of sequences, where the keys are chain names and the values are the sequences.

from igfold import IgFoldRunner, init_pyrosetta

init_pyrosetta()

sequences = {
    "H": "EVQLVQSGPEVKKPGTSVKVSCKASGFTFMSSAVQWVRQARGQRLEWIGWIVIGSGNTNYAQKFQERVTITRDMSTSTAYMELSSLRSEDTAVYYCAAPYCSSISCNDGFDIWGQGTMVTVS",
    "L": "DVVMTQTPFSLPVSLGDQASISCRSSQSLVHSNGNTYLHWYLQKPGQSPKLLIYKVSNRFSGVPDRFSGSGSGTDFTLKISRVEAEDLGVYFCSQSTHVPYTFGGGTKLEIK"
}
pred_pdb = "my_antibody.pdb"

igfold = IgFoldRunner()
igfold.fold(
    pred_pdb, # Output PDB file
    sequences=sequences, # Antibody sequences
    do_refine=True, # Refine the antibody structure with PyRosetta
    do_renum=True, # Send predicted structure to AbNum server for Chothia renumbering
)

To predict a nanobody structure (or an individual heavy or light chain), simply provide one sequence:

from igfold import IgFoldRunner, init_pyrosetta

init_pyrosetta()

sequences = {
    "H": "QVQLQESGGGLVQAGGSLTLSCAVSGLTFSNYAMGWFRQAPGKEREFVAAITWDGGNTYYTDSVKGRFTISRDNAKNTVFLQMNSLKPEDTAVYYCAAKLLGSSRYELALAGYDYWGQGTQVTVS"
}
pred_pdb = "my_nanobody.pdb"

igfold = IgFoldRunner()
igfold.fold(
    pred_pdb, # Output PDB file
    sequences=sequences, # Nanobody sequence
    do_refine=True, # Refine the antibody structure with PyRosetta
    do_renum=True, # Send predicted structure to AbNum server for Chothia renumbering
)

To predict a structure without PyRosetta refinement, set do_refine=False:

from igfold import IgFoldRunner

sequences = {
    "H": "QVQLQESGGGLVQAGGSLTLSCAVSGLTFSNYAMGWFRQAPGKEREFVAAITWDGGNTYYTDSVKGRFTISRDNAKNTVFLQMNSLKPEDTAVYYCAAKLLGSSRYELALAGYDYWGQGTQVTVS"
}
pred_pdb = "my_nanobody.pdb"

igfold = IgFoldRunner()
igfold.fold(
    pred_pdb, # Output PDB file
    sequences=sequences, # Nanobody sequence
    do_refine=False, # Refine the antibody structure with PyRosetta
    do_renum=True, # Send predicted structure to AbNum server for Chothia renumbering
)

Predicted RMSD for antibody structures

RMSD estimates are calculated per-residue and recorded in the B-factor column of the output PDB file. These values are also returned from the fold method.

from igfold import IgFoldRunner, init_pyrosetta

init_pyrosetta()

sequences = {
    "H": "EVQLVQSGPEVKKPGTSVKVSCKASGFTFMSSAVQWVRQARGQRLEWIGWIVIGSGNTNYAQKFQERVTITRDMSTSTAYMELSSLRSEDTAVYYCAAPYCSSISCNDGFDIWGQGTMVTVS",
    "L": "DVVMTQTPFSLPVSLGDQASISCRSSQSLVHSNGNTYLHWYLQKPGQSPKLLIYKVSNRFSGVPDRFSGSGSGTDFTLKISRVEAEDLGVYFCSQSTHVPYTFGGGTKLEIK"
}
pred_pdb = "my_antibody.pdb"

igfold = IgFoldRunner()
out = igfold.fold(
    pred_pdb, # Output PDB file
    sequences=sequences, # Antibody sequences
    do_refine=True, # Refine the antibody structure with PyRosetta
    do_renum=True, # Send predicted structure to AbNum server for Chothia renumbering
)

out.prmsd # Predicted RMSD for each residue's N, CA, C, CB atoms (dim: 1, L, 4)

Antibody sequence embedding

Representations from IgFold may be useful as features for machine learning models. The embed method can be used to surface a variety of antibody representations from the model:

from igfold import IgFoldRunner

sequences = {
    "H": "EVQLVQSGPEVKKPGTSVKVSCKASGFTFMSSAVQWVRQARGQRLEWIGWIVIGSGNTNYAQKFQERVTITRDMSTSTAYMELSSLRSEDTAVYYCAAPYCSSISCNDGFDIWGQGTMVTVS",
    "L": "DVVMTQTPFSLPVSLGDQASISCRSSQSLVHSNGNTYLHWYLQKPGQSPKLLIYKVSNRFSGVPDRFSGSGSGTDFTLKISRVEAEDLGVYFCSQSTHVPYTFGGGTKLEIK"
}

igfold = IgFoldRunner()
emb = igfold.embed(
    sequences=sequences, # Antibody sequences
)

emb.bert_embs # Embeddings from AntiBERTy final hidden layer (dim: 1, L, 512)
emb.gt_embs # Embeddings after graph transformer layers (dim: 1, L, 64)
emb.strucutre_embs # Embeddings after template incorporation IPA (dim: 1, L, 64)

Synthetic antibody structures

To demonstrate the capabilities of IgFold for large-scale prediction of antibody structures, we applied the model to a non-redundant set of 104,994 paired antibody sequences from the Observed Antibody Space database. These predicted structures are made available for use online.

$ wget https://data.graylab.jhu.edu/igfold_oas_paired95.tar.gz

Bug reports

If you run into any problems while using IgFold, please create a Github issue with a description of the problem and the steps to reproduce it.

Citing this work

@article{ruffolo2021deciphering,
    title = {Deciphering antibody affinity maturation with language models and weakly supervised learning},
    author = {Ruffolo, Jeffrey A and Gray, Jeffrey J and Sulam, Jeremias},
    journal = {arXiv},
    year= {2021}
}
@article{ruffolo2022fast,
    title = {Fast, accurate antibody structure prediction from deep learning on massive set of natural antibodies},
    author = {Ruffolo, Jeffrey A and Chu, Lee-Shin and Mahajan, Sai Pooja and Gray, Jeffrey J},
    journal = {bioRxiv},
    year= {2022}
}

Name		Name	Last commit message	Last commit date
Latest commit History 16 Commits
igfold		igfold
.gitignore		.gitignore
LICENSE.md		LICENSE.md
MANIFEST.in		MANIFEST.in
README.md		README.md
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
setup.cfg		setup.cfg

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

IgFold

Install

Usage

Antibody structure prediction from sequence

Predicted RMSD for antibody structures

Antibody sequence embedding

Synthetic antibody structures

Bug reports

Citing this work

About

Releases

Packages

Languages

License

LLei2/IgFold

Folders and files

Latest commit

History

Repository files navigation

IgFold

Install

Usage

Antibody structure prediction from sequence

Predicted RMSD for antibody structures

Antibody sequence embedding

Synthetic antibody structures

Bug reports

Citing this work

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages