RACER

Rapid Coarse-grained Epitope TCR Model

An implementation of the RACER model for TCR recognition

Installation

Clone the RACER repository

git clone https://github.com/XingchengLin/RACER.git

Download and install python using conda: https://www.anaconda.com/products/individual
Download and install Modeller using conda: https://anaconda.org/salilab/modeller
Note: One needs to obtain a Modeller academic license in order to run Modeller: https://salilab.org/modeller/registration.html
Download and install Biopython using conda: https://biopython.org/wiki/Packages
Note: the installation time should be rapid

Raw data

The extended dataset from (Birnbaum et al Cell 2014) used for our training

Molecular Demo

System requirement:

Note: The code showing here is built for a Linux system. For a Mac user, please use the one line command:

cmd_4mac.sh

The code was tested with Python 3.6.7

Example

Optimize an energy model given available PDB of 3QIB.pdb : https://www.rcsb.org/structure/3QIB
In this example, we will generate 1000 decoy sequences for optimization, and calculate the binding energies of native sequence (strong binders) of 3QIB.pdb and testing sequences (weak binders) generated by Lanzarotti et. al. DOI: 10.1016/j.molimm.2017.12.019
Step 1 Clean the PDB files
Clean the PDB files so that they contain only one TCR-p-MHC pair with defined Chain IDs for TCR as well as p-MHC. An example is given in the folder data/. The native.pdb is cleaned from 3QIB.pdb. The testBinders were built using Modeller based on the template of native.pdb, with replaced peptide sequences.
Step 2 Optimize the energy model
This step includes 1. Processing the PDB files so that they follow the format of the AWSEM model (DOI: 10.1021/jp212541y) developed by Wolynes group; 2. Generate 1000 decoy peptide sequences from the strong binder; 3. Evaluate a "Phi" map between the TCR and the peptide, based on their contact probability; 4. Optimize a force field by maximizing the Z score between the binding energies of strong and decoy sequences

bash cmd.preprocessing.sh 3qib C D 782 794
bash cmd.optimize.sh

Note: "C" and "D" are the chain IDs of TCR alpha and beta chains. 782 and 794 are the starting and ending residue IDs of the presented peptide.
Note: For now, one needs to hard-set the cutoff for noise-filtering of the eigenvalues of the interaction matrix. Please make sure the file: ./gammas/randomized_decoy/01022019/direct_contact/proteins_list_phi_pairwise_contact_well4.5_6.5_5.0_10_lamb_filtered after running optimization has no zero terms (eigenvalues). Here, the cutoff was set as 50 in the file cmd.optimize.sh, to keep the first 50 eigenvectors of the B matrix in Eq. (5) of the paper. This choice is made for consistency with the Supplementary Note S5 of the manuscript.
Step 3 Use the optimized energy model to evaluate the effective binding energies of strong/weak binders
This step uses the optimized energy model (a 20 by 20 matrix for different amino acid types) to evaluate the binding energies of the strong (native) and weak (testBinder) binders. A lower binding energy corresponds to a stronger binding affinity

bash cmd.evaluate_bindingE.sh 3qib C D

All the codes in this demo can be executed by one command in the folder molecular_demo (one check button was added to remind users that they need to check the chain IDs of their rebuilt testing structures):

bash cmd.sh

Expected run time on a personal computer: around 1 minute for our given example data (note it takes longer for running the much larger set of data used in the referenced RACER manuscript)

The final results (binding energies) are reported in the folder evaluated_binding_E/

Explanation for the output:
epitopeE.txt -- Binding energies of the strong binder
non-epitopeE.txt -- Binding energies of the weak binders
Evaluated_bindingE.png -- One plot showing the predicted range of binding energies for the weak binders, as well as the strong binders

Statistical Demo

Note: The output from this demo is proper subset of (and so different from) the larger data analyzed in the referenced RACER manuscript. In particular, we focused on 1000 of the original 10^5 T cells and truncate the thymic selection to be performed on 100 of the original 10^4 self-peptides.

System requirement

The code uses MATLab script and is compiled on version R2017b.

~ ~ ~ ~ ~ ~ ~ ~

DESCRIPTION for the files:

RACERMATLabScript.m: Script file which provides an example of thymic selection and T-cell recognition of foreign peptides and point-mutated self-peptides.
PairwiseAffinity.mat: MATLab data file containing pairwise binding energy values for 100 thymic self-peptides and 1000 T-cells (peptides delineated by column and T-cells by row)
PairwiseAffinityMutant.mat: MATLab data file containing pairwise binding energy values for 1000 point-mutated (non-self) peptides and 1000 T-cells (peptides delineated by column and T-cells by row). The binding energy came from the output of the molecular module of RACER.
PairwiseAffinityRandom.mat: MATLab data file containing pairwise binding energy values for 1000 randomly-generated foreign (non-self) peptides and 1000 T-cells (peptides delineated by column and T-cells by row). The binding energy came from the output of the molecular module of RACER.
RACERMATLab.m: Function file which generates the T-cell activation energy cutoff on the normalized affinity interval of [0,10] yielding 50% thymic negative selection (variable En50); outputs plots of the distribution of binding energies, thymic selection, and post-selection T-cell recognition profiles of point mutant and random peptides.

Example

Run RACERMATLabScript.m

NOTES:

The T-cells and their corresponding indices are the same across all three input arrays. In other words, binding energies for the j^th T-cell to self-peptides, mutant peptide, and foreign peptide are located at the j^th column of PairwiseAffinity.mat, PairwiseAffinityMutant.mat, and PairwiseAffinityRandom.mat respectively.

Explanation for the output figures:
Figure 1a. Empirical maximum binding energy distributions of T-cells with their self-peptides (maximum for each T-cell taken over all self-peptides)
Figure 1b. Thymic selection curve (T-cell deletion probability as a function of thymic selection energy cutoff.
Figure 2a. Post-selection individual T-cell recognition of foreign peptides as a function of T-cell survival probability
Figure 2b. Post-selection T-cell repertoire recognition of foreign peptides as a function of T-cell survival probability
Figure 2c. Post-selection individual T-cell recognition of mutant peptides as a function of T-cell survival probability
Figure 2d. Post-selection T-cell repertoire recognition of mutant peptides as a function of T-cell survival probability
Expected run time on a personal computer: around 1 minute for our given example data (note it takes longer for running the much larger set of data used in the referenced RACER manuscript)

Reference:

Xingcheng Lin, Jason T. George, Nicholas P. Schafer, Kevin Ng Chau, Cecilia Clementi, José N. Onuchic, Herbert Levine, Rapid assessment of T-cell receptor specificity of the immune repertoire. Nat Comput Sci 1, 362–373 (2021). https://doi.org/10.1038/s43588-021-00076-1
RACER borrows some ideas from the Principle of Minimal Frustration in protein folding, illustrated here: Nicholas P. Schafer, Bobby L. Kim, Weihua Zheng, Peter G. Wolynes, Learning To Fold Proteins Using Energy Landscape Theory, https://doi.org/10.1002/ijch.201300145

Name		Name	Last commit message	Last commit date
Latest commit History 37 Commits
molecular_demo		molecular_demo
raw_data		raw_data
statistical_demo		statistical_demo
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

RACER

An implementation of the RACER model for TCR recognition

Installation

Raw data

Molecular Demo

System requirement:

Example

Statistical Demo

System requirement

Example

Reference:

About

Releases 1

Packages

Languages

License

XingchengLin/RACER

Folders and files

Latest commit

History

Repository files navigation

RACER

An implementation of the RACER model for TCR recognition

Installation

Raw data

Molecular Demo

System requirement:

Example

Statistical Demo

System requirement

Example

Reference:

About

Resources

License

Stars

Watchers

Forks

Releases 1

Packages 0

Languages

Packages