We explore the reliability of Convolutional Neural Networks (CNNs) in identifying regions important for binding, as well as the significance of the deep representations, by explaining the model's decisions in terms of the input regions that contributed most to each prediction. Furthermore, we implement an end-to-end deep learning architecture to predict binding affinity, in which CNNs are used for their capacity to automatically infer and extract discriminating deep representations from 1D sequential and structural data.
End-to-End Deep Learning Architecture: Convolutional Neural Networks + Feed-Forward Fully Connected Neural Network
- Potential Binding Sites (≤ 5 Å): Green
- L-Grad-RAM Hits: Blue
- Binding Sites Matched by L-Grad-RAM Hits: Red
- Two Parallel Convolutional Neural Networks + Fully Connected Neural Network
- Global Max Pooling + Guided Gradients
- Global Max Pooling + Non-Guided Gradients
- Global Average Pooling + Guided Gradients
- Global Average Pooling + Non-Guided Gradients
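The four variants above differ only in how the gradients are pooled into per-filter weights and whether they are "guided". The sketch below is a minimal NumPy illustration of a Grad-CAM-style regression activation map for one 1D input, assuming you already have the last convolutional layer's activations and the gradient of the predicted affinity with respect to them (e.g. obtained with `tf.GradientTape`). The function name `grad_ram` is hypothetical, and the guided variant uses one common definition (keep positive gradients only where the activation is also positive); it is not necessarily the exact rule used by this repository.

```python
import numpy as np


def grad_ram(feature_maps, gradients, pooling="max", guided=True):
    """Hypothetical Grad-RAM-style importance map for one 1D input.

    feature_maps, gradients: arrays of shape (length, channels) holding the
    last conv layer's activations and d(prediction)/d(activations).
    Returns one importance value per position, scaled to [0, 1].
    """
    fmap = np.asarray(feature_maps, dtype=float)
    grads = np.asarray(gradients, dtype=float)
    if guided:
        # Assumed "guided" rule: keep positive gradients at positions
        # where the activation is also positive, zero elsewhere.
        grads = grads * (fmap > 0) * (grads > 0)
    # Global pooling of the gradients over the sequence axis gives one
    # weight per convolutional filter.
    if pooling == "max":
        weights = grads.max(axis=0)
    elif pooling == "average":
        weights = grads.mean(axis=0)
    else:
        raise ValueError("pooling must be 'max' or 'average'")
    # Weighted sum of the feature maps across filters, then ReLU so only
    # positions that push the predicted affinity up are kept.
    cam = np.maximum(fmap @ weights, 0.0)
    # Scale to [0, 1] so one threshold can be applied across inputs.
    peak = cam.max()
    return cam / peak if peak > 0 else cam
```

With max pooling a single strongly positive gradient dominates a filter's weight, while average pooling dilutes it over the whole sequence; comparing both is what the four configurations above make possible.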
- davis_original_dataset: original Davis dataset
- davis_dataset_processed: processed dataset: protein sequences + RDKit SMILES strings + pKd values
- deep_features_dataset: CNN deep representations of the protein sequences and SMILES strings
- test_cluster: independent test set indices
- train_cluster_X: training set indices
- protein_sw_score: protein Smith-Waterman similarity scores
- protein_sw_score_norm: normalized protein Smith-Waterman similarity scores
- smiles_ecfp6_tanimoto_sim: SMILES ECFP6 (Morgan radius 3) Tanimoto similarity scores
- davis_scpdb_binding: binding information for the Davis/sc-PDB matching pairs
- pssm_X: position-specific scoring matrices (PSSMs) of the Davis/sc-PDB matching pairs
- scpdb_binding: binding information for the sc-PDB pairs
- pssm_X: position-specific scoring matrices (PSSMs) of the sc-PDB pairs
- davis_prot_dictionary: amino acid character-to-integer dictionary
- davis_smiles_dictionary: SMILES character-to-integer dictionary
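Several of the files above encode simple, standard transformations. The sketch below illustrates them under stated assumptions: integer encoding of a sequence with a character-to-integer dictionary (assuming code 0 is reserved for padding), the Kd (nM) to pKd conversion commonly applied to the Davis data, the usual normalization of a Smith-Waterman score by the two self-alignment scores, and Tanimoto similarity on fingerprints given as sets of on-bit indices. All function names are hypothetical, and whether the repository uses exactly these formulas and padding conventions is an assumption. With RDKit, ECFP6-like bit vectors would typically come from `AllChem.GetMorganFingerprintAsBitVect(mol, 3, nBits=...)`, and `DataStructs.TanimotoSimilarity` computes the same ratio.

```python
import math


def encode_sequence(sequence, char_to_int, max_len):
    """Map characters to integer codes, then truncate or zero-pad to max_len.

    Assumes code 0 is reserved for padding, so dictionary codes start at 1.
    """
    codes = [char_to_int[ch] for ch in sequence[:max_len]]
    return codes + [0] * (max_len - len(codes))


def kd_to_pkd(kd_nm):
    """Convert a dissociation constant in nM to pKd = -log10(Kd in M)."""
    return -math.log10(kd_nm / 1e9)


def normalized_sw(sw_ab, sw_aa, sw_bb):
    """Normalize a raw Smith-Waterman score by both self-alignment scores."""
    return sw_ab / math.sqrt(sw_aa * sw_bb)


def tanimoto(bits_a, bits_b):
    """Tanimoto similarity of two fingerprints given as on-bit index sets.

    Returns 0.0 for two empty fingerprints (a convention chosen here).
    """
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0
```

For example, a Kd of 10000 nM maps to a pKd of 5, and `encode_sequence("MLE", {"M": 1, "L": 2, "E": 3}, 5)` (a toy dictionary, not the repository's) gives `[1, 2, 3, 0, 0]`.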
Davis Kinase Binding Affinity Dataset + Clusters in the SOTA method format
- abl1_pymol.pse: ABL1(E255K)-phosphorylated - SKI-606 PyMOL session
- ddr1_pymol.pse: DDR1 - Foretinib PyMOL session
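The PyMOL sessions use the color legend given earlier (binding sites green, L-Grad-RAM hits blue, matched positions red). As a hedged sketch of how such a coloring could be reproduced for another complex, the hypothetical helper below builds PyMOL `color` commands from two sets of positions. It assumes the positions are already PDB residue numbers of the loaded structure; sequence indices generally need to be mapped to residue numbers first.

```python
def pymol_coloring(sites, hits):
    """Build PyMOL commands: sites green, hits blue, their overlap red.

    Hypothetical helper; assumes positions are PDB residue numbers.
    Red is emitted last so it overrides green/blue on matched residues.
    """
    sites, hits = set(sites), set(hits)
    groups = [
        ("green", sites),       # potential binding sites (<= 5 Å)
        ("blue", hits),         # L-Grad-RAM hits
        ("red", sites & hits),  # binding sites matched by hits
    ]
    commands = []
    for color, residues in groups:
        if residues:
            resi = "+".join(str(r) for r in sorted(residues))
            commands.append(f"color {color}, resi {resi}")
    return commands
```

Inside PyMOL, each string can be run with `cmd.do(...)` from the `pymol` module, or pasted into the PyMOL command line.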
- Python 3.7.9
- TensorFlow 2.4.1
- NumPy
- pandas
- scikit-learn
- itertools (Python standard library)
- Matplotlib
- Seaborn
- glob (Python standard library)
- json (Python standard library)
python cnn_fcnn_model.py --option Training --num_cnn_layers_prot 3 --prot_filters 64 64 128 --prot_filters_w 4 4 5 --num_cnn_layers_smiles 3 --smiles_filters 64 64 128 --smiles_filters_w 4 4 5 --num_fcnn_layers 3 --fcnn_units 1024 512 1024 --drop_rate 0.5 0.1 --lr_rate 0.0001
python cnn_fcnn_model.py --option Validation --num_cnn_layers_prot 3 --prot_filters 64 64 128 --prot_filters_w 4 4 5 --num_cnn_layers_smiles 3 --smiles_filters 64 64 128 --smiles_filters_w 4 4 5 --num_fcnn_layers 3 --fcnn_units 1024 512 1024 --drop_rate 0.5 0.1 --lr_rate 0.0001
python cnn_fcnn_model.py --option Evaluation
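In the commands above, each per-layer flag appears to take one value per layer (e.g. `--num_cnn_layers_prot 3` with three values for `--prot_filters` and `--prot_filters_w`). The sketch below is a hypothetical `argparse` parser that mirrors those flags and checks that interpretation; it is not the parser inside `cnn_fcnn_model.py`. The meaning of the two `--drop_rate` values is not documented here, so it is parsed without any length check.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags used in the commands above."""
    p = argparse.ArgumentParser(description="CNN + FCNN affinity model (sketch)")
    p.add_argument("--option", choices=["Training", "Validation", "Evaluation"], required=True)
    p.add_argument("--num_cnn_layers_prot", type=int)
    p.add_argument("--prot_filters", type=int, nargs="+")
    p.add_argument("--prot_filters_w", type=int, nargs="+")
    p.add_argument("--num_cnn_layers_smiles", type=int)
    p.add_argument("--smiles_filters", type=int, nargs="+")
    p.add_argument("--smiles_filters_w", type=int, nargs="+")
    p.add_argument("--num_fcnn_layers", type=int)
    p.add_argument("--fcnn_units", type=int, nargs="+")
    p.add_argument("--drop_rate", type=float, nargs="+")
    p.add_argument("--lr_rate", type=float)
    return p


def check_layer_lists(args):
    """Assumed rule: each per-layer list has one value per declared layer."""
    pairs = [
        (args.num_cnn_layers_prot, args.prot_filters),
        (args.num_cnn_layers_prot, args.prot_filters_w),
        (args.num_cnn_layers_smiles, args.smiles_filters),
        (args.num_cnn_layers_smiles, args.smiles_filters_w),
        (args.num_fcnn_layers, args.fcnn_units),
    ]
    for n, values in pairs:
        if n is not None and values is not None and len(values) != n:
            raise ValueError(f"expected {n} values, got {values}")
```

Because the hyperparameter flags are optional in this sketch, the shorter `--option Evaluation` command also parses, leaving them as `None`.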
Example
- Protein Sequence: MLEICLKLVG...
- SMILES String: Cc1cn(...
- Window Length: 0 1 2 ...
- Feature Importance Threshold: 0.3 0.4 0.5 ...
- Binding Site Positions: 5 10 50 ...
python gradram_testing.py --protein_sequence MLEICLKLVG... --smiles_string Cc1cn(... --window 0 1 2 ... --thresholds 0.3 0.4 0.5 ... --sites 5 10 50 ...
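How `gradram_testing.py` combines the window lengths, thresholds and binding-site positions is not spelled out here. One plausible reading, sketched below with hypothetical function names, is: a position is an L-Grad-RAM hit when its normalized importance reaches the threshold, each hit is widened by `window` residues on both sides, and matched sites are the binding-site positions covered by the resulting hits. Positions are assumed to be 0-based indices into the protein sequence.

```python
def grad_ram_hits(importance, threshold, window):
    """Positions with importance >= threshold, each widened by +/- window."""
    n = len(importance)
    hits = set()
    for i, score in enumerate(importance):
        if score >= threshold:
            hits.update(range(max(0, i - window), min(n, i + window + 1)))
    return hits


def match_sites(importance, sites, thresholds, windows):
    """For every (window, threshold) pair, report hits, matches and coverage."""
    sites = set(sites)
    results = {}
    for w in windows:
        for t in thresholds:
            hits = grad_ram_hits(importance, t, w)
            matched = hits & sites
            results[(w, t)] = {
                "hits": sorted(hits),
                "matched": sorted(matched),
                # Fraction of the known binding sites recovered by the hits.
                "coverage": len(matched) / len(sites) if sites else 0.0,
            }
    return results
```

Sweeping several windows and thresholds, as in the example command, shows the trade-off: wider windows and lower thresholds recover more binding sites but also flag more non-binding positions.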