PDBBind-Opt Workflow

This repository contains the scripts of the PDBBind-Opt workflow, which combines several open-source software packages to detect and fix structural problems in PDBBind.

Code availability

pre_process/: Scripts to prepare the PDBBind and BioLiP datasets (identifying ligands and extracting binding affinity data)
workflow/: Code for the PDBBind-Opt workflow
- dimorphite_dl: Package to assign protonation states. We modified site_substructures.smarts to simplify the rules.
- fix_ligand.py: LigandFixer module
- fix_protein.py: ProteinFixer module
- process.py: Main workflow
- rcsb.py: Functions to query RCSB (e.g. downloading files, querying SMILES strings)
- gather.py: Functions to create metadata CSV files
- fix_polymer.py: Functions to fix polymer ligands
- maual_smiles.json: Manually corrected reference SMILES
- building_blocks.csv: SMILES of alpha-amino acids and common N/C-terminal caps, used to create reference SMILES for polymers
error_fix/: Error analysis
figshare/: Metadata of BioLiP2-Opt and PDBBind-Opt deposited in the Figshare repository

Dataset availability

The PDBBind-Opt and BioLiP2-Opt datasets prepared by the PDBBind-Opt workflow can be found in this Figshare repository.

How to reconstruct PDBBind-Opt and BioLiP-Opt

Step 1: Download the PDBBind index file from the official PDBBind website. Run download.sh in pre_process/ to download the BioLiP2 dataset.
Step 2: Run pre_process/create_dataset_csv.ipynb to extract binding affinities and identify ligands. This produces the three CSV files used in Step 3 (BioLiP_bind_sm.csv, PDBBind_poly.csv and PDBBind_sm.csv).
Step 3: Go to the workflow/ directory and run the workflow with the following commands:

mkdir ../raw_data
python process.py -i ../pre_process/BioLiP_bind_sm.csv -d ../raw_data/biolip2_opt
python process.py -i ../pre_process/PDBBind_poly.csv -d ../raw_data/pdbbind_opt_poly --poly
python process.py -i ../pre_process/PDBBind_sm.csv -d ../raw_data/pdbbind_opt_sm

This takes about one day on a 256-core CPU. If you have more nodes, consider splitting the input CSV file into several chunks and running them in parallel. When the workflow finishes, the output directory contains one folder per PDB ID. If the workflow succeeded for a PDB ID, its folder contains a file named done.tag; otherwise it contains a file named err.
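The chunking suggested above can be sketched in a few lines of Python. This is only an illustration, not part of the repository: the helper name split_csv and the chunk file naming are hypothetical, and it assumes the input CSV has a single header row that should be repeated in every chunk.

```python
import csv
from pathlib import Path


def split_csv(input_csv, out_dir, n_chunks):
    """Split a CSV into at most n_chunks files, repeating the header in each.

    Hypothetical helper: assumes the first row of input_csv is a header.
    """
    input_csv = Path(input_csv)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    with input_csv.open(newline="") as fh:
        reader = csv.reader(fh)
        header = next(reader)
        rows = list(reader)

    # Ceiling division so that every row lands in some chunk.
    chunk_size = max(1, -(-len(rows) // n_chunks))

    paths = []
    for start in range(0, len(rows), chunk_size):
        index = start // chunk_size
        path = out_dir / f"{input_csv.stem}_chunk{index:03d}.csv"
        with path.open("w", newline="") as fh:
            writer = csv.writer(fh)
            writer.writerow(header)
            writer.writerows(rows[start:start + chunk_size])
        paths.append(path)
    return paths
```

Each resulting chunk file could then be passed to process.py through -i on a separate node. Whether several runs can safely share one -d output directory is not stated here, so giving each run its own directory is the cautious choice.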

Step 4: Run gather.py to create the metadata files, for example:

python gather.py -i ../pre_process/BioLiP_bind_sm.csv -d ../raw_data/biolip2_opt -o ../figshare/biolip2_opt/biolip2_opt.csv
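Around this step it can be useful to tally which PDB IDs succeeded, using the done.tag and err marker files described in Step 3. The following is a minimal sketch under the assumption that the markers sit directly inside each per-PDB-ID folder; the function name summarize_run and the extra "unfinished" category (a folder with neither marker) are illustrative additions, not part of the workflow.

```python
from pathlib import Path


def summarize_run(output_dir):
    """Classify each per-PDB-ID folder as done, failed, or unfinished.

    Assumes done.tag / err live directly under each folder, as the
    README describes; "unfinished" means neither marker was found.
    """
    status = {"done": [], "failed": [], "unfinished": []}
    for folder in sorted(Path(output_dir).iterdir()):
        if not folder.is_dir():
            continue  # skip stray files in the output directory
        if (folder / "done.tag").exists():
            status["done"].append(folder.name)
        elif (folder / "err").exists():
            status["failed"].append(folder.name)
        else:
            status["unfinished"].append(folder.name)
    return status
```

For example, summarize_run("../raw_data/biolip2_opt") returns a dictionary of three lists, whose lengths give a quick success and failure count for that run.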

Requirements

After creating the environment with conda create -n PDBBindOPTenv, most packages can be installed directly with pip, for example pip install gemmi, pip install rdkit-pypi, and pip install openmm. In my experience (HPC, Linux, Python==3.11.9), a few packages are hard for newcomers to install with conda install -c conda-forge: openmmforcefields, openff, pdbfixer and openbabel.

For these packages I recommend mamba (note the spelling: mamba, not mamda).

Install Miniforge

# In my case, I installed the Linux x86_64 build
wget "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh"
bash Miniforge3-Linux-x86_64.sh

Navigate to ${HOME}; you will see a new miniforge3 folder alongside your miniconda3 folder. The directory ${HOME}/miniforge3/etc/profile.d/ contains conda.sh and mamba.sh. Source both:
```
source /${HOME}/miniforge3/etc/profile.d/conda.sh
source /${HOME}/miniforge3/etc/profile.d/mamba.sh
```

At this point, conda env list shows:

# conda environments:
#
                       /${HOME}/miniconda3
                       /${HOME}/miniconda3/envs/PDBBindOPTenv
base                   /${HOME}/miniforge3

conda activate /${HOME}/miniconda3/envs/PDBBindOPTenv
mamba install -c conda-forge openmmforcefields
mamba install -c conda-forge openff-toolkit
mamba install -c conda-forge pdbfixer
mamba install -c conda-forge openbabel
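After installation, a quick way to confirm that the environment is complete is to check that each package can be located by Python. The sketch below is an assumption-laden convenience, not part of the repository: the helper check_modules is hypothetical, and the import names in REQUIRED_MODULES are my understanding of what each package installs (for example, openff-toolkit is imported as openff.toolkit), so adjust them if your versions differ.

```python
import importlib.util

# Import names assumed for the packages mentioned above.
REQUIRED_MODULES = [
    "gemmi",
    "rdkit",
    "openmm",
    "openmmforcefields",
    "openff.toolkit",
    "pdbfixer",
    "openbabel",
]


def check_modules(names):
    """Return {module_name: True/False} depending on whether it can be found."""
    found = {}
    for name in names:
        try:
            found[name] = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # Raised for dotted names whose parent package is missing.
            found[name] = False
    return found


if __name__ == "__main__":
    for name, ok in check_modules(REQUIRED_MODULES).items():
        print(f"{name:20s} {'ok' if ok else 'MISSING'}")
```

Run it inside the activated PDBBindOPTenv environment; any module reported as MISSING points to a package that still needs to be installed with pip or mamba.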
