Getting an understanding of drug discovery using machine learning and getting experience using Chemprop. Chemprop is a tool which allows for modeling of molecules using message passing graph neural networks.
This is done as the final project in the course ID2223 Scalable Machine Learning and Deep Learning at KTH.
The datasets used in this project are available in the data
folder. We used article_training.csv
for training our model and assessing it. And then used the obtained model in order to predict novel antibiotic candidates using the coco_test.csv
dataset.
Our training dataset comes from the article which we used as inspiration for this project: A Deep Learning Approach to Antibiotic Discovery.
Our dataset used for predictions consists of SMILES strings collected from COCONUT, a database of natural compounds, by filtering for compounds with 0-5 carbons. Our conversion script sdf_to_csv.py
is then used in order to convert the COCONUT data from SDF to CSV.
At first, we tried to make a notebook in Google Colab, then DeepNote and Databricks:, but we faced severe issues while trying to install and/or use Miniconda on them. We should have tried to make a minimum viable product first, locally. In the end, we decided to do the project locally, using the terminal and Jupyter Notebooks.
The project report, along with a list of our antibiotic candidates, is available in PDF format in this repository.
The commands that we ran in order to obtain our results can be found in the commands.md
file.
Please have a look at these pages in case you don't have a working conda installation already.
https://docs.conda.io/en/latest/miniconda.html
https://conda.io/projects/conda/en/latest/user-guide/install/index.html
Make sure to also have a look at the most recent Chemprop installation instructions, just in case the instructions here have become out-dated. We are going to install it from PyPi.
Create a new environment for chemprop and activate it.
conda create -n chemprop python=3.8
conda activate chemprop
Make sure that the current pip is from the current environment.
which pip
Install Chemprop and other necessary packages.
conda install -c conda-forge rdkit
pip install git+https://github.com/bp-kelley/descriptastorus
pip install chemprop
You should now be able to use Chemprop's CLI, with commands
starting with chemprop_*
. (need to reboot the terminal? or
will they be available right away?)
From here, you should be able to start a Jupyter notebook session,
and start using Chemprop. In our case, however, we've had to also
install jupyter notebook within the current conda environment, so
that the chemprop
python module would get discovered inside the
notebook.
For solving the problem with import chemprop
within Jupyter notebooks,
the following steps will help.
conda install ipykernel notebook
conda install -c conda-forge ipywidgets
jupyter notebook .
Test your installation by entering import chemprop
.
Chemprop will show some warnings the first time it gets imported, but we were unable to find any solutions to them, and they seemed to be okay to ignore.
jupyter notebook list
jupyter notebook stop <port number>
conda deactivate