Identifying Chirality in Line Drawings of Molecules Using Imbalanced Dataset Sampler for a Multilabel Classification Task

Yong En Kok, Simon Woodward, Ender Özcan and Mercedes Torres Torres

The paper has been accepted for publication: https://doi.org/10.1002/minf.202200068

Chirality is the ability of molecules to exist as two forms that are non-superimposable mirror images. If the two forms cannot be superimposed on each other through any combination of translations, rotations and conformational (bond rotation) changes, the molecule is chiral; otherwise it is achiral. There are four common structural motifs that give rise to molecular chirality, namely centre (point), axial, planar and helical chirality.
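
To make the definition concrete for centre (point) chirality, here is a minimal sketch under simplifying assumptions: a stereocentre is modelled as four distinct, labelled substituent positions, and its handedness is the sign of the scalar triple product (the signed tetrahedral volume). Translations and proper rotations preserve that sign while a reflection flips it, so the two mirror forms cannot be superimposed. All function names and coordinates are hypothetical illustrations, not code from this repository, and the sketch ignores conformational changes and the axial, planar and helical motifs.

```python
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def signed_volume(a: Vec, b: Vec, c: Vec, d: Vec) -> float:
    """Scalar triple product (b - a) . ((c - a) x (d - a)).

    Equals six times the signed volume of tetrahedron a, b, c, d. It depends
    only on coordinate differences, so translations leave it unchanged.
    """
    ab = [b[i] - a[i] for i in range(3)]
    ac = [c[i] - a[i] for i in range(3)]
    ad = [d[i] - a[i] for i in range(3)]
    cross = (
        ac[1] * ad[2] - ac[2] * ad[1],
        ac[2] * ad[0] - ac[0] * ad[2],
        ac[0] * ad[1] - ac[1] * ad[0],
    )
    return sum(ab[i] * cross[i] for i in range(3))


def handedness(substituents: Sequence[Vec]) -> int:
    """+1 or -1 for four substituent positions listed in label order."""
    v = signed_volume(*substituents)
    if v == 0:
        raise ValueError("degenerate (coplanar) arrangement has no handedness")
    return 1 if v > 0 else -1


def translate(p: Vec, t: Vec) -> Vec:
    return (p[0] + t[0], p[1] + t[1], p[2] + t[2])


def rotate_z90(p: Vec) -> Vec:
    # Proper rotation (determinant +1): 90 degrees about the z-axis.
    return (-p[1], p[0], p[2])


def mirror(p: Vec) -> Vec:
    # Reflection through the yz-plane (determinant -1).
    return (-p[0], p[1], p[2])


# Four distinct substituents (labels 1..4) at the corners of a tetrahedron.
centre = [(1.0, 1.0, 1.0), (1.0, -1.0, -1.0), (-1.0, 1.0, -1.0), (-1.0, -1.0, 1.0)]
print(handedness(centre), handedness([mirror(p) for p in centre]))  # prints -1 1
```

Any chain of `translate` and `rotate_z90` calls keeps `handedness(centre)` unchanged, whereas `mirror` flips it, which is why no rigid motion maps this centre onto its mirror image. If two substituents were identical, swapping their labels would also flip the sign, making the mirror image superimposable and the centre achiral.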

Chemists have used line drawings to represent chiral organic molecules for more than 150 years, but machine-readable representations were developed much later: SMILES (in the 1980s) and InChI (from 2000). Nonetheless, these molecular languages cannot fully define molecular chirality, as they are presently unable to represent axial, planar and helical chirality. Additionally, converting 2D line drawings into machine-readable formats is susceptible to the loss of stereochemical information, which limits chiral recognition.

Herein, we compared pretrained EfficientNetV2 and ResNet50 networks fine-tuned for a binary chirality classification task (achiral/chiral) and a multilabel chirality-type classification task (none/centre/axial/planar).
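
The two tasks differ mainly in their targets. The sketch below shows one plausible way to encode them, assuming that "none" is its own output column, that it is mutually exclusive with the three chirality types, and that the binary label is simply "not none". These are assumptions for illustration; the repository's actual label format may differ, and helical chirality has no column here because it is not one of the four multilabel classes.

```python
from typing import AbstractSet, List

# Output order for the multilabel task, using the four class names above.
# Treating "none" as its own output column is an assumption of this sketch.
CHIRALITY_TYPES = ("none", "centre", "axial", "planar")


def multi_hot(types: AbstractSet[str]) -> List[int]:
    """Encode a molecule's chirality types as a multi-hot target vector."""
    unknown = set(types) - set(CHIRALITY_TYPES)
    if unknown:
        raise ValueError(f"unknown chirality type(s): {sorted(unknown)}")
    if not types:
        raise ValueError("at least one label is required; use {'none'} for achiral")
    if "none" in types and len(types) > 1:
        raise ValueError("'none' cannot be combined with a chirality type")
    return [1 if t in types else 0 for t in CHIRALITY_TYPES]


def binary_label(types: AbstractSet[str]) -> int:
    """Collapse the multilabel target to the binary task: 0 = achiral, 1 = chiral."""
    return 0 if multi_hot(types)[0] == 1 else 1


# A molecule with both a stereocentre and a chiral axis.
print(multi_hot({"centre", "axial"}), binary_label({"centre", "axial"}))  # prints [0, 1, 1, 0] 1
```

Because one molecule can carry several chirality types at once, the multilabel task needs independent per-class outputs (for example, one sigmoid per column) rather than a single softmax over the four classes.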

To address the label-combination imbalance problem in the multilabel task, the study proposed a new data sampling method, the Formulated Imbalanced Dataset Sampler (FIDS), which samples a formulated number of minority label combinations on top of the training set.
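
FIDS itself is defined in the paper. The sketch below is only a generic label-combination oversampler in the same spirit: it treats each multi-hot target as one combination, keeps every original training sample once, and adds extra draws of rare combinations on top. The top-up rule (a `min_fraction` of the majority combination's count) and all names are hypothetical stand-ins, not the FIDS formula or code from this repository.

```python
import math
import random
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

LabelCombo = Tuple[int, ...]  # one multi-hot target, e.g. (0, 1, 1, 0)


def oversample_combinations(
    labels: Sequence[LabelCombo], min_fraction: float = 0.5, seed: int = 0
) -> List[int]:
    """Return shuffled training indices with extra draws of rare label combinations.

    Every original index is kept exactly once. Any combination with fewer than
    ceil(min_fraction * majority_count) samples is topped up to that target by
    drawing its own indices with replacement. This target rule is a hypothetical
    stand-in, not the FIDS formula.
    """
    if not 0.0 < min_fraction <= 1.0:
        raise ValueError("min_fraction must be in (0, 1]")
    if not labels:
        return []
    by_combo: Dict[LabelCombo, List[int]] = defaultdict(list)
    for idx, combo in enumerate(labels):
        by_combo[tuple(combo)].append(idx)
    target = math.ceil(min_fraction * max(len(m) for m in by_combo.values()))
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    for members in by_combo.values():
        deficit = target - len(members)
        if deficit > 0:
            indices.extend(rng.choices(members, k=deficit))
    rng.shuffle(indices)
    return indices


def combo_counts(labels: Sequence[LabelCombo], indices: Sequence[int]) -> Counter:
    """Count how often each label combination appears in a list of indices."""
    return Counter(tuple(labels[i]) for i in indices)


labels = [(1, 0, 0, 0)] * 8 + [(0, 1, 0, 0)] * 3 + [(0, 1, 1, 0)]
idx = oversample_combinations(labels, min_fraction=0.5, seed=42)
print(sorted(combo_counts(labels, idx).items()))
# prints [((0, 1, 0, 0), 4), ((0, 1, 1, 0), 4), ((1, 0, 0, 0), 8)]
```

Balancing over whole label combinations rather than individual labels matters here because a rare pairing such as centre plus axial chirality could stay rare even if each label on its own were well represented.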

Through the study of heatmaps, the research also demonstrated the potential of a deep learning network to make predictions that align with human understanding of chirality.

data
dataset
model
utils
README.md
inferenceBinary.py
inferenceBinary.yaml
inferenceMulti.py
inferenceMulti.yaml
moleculeChirality.png
parser.py
requirements.txt
trainBinary.py
trainBinary.yaml
trainMulti.py
trainMulti.yaml

Table of contents

Installation

timm/data/

timm/utils/

CHIRAL Dataset

Pretrained Models and the ChEMBL+ dataset

Usage

Cross-validation

Infer for cross-validation folds (for more detailed performance evaluation)

Useful links

janetkok/Chiral
