Chapter 1: Introduction

Chemoinformatics is a methodology for analyzing data, mainly chemistry-related data, with a computer in order to solve a variety of problems. The term was defined around the late 1990s to early 2000s. In the pharmaceutical industry and in academic pharmaceutical research it is used in a wide variety of tasks, including analyzing the relationship between drug effects and compound properties, visualizing large amounts of compound information, and clustering compounds by similarity.

In recent years, applications of deep learning to drug discovery have been explored. Beyond QSAR (Quantitative Structure-Activity Relationship) modeling for predicting activity and physical properties, applied research is now also active in areas that conventional chemoinformatics did not cover, such as proposing new compound designs and synthetic routes.

Compound design is innovative

What kind of compound should we make in the first place, and how should we synthesize it? Thinking this through requires background knowledge and imagination, and it has conventionally been regarded as something that only humans can do. Yet what is commonly called AI has advanced rapidly into this area over the past few years (2017-2019).

Chemoinformatics has long been used in many settings, but little information about it was readily available. There are several possible reasons for this, but the two main ones are surely that there were no open source toolkits and no public databases. This has been resolved with the arrival of RDKit, an open source chemoinformatics toolkit, and ChEMBL, a public database.

In recent years, as in bioinformatics, a great deal of chemoinformatics information can be found immediately by searching the web, and it is possible to teach yourself. Even so, we decided to prepare a coherent set of material for taking the first step: content from which you can learn the basics of chemoinformatics and apply them. In view of the recent AI drug discovery boom, the later chapters cover compound activity prediction and compound proposal using deep learning as used in the context of "AI drug discovery", so you should be able to catch up with recent trends in one place.

What is RDKit

warning: This subsection is @iwatobipen's talk about RDKit. Expressions from the draft stage such as "I will say" or "based on" are left as they are, in @iwatobipen's self-proclaimed "comprehensive" style with its "gozuru" tone.

My name is @iwatobipen, and I wrote part of this book. Here I am going to talk passionately about RDKit.

What does the RD in RDKit stand for? It is an abbreviation of Rational Discovery, and the framework that preceded the current open source version was developed in 2000. It is that old. Then, in 2006, the code became open source and was released on SourceForge. Readers who are thinking that Python chemoinformatics toolkits include OpenBabel as well as RDKit are quite right. OpenBabel was first released in 2005. Both are toolkits with more than 10 years of history. I remember that OpenBabel was the mainstream choice around 2012, when I began to be interested in this area. At that time there were almost no articles in Japanese, and I wrote RDKit code by trial and error while referring to the chemoinformatics cookbook by @fmkz___, a co-author of this book and a pioneer in the field. If you want to follow the history of chemoinformatics, you should read this article.

Developer Greg Landrum says

RDKit is the Swiss Army knife of chemoinformatics, a collection of various functional pieces

— Greg Landrum

This expression hits the mark exactly. As you can see from the official documentation, RDKit already has a wide range of features: reading and writing compound information, drawing structures, generating 3D conformations, R-group decomposition, descriptor and fingerprint calculation, pharmacophore calculation, and more. It covers everything from analysis to visualization. In addition, tools that contributors have developed with RDKit are packed into the Contrib folder along with their enthusiasm. How would you like to use it? I want to write code with RDKit right away. I can't wait ;)
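To give a feel for a few of the features mentioned above (reading compound information, descriptor calculation, and fingerprint calculation), here is a minimal sketch. It assumes RDKit is already installed. The helper name `describe`, the two example SMILES strings (ethanol and 1-propanol), and the fingerprint settings (radius 2, 2048 bits) are illustrative choices made for this sketch, not something prescribed by this book.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import Descriptors, rdFingerprintGenerator


def describe(smiles):
    """Parse a SMILES string; return (mol, molecular weight, Morgan fingerprint).

    `describe` is a hypothetical helper for this sketch, not an RDKit function.
    """
    mol = Chem.MolFromSmiles(smiles)  # returns None if the SMILES cannot be parsed
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    # Morgan (circular) fingerprint; radius and size are assumed example settings.
    generator = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)
    return mol, Descriptors.MolWt(mol), generator.GetFingerprint(mol)


# Example inputs (assumptions): ethanol and 1-propanol.
ethanol, ethanol_mw, ethanol_fp = describe("CCO")
propanol, propanol_mw, propanol_fp = describe("CCCO")

# Tanimoto similarity between the two bit-vector fingerprints (0.0 to 1.0).
similarity = DataStructs.TanimotoSimilarity(ethanol_fp, propanol_fp)

print(Chem.MolToSmiles(ethanol), round(ethanol_mw, 2))
print(Chem.MolToSmiles(propanol), round(propanol_mw, 2))
print("Tanimoto similarity:", round(similarity, 2))
```

The same pattern of parsing a structure, computing something from it, and comparing the results underlies many chemoinformatics tasks, such as clustering compounds by similarity.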

Note: @iwatobipen is, of course, one of the contributors, and provides code called Fastcluster for quickly clustering large compound libraries. (by @fmkz___)

RDKit also has an active developer and user community, and features keep being added. The way talented researchers from all over the world build it up together is the strength and appeal of open source. If you have a chance, consider joining the annual RDKit User Group Meeting. Nothing replaces face-to-face discussion among users. In addition, I said that there was almost no information in Japanese when I began to use RDKit, but in recent years many very good Japanese articles have appeared. A few examples are listed in the Main Japanese Commentary Site section, and many articles are also posted on Qiita.

In addition, volunteers have launched RDKit-users-jp. If asking a question in English feels like a bit much, you can ask it there. Japanese documentation has also been merged into the latest version of the RDKit repository, which will be helpful too. This book uses only some of RDKit's features, but you should still get a sense of how much you can do with it. Once you have taken the first step, follow your own interest and motivation. If there is something you do not understand, ask the community above or post it as an issue in this book's repository. Well then, let's get started!

Main Japanese Commentary Site

rdkit-users.jp
RDKitドキュメンテーション非公式日本語版サイト: Unofficial Japanese-language site of the RDKit documentation
化学の新しいカタチ: The new shape of chemistry

Target audience

This book assumes the following readers.

Graduate students and postdocs in pharmacy, medicine, and pharmaceutical sciences who want to do data analysis
Pharmacists at pharmaceutical companies who want to analyze their own data
Drug discovery chemists who feel the need for chemoinformatics, and those who were suddenly assigned to it by some mysterious force
Bioinformaticians who are thinking of learning chemoinformatics
People who are interested in AI drug discovery but do not know where to start

About the code of this book

All of the programming code used in this book is located in the notebooks directory of the py4chemoinformatics repository of Mishima.syk. A link to the corresponding Jupyter notebook is placed at the beginning of each chapter, so please refer to it as needed.

Once you have completed the installation in Chapter 2, you will be able to use git commands, so you can download all the data for this book, including the PDF, with the following command:

$ git clone https://github.com/Mishima-syk/py4chemoinformatics.git

Bonus

Chemoinformatics or Cheminformatics?

Which is it, chemoinformatics or cheminformatics? As I remember it, the form with "Chemo" appeared first, by analogy with "Bio", but after the launch of the Journal of Cheminformatics it was left well behind "Chem" for a while.

According to recent Google Trends data, either seems fine, but personally I think it is better to put the emphasis on the rhyme, so I will use "Chemo" in this book.

Acknowledgment

We would like to thank the following people for their bug fixes and suggestions for improvement when writing this document:

@antiplastics, @bonohu, @ReLuTropy, @ski_nanko, @torusengoku, @yamasaKit_, @4Elemento

@4Elemento, thanks a lot for the translation task!!!! (from @iwatobipen)

From here on, written by @fmkz___ while listening to Nujabes' "Reflection Eternal" (20/03/20)

First of all, I would like to thank @bonohu, who triggered me to write this book. @bonohu is the author of Dr. Bono's book on the analysis of life science data, and at a Mishima.syk meeting we talked about how "a chemoinformatics version of the Bono book" would be nice. There is no doubt that what triggered this book was the thought, "Well, if so, why not write it?" The Drug Advent Calendar 2018 by @souyakuchan (written in Japanese) was also a good stimulus for writing. In other words, I do not think I would have actually started moving had I not written a chapter there.

Also, we must not forget y-sama. y-sama of Mishima.syk passed away at the beginning of the year, on 2019/01/06. He wrote wonderful posts such as "Python environment construction for those aiming to become data scientists 2016" and "Small talk about drug likeness" (both written in Japanese). If he were alive, the three of us would probably have written this book together, and the content would have been more complete. This loss also gave us strong motivation to write.

Finally, I would like to thank the participants of Mishima.syk for the good wine and beer and the heated discussions every time. Some of the content is based on presentations at Mishima.syk and has been revised based on your feedback.

If, after reading this book, you find chemoinformatics interesting or want to work on drug discovery, please join Mishima.syk. I think it will be fun. In future drug discovery research, it will be important to push one another across affiliations and improve our skills. In fact, I think we are already in such a society. I hope this book helps you have a pleasant research life.

I do what I want to do and live as myself, and I have no regrets in my life. In life, whoever enjoys it wins. I think it is fun to enjoy your life by chasing what you love to the fullest and saying you hate what you hate. I wish you all the best in your life.

— y__sama

License

This document is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License.
