Skip to content

Latest commit

 

History

History
74 lines (59 loc) · 2.79 KB

MSMS_PRED.md

File metadata and controls

74 lines (59 loc) · 2.79 KB

MS/MS Prediction using 3DMolMS

Step 0: Clone this repository and setup the anaconda environment by the following commands:

git clone https://github.com/JosieHong/3DMolMS.git
cd 3DMolMS

pip install .

Step 1: Prepare the test set. The following formats are supported: csv, mgf, or customed pkl.

Here is an input example from MoNA of csv format (see the whole file at ./test/demo_input.csv):

ID,SMILES,Precursor_Type,Collision_Energy
demo_0,C/C(=C\CNc1nc[nH]c2ncnc1-2)CO,[M+H]+,40 V
demo_1,C/C(=C\CNc1nc[nH]c2ncnc1-2)CO,[M+H]+,20 V
demo_2,C/C(=C\CNc1nc[nH]c2ncnc1-2)CO,[M+H]+,10 V

Here is an input example from MoNA of mgf format (see the whole file at ./test/demo_input.mgf):

BEGIN IONS
TITLE=demo_0
CHARGE=1+
PRECURSOR_TYPE=[M+H]+
PRECURSOR_MZ=220.1193
MOLMASS=219.11201003600002
MS_LEVEL=MS2
IONMODE=P
SOURCE_INSTRUMENT=Agilent 6530 Q-TOF
INSTRUMENT_TYPE=LC-ESI-QTOF
COLLISION_ENERGY=40 V
SMILES=C/C(=C\CNc1nc[nH]c2ncnc1-2)CO
INCHI_KEY=UZKQTCBAMSWPJD-FARCUNLSSA-N
41.0399 6.207207
43.0192 49.711712
43.0766 1.986987
......

Please notice that the unsupported input will be filtered out automatically when loading the dataset. The supported inputs are shown in the following table.

Item Supported input
Atom number <=300
Atom types 'C', 'O', 'N', 'H', 'P', 'S', 'F', 'Cl', 'B', 'Br', 'I', 'Na'
Precursor types '[M+H]+', '[M-H]-', '[M+H-H2O]+', '[M+Na]+'
Collision energy any number

Step 2: Predict the MS/MS spectra using the following command:

python ./src/pred.py \
--test_data ./demo/demo_input.csv \
--model_config_path ./src/molnetpack/config/molnet.yml \
--data_config_path ./src/molnetpack/config/preprocess_etkdgv3.yml \
--resume_path ./check_point/molnet_qtof_etkdgv3.pt \
--result_path ./test/demo_output.mgf \
--save_img_dir ./img/

Arguments:

  • --resume_path is the path of model's checkpoint. In the first running, the checkpoint (./checkpoint/molnet_qtof_etkdgv3.pt) will be downloaded from Google Drive. You can also set the resume path as the path to your own model.
  • --result_path is the path to save the predicted MS/MS. It should end with .mgf or .csv. We recommend you use .mgf because mgf is a more common format for MS/MS.
  • --save_img_dir is an optional argument denoting the path to save the figures of predicted MS/MS. One of the plots is shown here: