Step 0: Clone this repository and setup the anaconda environment by the following commands:
git clone https://github.com/JosieHong/3DMolMS.git
cd 3DMolMS
pip install .
Step 1: Prepare the test set. The following formats are supported: csv, mgf, or customed pkl.
Here is an input example from MoNA of csv format (see the whole file at ./test/demo_input.csv
):
ID,SMILES,Precursor_Type,Collision_Energy
demo_0,C/C(=C\CNc1nc[nH]c2ncnc1-2)CO,[M+H]+,40 V
demo_1,C/C(=C\CNc1nc[nH]c2ncnc1-2)CO,[M+H]+,20 V
demo_2,C/C(=C\CNc1nc[nH]c2ncnc1-2)CO,[M+H]+,10 V
Here is an input example from MoNA of mgf format (see the whole file at ./test/demo_input.mgf
):
BEGIN IONS
TITLE=demo_0
CHARGE=1+
PRECURSOR_TYPE=[M+H]+
PRECURSOR_MZ=220.1193
MOLMASS=219.11201003600002
MS_LEVEL=MS2
IONMODE=P
SOURCE_INSTRUMENT=Agilent 6530 Q-TOF
INSTRUMENT_TYPE=LC-ESI-QTOF
COLLISION_ENERGY=40 V
SMILES=C/C(=C\CNc1nc[nH]c2ncnc1-2)CO
INCHI_KEY=UZKQTCBAMSWPJD-FARCUNLSSA-N
41.0399 6.207207
43.0192 49.711712
43.0766 1.986987
......
Please notice that the unsupported input will be filtered out automatically when loading the dataset. The supported inputs are shown in the following table.
Item | Supported input |
---|---|
Atom number | <=300 |
Atom types | 'C', 'O', 'N', 'H', 'P', 'S', 'F', 'Cl', 'B', 'Br', 'I', 'Na' |
Precursor types | '[M+H]+', '[M-H]-', '[M+H-H2O]+', '[M+Na]+' |
Collision energy | any number |
Step 2: Predict the MS/MS spectra using the following command:
python ./src/pred.py \
--test_data ./demo/demo_input.csv \
--model_config_path ./src/molnetpack/config/molnet.yml \
--data_config_path ./src/molnetpack/config/preprocess_etkdgv3.yml \
--resume_path ./check_point/molnet_qtof_etkdgv3.pt \
--result_path ./test/demo_output.mgf \
--save_img_dir ./img/
Arguments:
--resume_path
is the path of model's checkpoint. In the first running, the checkpoint (./checkpoint/molnet_qtof_etkdgv3.pt
) will be downloaded from Google Drive. You can also set the resume path as the path to your own model.--result_path
is the path to save the predicted MS/MS. It should end with.mgf
or.csv
. We recommend you use.mgf
because mgf is a more common format for MS/MS.--save_img_dir
is an optional argument denoting the path to save the figures of predicted MS/MS. One of the plots is shown here: