Code repository for the NeurIPS 2021 AI for Science Workshop
- Install required libraries with
pip install pykeen==1.5.0 wandb
Refer to the link for more instructions on the WANDB result tracker. With it, you can follow various metrics during training and view the evaluation results after training.
- Download the files into the corresponding directories under the repository root
- Train and save a baseline embedding model, where X is one of transe, distmult, proje, rotate, simple, tucker
python baseline/save_model_X.py
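To train all six baselines in sequence, the per-model command can be wrapped in a small loop. This is a minimal sketch, assuming each script follows the `baseline/save_model_X.py` naming pattern and is run from the repository root. The helper names and the `dry_run` flag are illustrative and not part of the repository.

```python
import subprocess
import sys
from typing import List

# Model names as listed in this README.
BASELINE_MODELS = ["transe", "distmult", "proje", "rotate", "simple", "tucker"]


def baseline_command(model: str) -> List[str]:
    """Build the command for one baseline (assumes the save_model_X.py pattern)."""
    if model not in BASELINE_MODELS:
        raise ValueError(f"unknown baseline model: {model!r}")
    return [sys.executable, f"baseline/save_model_{model}.py"]


def train_all(dry_run: bool = True) -> List[List[str]]:
    """Go through every baseline; with dry_run=True only print the commands."""
    commands = [baseline_command(m) for m in BASELINE_MODELS]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the loop at the first failing script.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    train_all(dry_run=True)
```

Call `train_all(dry_run=False)` to actually launch the scripts one after another.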
- Example hyperparameter optimization
python baseline/hpo_search.py
- (optional) Scrape text from sources
The required JSON files of scraped text are already in the directory. If you want to scrape the text from scratch, first download the Hetionet JSON dataset, then run
python text_scrape.py -n X
where X must be one of the available entity types ('Anatomy', 'Biological Process', 'Cellular Component', 'Compound', 'Disease', 'Gene' (local), 'Molecular Function', 'Pathway', 'Pharmacologic Class')
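Several entity type names contain spaces, so they need to be quoted on the command line. The following sketch validates the name and builds a safely quoted command, assuming `text_scrape.py` takes the type through `-n` exactly as shown above. The helper name `scrape_command` is hypothetical.

```python
import shlex

# Entity types as listed in this README.
ENTITY_TYPES = (
    "Anatomy", "Biological Process", "Cellular Component", "Compound",
    "Disease", "Gene", "Molecular Function", "Pathway", "Pharmacologic Class",
)


def scrape_command(entity_type: str) -> str:
    """Return a shell-safe command line for scraping one entity type."""
    if entity_type not in ENTITY_TYPES:
        raise ValueError(f"unknown entity type: {entity_type!r}")
    # shlex.quote adds quotes only when the value needs them (e.g. spaces).
    return f"python text_scrape.py -n {shlex.quote(entity_type)}"


if __name__ == "__main__":
    for name in ENTITY_TYPES:
        print(scrape_command(name))
```

Single-word types such as Disease are passed through unchanged, while multi-word types are wrapped in single quotes.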
- Text Embedding: We provide the textual embeddings generated with BioBERT v1.1 in the aforementioned polybox folder. If you are interested in generating these textual embeddings yourself, please refer to
text_process/get_embedding_for_hetionet_drugs.py
You will need to install the Hugging Face Transformers library.
- Find your pykeen library installation path and replace the corresponding files with the ones in
/pykeen-extension/
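The installation path can be looked up from Python itself. This is a minimal sketch using only the standard library; the function name `package_dir` is hypothetical, and the script only reports the location without copying any files.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def package_dir(name: str = "pykeen") -> Optional[Path]:
    """Return the installation directory of a package, or None if not installed."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    # spec.origin points at the package's __init__.py; its parent is the package folder.
    return Path(spec.origin).parent


if __name__ == "__main__":
    location = package_dir("pykeen")
    if location is None:
        print("pykeen is not installed in this environment")
    else:
        print(location)
```

Run it inside the same environment in which pykeen was installed, and consider backing up the original files before overwriting them.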
This step adds the textual interaction functionality, among others.
- Train Text Augmented KG: run the model you want to train with
python model_with_text/X.py
The file name should be self-explanatory.
- To calculate % of Disease @10 and Unique Entities @1, please refer to
evaluation/test_evaluate.py
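The repository's own computation lives in the script above; the sketch below is not taken from it and only illustrates one plausible reading of the two metrics. It assumes that "% of Disease @10" is the share of Disease-typed entities among the top-10 predictions of each query, averaged over queries, and that "Unique Entities @1" counts the distinct entities ranked first across all queries. The function names, the `entity_type` mapping, and the toy data are hypothetical.

```python
from typing import Dict, Sequence


def pct_type_at_k(rankings: Sequence[Sequence[str]], entity_type: Dict[str, str],
                  target: str = "Disease", k: int = 10) -> float:
    """Mean percentage of `target`-typed entities in the top-k of each query (assumed definition)."""
    shares = []
    for ranked in rankings:
        top = list(ranked[:k])
        if not top:
            continue  # skip queries without predictions
        hits = sum(1 for entity in top if entity_type.get(entity) == target)
        shares.append(hits / len(top))
    return 100.0 * sum(shares) / len(shares) if shares else 0.0


def unique_at_1(rankings: Sequence[Sequence[str]]) -> int:
    """Number of distinct entities that appear at rank 1 (assumed definition)."""
    return len({ranked[0] for ranked in rankings if ranked})


if __name__ == "__main__":
    # Toy predictions: one ranked list of entity IDs per query.
    toy_types = {"D1": "Disease", "D2": "Disease", "C1": "Compound", "G1": "Gene"}
    toy_rankings = [["D1", "C1", "D2"], ["D1", "G1", "C1"]]
    print(pct_type_at_k(toy_rankings, toy_types, k=2))
    print(unique_at_1(toy_rankings))
```

Under these assumptions, a low Unique Entities @1 value indicates that the model ranks the same few entities first for many different queries.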