This repository contains the analysis scripts needed to reproduce the publication on the LOBSTER database.

## Data generation

The following version numbers are used for the workflows:
- `Workflow.ipynb` contains the script that starts all LOBSTER computations with pymatgen, FireWorks, and atomate (a minimal launch sketch follows this list).
- Use `Data_generation/requirements.txt` to create a conda environment with the necessary packages.
- The `Lobster_lightweight_json_generation.ipynb` script generates lightweight LOBSTER JSONs consisting of the LobsterPy summarized bonding information, the relevant strongest bonds, the Madelung energies of the structures, and the atomic charges (refer to Tables 1 and 2 of the manuscript for the description; see the sketch after this list).
- The `Computational_data_generation.ipynb` script stores all the relevant LOBSTER computation files as JSON, using the pydantic schema implemented for atomate2 (refer to Table 3 of the manuscript for the description).
- `Example_data/Lightweight_jsons/` -- path to sample LOBSTER lightweight JSON files
- `Example_data/Computational_data_jsons/` -- path to sample computational JSON files
- All 1520 LOBSTER lightweight JSONs / computational data JSONs can be downloaded here:
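
`Workflow.ipynb` drives the computations through atomate; below is a minimal launch sketch, assuming atomate's `get_wf_lobster` workflow factory and a configured FireWorks LaunchPad (`mp-149` is only a placeholder ID, not part of this dataset's scripts):

```python
# A minimal launch sketch -- not the production script in Workflow.ipynb.
# Assumes atomate's LOBSTER workflow factory (get_wf_lobster) and a
# configured FireWorks LaunchPad.
from fireworks import LaunchPad
from pymatgen.ext.matproj import MPRester
from atomate.vasp.workflows.base.lobster import get_wf_lobster

with MPRester() as mpr:  # needs a Materials Project API key in your config
    structure = mpr.get_structure_by_material_id("mp-149")  # placeholder ID

wf = get_wf_lobster(structure)  # static VASP run followed by LOBSTER

lpad = LaunchPad.auto_load()  # reads your my_launchpad.yaml
lpad.add_wf(wf)
```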
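The core step behind the lightweight JSONs is LobsterPy's condensed bonding analysis. A minimal sketch, assuming LobsterPy's `Analysis` class (keyword names vary slightly between LobsterPy versions) and its `condensed_bonding_analysis` dictionary; the actual record fields follow Tables 1 and 2 of the manuscript:

```python
# A minimal sketch of generating one lightweight record with LobsterPy.
# Keyword names may differ slightly between LobsterPy versions.
import json
from lobsterpy.cohp.analyze import Analysis

analysis = Analysis(
    path_to_poscar="POSCAR",
    path_to_icohplist="ICOHPLIST.lobster",
    path_to_cohpcar="COHPCAR.lobster",
    path_to_charge="CHARGE.lobster",
    which_bonds="all",
)

# LobsterPy condenses its bonding analysis into a plain dictionary that
# can be dumped as one lightweight JSON record ("mp-xxx" is a placeholder).
with open("mp-xxx_lightweight.json", "w") as f:
    json.dump(analysis.condensed_bonding_analysis, f)
```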

## Reading the data records

- Use `Read_data_records/requirements.txt` to create a conda environment with the necessary packages.
- `Read_lobsterpy_data.ipynb` reads the LobsterPy summarized bonding information JSON files as Python dictionaries (refer to Table 1 of the manuscript for the description).
- `Read_lobsterschema_data.ipynb` reads the LobsterSchema data as pymatgen objects; it contains all the relevant LOBSTER computation data in the form of a Python dictionary (refer to Table 2 of the manuscript for the description).
- atomate2 -- install it using `pip install git+https://github.com/materialsproject/atomate2.git@fa603e3cb4c3024b9b12b0d752793a9191d99f8a`
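
Both record types can be read back with monty; a minimal sketch, assuming the files are MSONable-encoded JSON so that `loadfn` returns plain dictionaries and, where `@class`/`@module` tags are present, reconstructed pymatgen/atomate2 objects (file names are placeholders for the samples in `Example_data`):

```python
# A minimal reading sketch; file names below are placeholders.
from monty.serialization import loadfn

lightweight = loadfn("Example_data/Lightweight_jsons/mp-xxx.json")
computational = loadfn("Example_data/Computational_data_jsons/mp-xxx.json")

print(lightweight.keys())  # summarized bonding info, charges, Madelung energies, ...
```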

## Technical validation

- Download all the computational data files from the following repository links:
  - Part 1, Part 2, Part 3, Part 4, Part 5, Part 6, Part 7, Part 8
- Make a directory named `Results` in which to extract all the tar files. For example, extract the `Part1.tar` file using the following command: `tar -xf Part1.tar -C ./Results/`
- Repeat the command above to extract all 8 tar files into the `Results` directory (see the Python alternative after this list).
- This should result in 1520 directories inside `Results`. Each subdirectory is named `mp-xxx`, denoting the Materials Project ID of the compound.
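
Equivalently, all eight archives can be extracted with Python's standard library (archive names are assumed to follow `Part1.tar` ... `Part8.tar`):

```python
# Extract all eight archives into Results/ with the standard library.
import tarfile
from pathlib import Path

results = Path("Results")
results.mkdir(exist_ok=True)

for i in range(1, 9):
    with tarfile.open(f"Part{i}.tar") as archive:
        archive.extractall(results)  # creates the mp-xxx subdirectories
```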
You can then use the scripts provided to reproduce the results of our technical validation section.

- `Charge_spilling_lobster.ipynb` produces the dataframe with the charge spillings for the entire dataset and also creates the histograms (as in the manuscript). `Charge_spilling_data.pkl` contains presaved data from a `Charge_spilling_lobster.ipynb` run (load this to get the plots on the go; see the sketch after this list).
- `Band_overlaps.ipynb` produces the dataframe with the deviations from `bandOverlaps.lobster` for the entire dataset. `Band_overlaps_data.pkl` contains presaved data from a `Band_overlaps.ipynb` run (load this to get the results on the go).
- `Get_plots_band_features_tanimoto.ipynb` produces all the PDOS benchmarking data, saves the pandas dataframes as pickles, and also saves all the plots. `lsolobdos.pkl` and `lobdos.pkl` contain all the data necessary to reproduce the plots (as shown in Figs. 4, 5, 6, and 7). `Save_pdos_plot_and_data.ipynb` saves the PDOS comparison plots.
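
A minimal sketch for loading the presaved pickles on the go, assuming they hold pandas DataFrames (the histogram column name is an assumption, not necessarily the notebook's actual column):

```python
# Load the presaved results and recreate a quick histogram.
import pandas as pd
import matplotlib.pyplot as plt

spilling = pd.read_pickle("Charge_spilling_data.pkl")
overlaps = pd.read_pickle("Band_overlaps_data.pkl")

spilling.hist(column="charge_spilling", bins=50)  # hypothetical column name
plt.show()
```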
- Download the dash app and its data from 10.5281/zenodo.7795903
- Run the `Band_features.py` script to get a dash app for exploring all the s, p, d band feature plots (check out the `-h` options).
- Run the `Check_fingerprints.py` script to get a dash app for visualizing all the s, p, d fingerprint plots (check out the `-h` options).
- `BVA_Charge_comparisons.ipynb` produces the results of the charge comparison analysis and the corresponding plots (as shown in Figs. 8 and 9). `Charge_comp_data.pkl` contains the saved charge comparison data.
- `Coordination_comparisons_BVA.ipynb` produces the results of the coordination environment comparisons. `Coordination_comp_data_bva.pkl` contains the saved coordination environment comparison data.
- `Data_topology.ipynb` extracts and stores the data necessary for Fig. 10. `Lobster_dataoverview.pkl` contains presaved data ready to be used for generating Fig. 10.

## ML model

- Create a conda environment with Python 3.8 using `conda create -n ML_model python==3.8`
- Activate the newly created `ML_model` environment and install matbench v0.6 using `pip install matbench==0.6` (this is needed to resolve the automatminer package dependency conflicts).
- Then use `ML_model/requirements.txt` to install all the necessary packages.
- `mpids.csv` -- contains the list of Materials Project IDs and the corresponding compositions.
- `featurizer` -- this Python module featurizes the LOBSTER lightweight JSONs so that the ICOHP data can be used as features for the ML model (see the sketch after this list).
- `Featurize_lobsterpy_jsons.ipynb` -- generates the LOBSTER features via the `featurizer` module and saves them as `lobsterpy_featurized_data.csv`.
- `ML_data_with_automatminer.ipynb` -- uses the automatminer featurizer to extract matminer features based on composition and structure, and creates the data ready to be used for ML model training (it also adds the LOBSTER summary-stats data as features): `dataforml_automatminer.pkl`
- `ml_utilities.py` -- contains utility functions used for training and evaluating the random forest (RF) regressor models.
- `RF_model.ipynb` -- trains and evaluates 2 RF regressor models using a nested CV approach, including and excluding LOBSTER features (see the generic sketch after this list).
- `Automatminer_rf_ml_model.ipynb` -- trains and evaluates RF regression models using the automatminer MatPipe (used to compare with the matbench RF model).
- `exc_icohp` -- this directory contains the model cross-validation evaluation result plots and feature importance plots.
- `exc_icohp/summary_stats.csv` -- contains the summarized stats of the model trained and evaluated with the `RF_model.ipynb` script (excluding LOBSTER features).
- `inc_icohp` -- this directory contains the model cross-validation evaluation result plots and feature importance plots.
- `inc_icohp/summary_stats.csv` -- contains the summarized stats of the model trained and evaluated with the `RF_model.ipynb` script (including LOBSTER features).
- `Plot_summary_results.ipynb` -- reads the `summary_stats.csv` files of the RF models and visualizes the data from Table 7.
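
The following is a sketch of the featurization idea only, not the `featurizer` module's actual API: it reduces the ICOHP bond strengths of one lightweight JSON to summary statistics usable as ML features (the `icohp_values` key is hypothetical):

```python
# Sketch of the featurization idea -- NOT the featurizer module's API.
import json
import numpy as np

def icohp_summary_stats(path: str) -> dict:
    """Reduce the bond ICOHP values of one lightweight JSON to summary stats."""
    with open(path) as f:
        record = json.load(f)
    # "icohp_values" is a hypothetical key standing in for wherever the
    # lightweight record stores its bond ICOHPs.
    icohps = np.asarray(record["icohp_values"], dtype=float)
    return {
        "icohp_mean": icohps.mean(),
        "icohp_std": icohps.std(),
        "icohp_min": icohps.min(),
        "icohp_max": icohps.max(),
    }
```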
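And a generic sketch of the nested-CV scheme used by `RF_model.ipynb`, with an inner loop for hyperparameter tuning and an outer loop for the generalization estimate (the grid and the random data are illustrative, not the notebook's actual settings; the real utilities live in `ml_utilities.py`):

```python
# Generic nested cross-validation with a random forest regressor.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

rng = np.random.default_rng(0)
X, y = rng.random((100, 10)), rng.random(100)  # stand-in for the featurized data

inner_cv = KFold(n_splits=5, shuffle=True, random_state=0)  # tuning loop
outer_cv = KFold(n_splits=5, shuffle=True, random_state=1)  # evaluation loop

model = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [100, 500]},  # illustrative grid
    cv=inner_cv,
)
scores = cross_val_score(model, X, y, cv=outer_cv, scoring="neg_mean_absolute_error")
print(f"nested-CV MAE: {-scores.mean():.3f}")
```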