LATEC is a benchmark for large-scale generation and evaluation of saliency maps across diverse computer vision modalities, datasets, and model architectures. This repository contains the code for the paper "Navigating the Maze of Explainable AI: A Systematic Approach to Evaluating Methods and Metrics".
Explainable AI (XAI) is a rapidly growing domain with a myriad of methods as well as metrics aiming to evaluate their efficacy. However, the current literature is often of limited scope, examining only a handful of XAI methods and employing one or a few metrics. Furthermore, pivotal factors for performance, such as the underlying architecture or the nature of the input data, remain largely unexplored. This lack of comprehensive analysis hinders the ability to draw generalized and robust conclusions about XAI performance, which is crucial not only for directing scientific progress but also for the trustworthy real-world application of XAI. In response, we introduce LATEC, a large-scale benchmark that critically evaluates 17 prominent XAI methods using 20 distinct metrics. Our benchmark systematically incorporates vital elements like varied architectures and diverse input types, resulting in 7,560 examined combinations. Using this benchmark, we derive empirically grounded insights into areas of current debate, such as the impact of Transformer architectures and a comparative analysis of traditional attribution methods against novel attention mechanisms. To further solidify LATEC's position as a pivotal resource for future XAI research, all auxiliary data, from trained model weights to over 326k saliency maps and 378k metric scores, are made publicly available.
LATEC requires Python version 3.9 or later. All libraries required to run the code are installed when you install this repository:
git clone https://github.com/IML-DKFZ/latec
cd latec
pip install .
Depending on your GPU, you need to install a matching version of PyTorch and torchvision separately. All scripts also run on the CPU but can take substantially longer, depending on the experiment. Testing and development were done with the PyTorch build for CUDA 11.6. Note that the Captum and Quantus packages are not the official releases but forks that adapt the XAI methods and metrics to 3D modalities and to the benchmark.
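To verify that the installed PyTorch build can actually see your GPU, a quick optional check:

```python
import torch

# Prints the installed PyTorch version, the CUDA version it was built
# against (None for CPU-only builds), and whether a GPU is visible.
# If the last value is False, install a CUDA-enabled build matching
# your driver (see pytorch.org for the exact install command).
print(torch.__version__, torch.version.cuda, torch.cuda.is_available())
```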
├── configs - Hydra config files
│ ├── callbacks
│ ├── data
│ ├── eval_metric
│ ├── experiment
│ ├── explain_method
│ ├── extras
│ ├── hydra
│ ├── logger
│ └── paths
├── data - Data storage and output folders
│ ├── datasets - Datasets for all modalities
│ ├── evaluation - Evaluation scores as .npz
│ ├── saliency_maps - Saliency maps output as .npz
│ ├── figures - Output of figures and gifs
│ └── model_weights - Model weights as .ckpt files
├── logs - Log files
├── notebooks - Notebooks for visualizations
├── scripts - Bash scripts for multi-runs
└── src
├── data - Datamodule scripts
├── main - Main experiment scripts
│ ├── main_eval.py - Runs evaluation pipeline
│ ├── main_explain.py - Runs explanation pipeline
│ └── main_rank.py - Runs ranking pipeline
├── modules
│ ├── components - Various submodules
│ ├── registry - Object registries for methods
│ ├── eval_methods.py - Loads evaluation metrics
│ ├── models.py - Loads deep learning models
│ └── xai_methods.py - Loads XAI methods
└── utils - Various utility scripts
If you would like to reproduce specific results or utilize our provided model weights, saliency maps, or evaluation scores for your own experiments, please download them and follow the instructions below:
- Model Weights: Download and unzip the files into the `./data/` directory.
- Saliency Maps (Per Dataset): Download, move them to the respective modality folder, and unzip them at `./data/*modality*/`.
- Evaluation Scores: Download and unzip the files into the `./data/` directory.
Please download the CoMA and RESISC45 datasets directly from their respective websites. All other datasets are downloaded automatically into the `./data/datasets/` folder when an experiment is run for the first time.
To generate saliency maps, select the appropriate `.yaml` configuration file for the dataset from `./config/data/` and the XAI method configuration for the modality from `./config/explain_method/`. Then run the following command, specifying both configurations:
latec-explain data=vesselmnist3d.yaml explain_method=volume.yaml
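The resulting `.npz` archive can be inspected directly with NumPy. A minimal sketch, assuming the default output location and file name for the command above (the array keys inside the archive depend on the pipeline configuration):

```python
import numpy as np

# Path assumes the vesselmnist3d example above with default settings.
maps = np.load("./data/saliency_maps/volume/saliency_maps_vesselmnist3d.npz")

# List every stored array together with its shape.
for key in maps.files:
    print(key, maps[key].shape)
```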
For score computation, in addition to specifying `data` and `explain_method`, define the evaluation metric configuration from `./config/eval_metric/` and provide the `.npz` file containing the saliency maps (located at `./data/saliency_maps/*modality*/`). Run the following command with all the required configurations:
latec-eval data=vesselmnist3d.yaml explain_method=volume.yaml eval_metric=volume_vessel.yaml attr_path='saliency_maps_vesselmnist3d.npz'
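The score archives written to `./data/evaluation/` are plain `.npz` files as well and can be loaded the same way (the file name below is an assumption, not a fixed output name):

```python
import numpy as np

# Hypothetical file name; check ./data/evaluation/ for the actual output.
scores = np.load("./data/evaluation/eval_scores_vesselmnist3d.npz")
print(scores.files)
```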
To generate ranking tables, run the following command. Ensure that the paths in `./config/rank.yaml` point to the correct evaluation score `.npz` files and that the appropriate ranking schema is selected:
latec-rank
To run all three steps in sequence, use the provided bash script `./scripts/run_all_steps.sh`, ensuring that the respective configuration files are filled out correctly. Please note that this process can be time-consuming, even with GPU resources.
- Add your dataset to the `./data/datasets/` folder and place your model weights as a `.ckpt` file in the `./data/model_weights/` folder.
- Add a LightningDataModule file for your dataset to `./src/data/` and a corresponding config `.yaml` file to `./config/data/`. Ensure the YAML file includes the `_target_` specification (see the sketch after this list).
- Initialize the model and load the weights in the `ModelsModule.__init__` function (in `./src/modules/models.py`) for the appropriate modality, and append the model to the `self.models` list.
- Add the necessary layer for CAM methods and the Relational Representation Stability metric to both functions in `./src/utils/hidden_layer_selection.py`.
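For illustration, a minimal LightningDataModule might look as follows. This is a sketch only: the class name, paths, and dataset class are placeholders, not part of the LATEC API, and the exact hooks LATEC calls depend on the experiment configuration.

```python
import pytorch_lightning as pl
from torch.utils.data import DataLoader
from torchvision import datasets, transforms


class MyDataModule(pl.LightningDataModule):
    """Hypothetical datamodule; referenced from ./config/data/ via `_target_`."""

    def __init__(self, data_dir: str = "./data/datasets/my_dataset", batch_size: int = 32):
        super().__init__()
        self.data_dir = data_dir
        self.batch_size = batch_size
        self.transform = transforms.ToTensor()

    def setup(self, stage=None):
        # Swap in the dataset class that matches your data and modality.
        self.test_set = datasets.ImageFolder(self.data_dir, transform=self.transform)

    def test_dataloader(self):
        return DataLoader(self.test_set, batch_size=self.batch_size, shuffle=False)
```

The corresponding `./config/data/my_dataset.yaml` would then set `_target_` to the import path of this class.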
- Add the XAI method parameters to the relevant config file located at `./config/explain_method/*modality*.yaml`.
- Add the method to the method registry as a `config()` function, similar to the other methods in `./src/modules/registry/xai_methods_registry.py`.
- Ensure that your XAI method object provides an `.attribute(input, target, **hparams)` method that takes an observation, a target, and parameters as input and returns the saliency map as a NumPy array or PyTorch tensor (see the sketch below).
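A method that is not already covered by the Captum fork can be adapted with a thin class exposing this interface. A minimal sketch, with an illustrative random baseline standing in for a real attribution computation:

```python
import torch


class RandomBaselineAttribution:
    """Illustrative XAI 'method': a real implementation would derive
    attributions from the wrapped model instead of sampling noise."""

    def __init__(self, model):
        self.model = model

    def attribute(self, input, target, **hparams):
        # Return one saliency value per input element so the map
        # mirrors the input shape, as the explanation pipeline expects.
        return torch.rand_like(input)
```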
- Add the evaluation metric parameters to the relevant config file located at `./config/eval_metric/*dataset*.yaml`.
- Add the metric to the metric registry as a `config()` function, similar to the other metrics in `./src/modules/registry/eval_metrics_registry.py`.
- Ensure that your metric's `__call__(x_batch, y_batch, a_batch, device, **kwargs)` function accepts the observation batch (`x_batch`), targets (`y_batch`), saliency maps (`a_batch`), and device as inputs, and outputs the scores as a NumPy array. These scores are appended to `eval_scores`. Depending on the experiment, a `custom_batch` of data and the XAI method may be passed as well; if your metric requires them, accept them via the `**kwargs` input (see the sketch below).
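A metric can therefore be as simple as a callable class. A minimal sketch, where the mean-absolute-saliency score is purely illustrative and stands in for a real faithfulness, robustness, or complexity criterion:

```python
import numpy as np


class MeanAbsoluteAttribution:
    """Illustrative metric: mean absolute saliency per observation."""

    def __call__(self, x_batch, y_batch, a_batch, device, **kwargs):
        # x_batch: observations, y_batch: targets, a_batch: saliency maps.
        # Extras such as a custom batch or the XAI method itself arrive
        # via **kwargs when the experiment provides them.
        a = np.asarray(a_batch)
        # One score per observation: mean over all non-batch dimensions.
        return np.abs(a).mean(axis=tuple(range(1, a.ndim)))
```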
BibTeX:
@misc{klein2024navigating,
title={Navigating the Maze of Explainable AI: A Systematic Approach to Evaluating Methods and Metrics},
author={Lukas Klein and Carsten T. Lüth and Udo Schlegel and Till J. Bungert and Mennatallah El-Assady and Paul F. Jäger},
year={2024},
eprint={2409.16756},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2409.16756},
}
The code was developed by the authors of the paper. However, it also contains pieces of code from the following packages:
- PyTorch EfficientNet 3D by Shi, Jian: https://github.com/shijianjian/EfficientNet-PyTorch-3D
- PyTorch Point Cloud Transformer by Guo, Meng-Hao et al.: https://github.com/Strawberry-Eat-Mango/PCT_Pytorch
- PyTorch Transformer-Explainability by Chefer, Hila et al.: https://github.com/hila-chefer/Transformer-Explainability
- Image Classification by Ziegler, Sebastian: https://github.com/MIC-DKFZ/image_classification
LATEC is developed and maintained by the Interactive Machine Learning Group of Helmholtz Imaging and the DKFZ, as well as the Institute for Machine Learning at ETH Zürich.