This repository contains the implementation of the training algorithm for large-spread boosted tree ensembles, based on LightGBM, and the efficient robustness verification algorithm CARVE-GBM, both presented by Calzavara et al. in the paper "Verifiable Boosted Tree Ensembles", accepted at the 46th IEEE Symposium on Security and Privacy (IEEE S&P 2025).
The artifact is organized in the following folders:
- the datasets/ folder contains the datasets used to train the large-spread boosted ensembles. See the "Obtain the datasets" section for more details about its subfolders.
- the src/ folder contains:
  - the training/ folder, which contains all the scripts used to train the models, as well as a reference to our modified version of LightGBM that supports enforcing the large-spread condition.
  - the verification/ folder, which contains all the scripts used to verify the robustness of the models. It contains the code of the following verifiers:
    - CARVE-GBM, in the carvegbm/ folder.
    - a reference to our modified version of SILVA, in the silva/ folder.
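For reference, this is the layout described above (including the src/datasets_utils folder used later in this README):

```
datasets/
src/
├── datasets_utils/      # scripts to download and split the datasets
├── training/
│   └── lightgbm/        # submodule: our modified LightGBM
└── verification/
    ├── carvegbm/        # CARVE-GBM
    └── silva/           # submodule: our modified SILVA
```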
Download the repository with:

```bash
git clone <repo_link> --recursive
```

The --recursive flag also downloads the submodules.
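If you have already cloned the repository without --recursive, you can fetch the submodules afterwards with the standard git command:

```bash
git submodule update --init --recursive
```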
The paper reports some details of the system on which we ran the experiments; here we report the software details. Our system uses:
- python (3.8)
- the following Python modules (see the install sketch after this list): scikit-learn (1.0.2), numpy (1.22.3), argparse (1.1), pandas (1.4.2), matplotlib (3.5.1), lightgbm (4.1.0)
- g++ (9.4.0)
- make (4.2.1)
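If you are not using the Docker container described below, a minimal way to install the Python modules with pip (assuming Python 3.8 is already available; argparse ships with the standard library, so it needs no separate install) is:

```bash
pip install scikit-learn==1.0.2 numpy==1.22.3 pandas==1.4.2 \
            matplotlib==3.5.1 lightgbm==4.1.0
```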
You can use Docker to run an Ubuntu container with all the dependencies installed. Use the start_docker_container.sh script in the main folder to build and run the container. This requires Docker to be installed.
You can produce the training and test sets used in our experimental evaluation by executing the bash script download_and_split_datasets.sh in the src/datasets_utils folder.
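For example (assuming the script is meant to be run from its own folder):

```bash
cd src/datasets_utils
bash download_and_split_datasets.sh
```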
If you want to use another dataset, you have to create the folder datasets/<dataset_name>/ with the following subfolders in it (see the sketch after this list):
- dataset/, which will contain the training set, test set, and validation set.
- models/, models/gbdt/ and models/gbdt_lse/, which will contain the trained GBDT models and large-spread boosted ensembles.
The files in datasets/<dataset_name>/dataset/ must be named as follows:
- training_set_normalized for the training set;
- test_set_normalized for the test set;
- validation_set_normalized for the validation set, obtained by splitting the full training set into a (sub-)training set and a validation set.
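For example, for a hypothetical dataset named mydata, the expected layout can be created with:

```bash
mkdir -p datasets/mydata/dataset
mkdir -p datasets/mydata/models/gbdt datasets/mydata/models/gbdt_lse
```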
To compile the tools, see the README.md files in the following folders:
- src/training/lightgbm for our modified LightGBM;
- src/verification/carvegbm for CARVE-GBM;
- src/verification/silva for our modified SILVA.
To train a model, see the README.md in the src/training/lightgbm folder.
Example (run it in the src/training/lightgbm/build folder):

```bash
./lightgbm/lightgbm config=train.conf boosting=gbdt \
  data=../../datasets/mnist26/dataset/training_set_normalized.csv \
  valid=../../datasets/mnist26/dataset/validation_set_normalized.csv \
  num_trees=500 num_leaves=16 k=0.01 seed=0 \
  output_model=../../datasets/mnist26/models/gbdt_lse/lightgbm_lse_best_0_16_inf_0.01_subflsc_-1.txt \
  p=inf learning_rate=0.1 early_stopping_round=50 feature_fraction=1 verbose=-1
```
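Note that k=0.01 and p=inf in the training command match the -k 0.01 and -p inf used in the verification examples below; our understanding is that they set the spread threshold and norm of the large-spread condition. As a purely illustrative sketch (the output_model naming here is hypothetical, not the convention used by our scripts), a sweep over tree sizes could look like:

```bash
# Hypothetical sweep over num_leaves; adjust paths and file names to your setup.
for leaves in 4 8 16; do
  ./lightgbm/lightgbm config=train.conf boosting=gbdt \
    data=../../datasets/mnist26/dataset/training_set_normalized.csv \
    valid=../../datasets/mnist26/dataset/validation_set_normalized.csv \
    num_trees=500 num_leaves=$leaves k=0.01 seed=0 p=inf \
    output_model=../../datasets/mnist26/models/gbdt_lse/lightgbm_lse_${leaves}.txt \
    learning_rate=0.1 early_stopping_round=50 feature_fraction=1 verbose=-1
done
```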
To verify a model with CARVE-GBM, see the README.md in the src/verification/carvegbm folder.
Example (run it in the src/verification/carvegbm/build folder):

```bash
./verify -i ../../../datasets/mnist26/models/gbdt_lse/<model_name> \
         -t ../../../datasets/mnist26/dataset/test_set_normalized.csv \
         -p inf -k 0.01 -ioi -1
```
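Judging by SILVA's --index-of-instance flag below, -ioi appears to select a single test instance by index, with -1 meaning the whole test set. Under that assumption, verifying only one instance would look like:

```bash
# Assumption: -ioi takes an instance index and -1 means "all instances".
./verify -i ../../../datasets/mnist26/models/gbdt_lse/<model_name> \
         -t ../../../datasets/mnist26/dataset/test_set_normalized.csv \
         -p inf -k 0.01 -ioi 0
```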
To verify a model with SILVA, see the README.md in the src/verification/silva folder.
Example (run it in the src/verification/silva folder):

```bash
./silva/src/silva ../datasets/mnist26/models/gbdt/<model_name> \
  ../datasets/mnist26/dataset/test_set_normalized.csv \
  --perturbation l_inf 0.01 --index-of-instance -1 --voting softargmax
```
After compiling all the tools, you can run this simple test to check that everything works fine:
TBD
TBD
If you use this artifact in your work, please cite our paper. You can use the following BibTeX entry:
TBD
If you have questions about the artifact or want to report a bug, feel free to contact us by sending an email to [email protected].