We provide a large-scale dataset built by our toolchains for x86/x64, arm32/mthumb/aarch64 and mips32/mips64. To reproduce the result in our Usenix paper easily, we also provide corresponding scripts.
- x86 dataset. Decompressed size is ~56GB.
- mips arm dataset. Move the
testsuite.tar
to table_7 directory. Decompressed size is ~35GB.
There are two ways to setting up the environments: 1) in your machine(we have tested in Ubuntu 20.04) or 2) Docker
apt-get update && \
apt-get install -y \
build-essential \
cmake \
git \
sqlite3 \
wget \
bison \
zlib1g-dev \
python3 \
python3-pip
pip3 install protobuf \
capstone \
pyelftools \
sqlalchemy
docker pull bin2415/py_gt
Attention: the decompressed space is ~130GB. Steps to reproduce results in Table 2:
We prepared a vm image of Virtualbox. The trained model is in the vm. The steps to reproduce is:
-
Download the ByteWeight image: image.
-
Import the image in Virtualbox.
# The username and password of the vm are: byteweight:password
# mount the disk
sudo mount /dev/sdb /work
- Test the model of ByteWeight:
cd ~/ByteWeight/code/script
## collect the Recall and Precision of Symbol information(Row 2 and Row 4 in the table)
bash run_compare_symbol.sh && bash run_symbol.sh
## collect the Recall and Precision of Oracle(Row 3 and Row 5 in the table)
bash run_compare_oracle.sh && bash run_oracle.sh
Here we use coreutils
of our x86/x64 dataset to train the ByteWeight model as the example.
## bash byteweight/preprocess.sh <path of coreutils> <dst path of preprocessed dataset>
bash byteweight/preprocess.sh ./x86_dataset/linux/utils/coreutils ./result/byteweight/dataset
Two folders(oracle and symbol) are generated in result/byteweight/dataset
Copy the two folders into virtual machine and run following instructions.
$ cd ~/ByteWeight/code/script
# train oracle dataset
$ bash experiment_lin.sh <path of dataset/oracle>
# train symbol dataset
$ bash experiment_lin.sh <path of dataset/symbol>
Overwrite the dataset/symbol/binary
with dataset/oracle/binary
at first.
cp dataset/oracle/binary/* dataset/symbol/binary/*
And refer to the ~/ByteWeight/code/script/run_compare_oracle.sh
test the trained model.
Steps to reproduce results in Usenix paper Table 3:
git clone https://github.com/CUMLSec/XDA XDA_repo
# install conda
wget -c https://repo.anaconda.com/archive/Anaconda3-2022.05-Linux-x86_64.sh
bash ./Anaconda3-2022.05-Linux-x86_64.sh
# environment setup
conda create -n xda python=3.7 numpy scipy scikit-learn colorama
conda activate xda
conda install pytorch torchvision torchaudio cudatoolkit=11.0 -c pytorch
cd XDA_repo && pip install --editable .
We prepared our finturned models here. Decompress and put them into XDA_repo/checkpoints/
. Download data-bin
here and put them into XDA_repo/data-bin
.
cp xda/eval_pair_inst_bound.py XDA_repo/scripts/play
cp xda/*.sh XDA_repo
cp -r xda/XDAInstTest/ XDA_repo
cd XDA_repo
bash run_oracle.sh
bash run_elfmap.sh
# get the result of `Oracle` line of table 3
bash calculate_oracle.sh
# get the result of `Debug` line of table 3
bash calculate_elfmap.sh
If you want to train the a new model, we also provide following instructions.
Here we use coreutils
of our x86/x64 dataset to train the XDA model as the example.
## bash xda/runOracleGTXDA.sh -d <dataset> -o <the path to save the preprocessed data> -s xda/OracleGTXDA.py -p <prefix of ground truth>
## here, we use the `coreutils` as example
## convert ground truth of `Oracle`
$ bash xda/runOracleGTXDA.sh -d ./x86_dataset/linux/utils/coreutils -s xda/OracleGTXDA.py -p "gtBlock" -o ./result/xda/preprocess/oracleGT
## convert ground truth of `elfMap`
$ bash xda/runOracleGTXDA.sh -d ./x86_dataset/linux/utils/coreutils -s xda/OracleGTXDA.py -p "elfMapInst" -o ./result/xda/preprocess/elfmapGT
Select the dataset of training randomly.
ls ./result/xda/preprocess/oracleGT > /tmp/dataset.list
# select 9/10 of dataset to train
shuf -n 567 /tmp/dataset.list > result/xda/train64.list
Collect the training dataset and validing dataset into files like them.
## training dataset of Oracle ground truth
## python3 xda/preprocess.py -t <path to train64.list> -i <folder of preprocessed data> -o <path to data-src inside xda_repo>
$ mkdir -p XDA_repo/data-src/instbound_oracle
$ python3 xda/preprocess.py -t result/xda/train64.list -i ./result/xda/preprocess/oracleGT -o XDA_repo/data-src/instbound_oracle/train
## training dataset of elfmap ground truth
$ mkdir -p XDA_repo/data-src/instbound_elfmap
$ python3 xda/preprocess.py -t result/xda/train64.list -i ./result/xda/preprocess/elfmapGT -o XDA_repo/data-src/instbound_elfmap/train
## validing dataset of Oracle ground truth
$ python3 xda/preprocess.py -t result/xda/train64.list -i ./result/xda/preprocess/oracleGT -o XDA_repo/data-src/instbound_oracle/valid -v
## validing dataset of elfmap ground truth
$ python3 xda/preprocess.py -t result/xda/train64.list -i ./result/xda/preprocess/oracleGT -o XDA_repo/data-src/instbound_elfmap/valid -v
Preprocess the data:
$ conda activate xda
$ cp xda/preprocess_scripts/* xda_repo/scripts/finetune
$ cd xda_repo
$ bash scripts/finetune/preprocess_oracle.sh
$ bash scripts/finetune/preprocess_elfmap.sh
As we only train the finetuned models, we reused the pretrained model of XDA. Please download the pretrained model here and put it in XDA_repo/checkpoints/pretrain_all/
.
# train the model of oracle ground truth
bash scripts/finetune/finetune_inst_oracle.sh
# train the model of elfmap ground truth
bash scripts/finetune/finetune_inst_elfmap.sh
Please refer this step
Steps to reproduce results in Usenix paper Table 4:
The result of Manual
in Table 4 is direct from paper Binary Code is Not Easy Table 3.
The result of Oracle
in Tail Call
column 5(Column 5, Row 3 and Column 5, Row 5) is direct paper sok Table XIII.
To reproduce the result of Oracle
in Embeded
(Column 3, Row 3 and Column 3, Row 5):
# compare dyninst with oracle on embeded instructions
bash ../script/run_comp_inst_openssl.sh -d x86_dataset/openssl-1.1.0l_execs -s ../compare/compareInstsX86.py -p "BlockDyninst932" -o result/insts_openssl/dyninst -g "gtBlock"
pushd $PWD && cd result/insts_openssl/dyninst
# collect results
grep "Recall" -r . | awk '{sum += $2; cnt += 1} END {print sum/cnt}'
grep "Precision" -r . | awk '{sum += $2; cnt += 1} END {print sum/cnt}'
popd
As we have Precision
and Recall
, F1-Score is calculated by 2*Precision*Recall/(Precision + Recall)
To reproduce the result of Oracle
in JMPTBL
(Column 4, Row 3 and Column 4 and Row 5):
# compare dyninst with oracle on jump tables
bash ../script/run_comp_inst_no_O1.sh -d x86_dataset/linux -s ../compare/compareJmpTableX86.py -p "BlockDyninst932" -o result/jmptbl/dyninst
pushd $PWD && cd result/jmptbl/dyninst
grep "Recall" -r . | awk '{sum += $3; cnt += 1} END {print sum/cnt}'
grep "Precision" -r . | awk '{sum += $3; cnt += 1} END {print sum/cnt}'
popd
As we have Precision
and Recall
, F1-Score is calcualted by 2*Precision*Recall/(Precision + Recall)
Steps to reproduce results in Usenix paper Table 5:
# compare zafl with oracleGT
bash ../script/run_comp_inst_zafl.sh -d x86_dataset/linux -s ../compare/compareInstsX86.py -p "ridaInst" -g "gtBlock" -o result/insns/oracle/zafl
# compare zafl with objdump with sym; This could run parallelly
bash ../script/run_comp_inst_zafl.sh -d x86_dataset/linux -s ../compare/compareInstsX86.py -p "ridaInst" -g "objdumpBB" -o result/insns/objdump_syms/zafl
# compare zafl with objdump without sym; This could run parallelly
bash ../script/run_comp_inst_zafl_strip.sh -d x86_dataset/linux -s ../compare/compareInstsX86.py -p "ridaInst" -o result/insns/objdump_no_syms/zafl
# check the results.
bash collect_table_5.sh
Steps to reproduce results in Usenix paper Figure 2:
bash ../script/run_comp_inst_ida.sh -d x86_dataset/linux -s ../compare/compareJmpTableX86.py -p "BlockIda" -o result/jmptbl/ida
grep "Precision" -r result/jmptbl/ida | grep -v O1 > /tmp/ida_precision.log
python3 write_csv.py /tmp/ida_precision.log ./ida_pre.log "Precision"
# need to install pandas,matplotlib and seaborn: pip3 install pandas matplotlib seaborn
# the result is stored into ./ida_pre_distribution.pdf
python3 plot_violin_seaborn_jmptbl_ida.py ./ida_pre.log ./ida_pre_distribution.pdf
Steps to reproduce results in Usenix paper Figure 3:
# scripts to compare populer disassemblers with oracle
bash run_figure_3.sh
bash collect_figure_3.sh
Steps to reproduce results in Usenix paper Figure 5:
# scripts to compare popular disassemblers with oracle
bash run_figure_5.sh
bash collect_figure_5.sh
Steps to reproduce results in Usenix paper Table 7:
cd table_7
tar -xvf testsuites.tar.gz
bash run_table_7.sh
And the compare result will be saved in table_7/res.log
Steps to reproduce results in Usenix paper Table 6:
# compare compilation metadata with oracle
bash ../script/run_comp_elfmap_gt.sh -d ./x86_dataset/linux/ -s ../compare/compareInstsX86.py -p "elfMapInst" -o result/insns/elfmap
bash collect_table_6.sh
We put the results of binary disassemblers and ground truth into dataset. So this step is optional and it takes several days to process.
We built popular open-source disassemblers(radare2, angr, bap, ghidra, dyninst, and objdump) in docker image bin2415/py_gt
.
$ docker pull bin2415/py_gt
$ docker run -it --privileged --name py_gt -v $PWD/x86-sok:/opt bin2415/py_gt /bin/bash
## prepare striped binaries
/opt/artifact_eval$ bash ../script/extract_strip.sh ./x86_dataset
/opt/artifact_eval$ bash ../script/extract_strip.sh ./table_7/testsuite
The steps of extracting results of radare2:
# x86/x64 dataset
/opt/artifact_eval$ bash ../script/run_disassembler.sh -d ./x86_dataset/linux/ -s ../disassemblers/radare/radareBB.py -p "BlockRadare"
# arm/mips dataset
/opt/artifact_eval$ bash ../script/run_disassembler.sh -d ./table_7/testsuite/ -s ../disassemblers/radare/radareBB.py -p "BlockRadare"
The steps of extracting results of angr:
/opt/artifact_eval$ conda activate angr
# x86/x64 dataset
/opt/artifact_eval$ bash ../script/run_disassembler.sh -d ./x86_dataset/linux/ -s ../disassemblers/angr/angrBlocks.py -p "BlockAngr"
# x86/x64 dataset
/opt/artifact_eval$ bash ../script/run_disassembler.sh -d ./table_7/testsuite/ -s ../disassemblers/angr/angrBlocks.py -p "BlockAngr"
conda deactivate
The steps of extracting results of bap:
/opt/artifact_eval$ opam switch 4.12.1
/opt/artifact_eval$ eval $(opam env)
# x86/x64 dataset
/opt/artifact_eval$ bash ../script/run_disassembler.sh -d ./x86_dataset/linux/ -s ../disassemblers/bap/bapBB.py -p "BlockBap"
# arm/mips dataset
/opt/artifact_eval$ bash ../script/run_disassembler.sh -d ./table_7/testsuite/ -s ../disassemblers/bap/bapBB.py -p "BlockBap"
The steps of extracting results of ghidra:
# x86/x64 dataset
/opt/artifact_eval$ bash ../script/run_disassembler_ghidra.sh -d ./x86_dataset/linux/ -p "BlockGhidra"
# arm/mips dataset
/opt/artifact_eval$ bash ../script/run_disassembler_ghidra.sh -d ./table_7/testsuite/ -p "BlockGhidra"
The steps of extracting results of dyninst:
/opt/artifact_eval$ pushd $PWD && cd ../disassemblers/dyninst && make && popd
# x86/x64 dataset
/opt/artifact_eval$ bash ../script/run_disassembler_dyninst.sh -d ./x86_dataset/linux/ -s ../disassemblers/dyninst/dyninstBlocks -p "BlockDyninst932"
# arm/mips dataset
/opt/artifact_eval$ bash ../script/run_disassembler_dyninst.sh -d ./table_7/testsuite/ -s ../disassemblers/dyninst/dyninstBlocks -p "BlockDyninst932"
The steps of extracting results of objdump:
# x86/x64
/opt/artifact_eval$ bash ../script/run_disassembler.sh -d ./x86_dataset/linux/ -s ../disassemblers/objdump/objdumpBB.py -p "BlockObjdump"
# arm/mips
/opt/artifact_eval$ bash ../script/run_disassembler.sh -d ./table_7/testsuite/ -s ../disassemblers/objdump/objdumpBB.py -p "BlockObjdump"
Ida pro is a commecial software, so the binary is not available. The steps of extracting results of ida:
# x86/x64
/opt/artifact_eval$ bash ../script/run_disassembler.sh -d ./x86_dataset/linux/ -s ../disassemblers/ida/runIDAScript.py -p "BlockIda"
# arm/mips
/opt/artifact_eval$ bash ../script/run_disassembler.sh -d ./table_7/testsuite/ -s ../disassemblers/ida/runIDAScript.py -p "BlockIda"
BinaryNinja is a commecial software, so the binary is not available. The steps of extracting results of binaryninja:
# x86/x64
/opt/artifact_eval$ bash ../script/run_disassembler.sh -d ./x86_dataset/linux/ -s ../disassemblers/ninja/ninjaBB.py -p "BlockIda"
# arm/mips
/opt/artifact_eval$ bash ../script/run_disassembler.sh -d ./table_7/testsuite/ -s ../disassemblers/ninja/ninjaBB.py -p "BlockIda"
# x86/x64
/opt/artifact_eval$ bash ../script/run_extract_linux.sh -d ./x86_dataset/linux/ -s ../extract_gt/extractBB.py -p "gtBlock"
# arm/mips
/opt/artifact_eval$ bash ../script/run_extract_linux.sh -d ./table_7/testsuite -s ../extract_gt/extractBB.py -p "gtBlock"
In this section, we give an example that explains how to use our toolchains to build new testsuite with our toolchains and compare disassembler with our ground truth. We provide five docker images for x86-x64, arm32, aarch64, mipsle32, and mipsle64.
At the begining, we have some dependencies of deploying the toolchiains:
# install docker firstly
$ curl -fsSL https://get.docker.com/ | sudo sh
$ sudo usermod -aG docker [user_id]
# install qemu
sudo apt-get install qemu binfmt-support qemu-user-static
# run the multiarch/qemu-user-static container to register:
sudo docker run --rm --privileged multiarch/qemu-user-static --reset -p yes
Here is the example that use our arm32 toolchain to cross compile coreutils:
## pull the docker image
$ docker pull z472421519/arm32_gt
$ docker image ls
REPOSITORY TAG IMAGE ID CREATED SIZE
z472421519/arm32_gt latest beb7bba8d960 7 hours ago 5.11GB
$ docker run --rm -it -v $PWD:/opt/shared z472421519/arm32_gt /bin/bash
# it is fine for the following warning message
WARNING: The requested image's platform (linux/arm/v7) does not match the detected host platform (linux/amd64) and no specific platform was requested
## insider docker
## configure CC, CXX, CFLAGS and CXXFLAGS
root@32380fe55c2a:/gt_arm32# source gcc32_arm.rc
root@32380fe55c2a:/gt_arm32# export CFLAGS="-O2 $CFLAGS" && export CXXFLAGS="-O2 $CXXFLAGS"
root@32380fe55c2a:/gt_arm32# cd /opt/shared
## download coreutils
root@32380fe55c2a:/opt/shared# wget -c https://mirror.powerfly.ca/gnu/coreutils/coreutils-8.30.tar.xz && tar -xvf coreutils-8.30.tar.xz && cd coreutils-8.30
root@32380fe55c2a:/opt/shared/coreutils-8.30# mkdir build_gcc_O2 && cd build_gcc_O2
root@32380fe55c2a:/opt/shared/coreutils-8.30/build_gcc_O2# export FORCE_UNSAFE_CONFIGURE=1 && ../configure --prefix=$PWD
root@32380fe55c2a:/opt/shared/coreutils-8.30/build_gcc_O2# make -j && make install
## The compiled binary is installed in /opt/shared/coreutils-8.30/build_gcc_O2/bin
Start a py_gt
docker container, suppose we are in the artifact_eval
folder.
docker run -it --rm -v $PWD/../:/opt/shared bin2415/py_gt /bin/bash
## insider py_gt docker. suppose we are currently in `/opt/shared/artifact_eval` directory, and the built binaries is in `./coreutils-8.30/build_gcc_O2/bin`
## we want to extract ground truth of `./coreutils-8.30/build_gcc_O2/bin`
/opt/shared/artifact_eval: bash ../script/run_extract_linux.sh -d ./coreutils-8.30/build_gcc_O2/bin/ls -s ../extract_gt/extractBB.py
## the extracted ground truth is `./coreutils-8.30/build_gcc_O2/bin/gtBlock_ls.pb`
Here, we compare radare with our ground truth. The steps of extracting disassembly result of radare2 are shown as following:
# inside py_gt docker:
# strip targeted binary: ls
cp ./coreutils-8.30/build_gcc_O2/bin/ls ./coreutils-8.30/build_gcc_O2/bin/ls.strip && strip ./coreutils-8.30/build_gcc_O2/bin/ls.strip
# extract disassembly result of radare2
python3 ../disassemblers/radare/radareBB.py -b ./coreutils-8.30/build_gcc_O2/bin/ls.strip -o ./coreutils-8.30/build_gcc_O2/bin/BlockRadare_ls.pb
python3 ../compare/compareInstsArmMips.py -b ./coreutils-8.30/build_gcc_O2/bin/ls -c ./coreutils-8.30/build_gcc_O2/bin/BlockRadare_ls.pb -g ./coreutils-8.30/build_gcc_O2/bin/gtBlock_ls.pb
# the result is shown:
...
[Result]:The total instruction number is 17558
[Result]:Instruction false positive number is 278, rate is 0.015833
[Result]:Instruction false negative number is 3630, rate is 0.206743
[Result]:Padding byte instructions number is 0, rate is 0.000000
[Result]:Precision 0.980431
[Result]:Recall 0.793257
python3 ../compare/compareFuncsArmMips.py -b ./coreutils-8.30/build_gcc_O2/bin/ls -c ./coreutils-8.30/build_gcc_O2/bin/BlockRadare_ls.pb -g ./coreutils-8.30/build_gcc_O2/bin/gtBlock_ls.pb
# the result is shown:
...
[Result]:The total Functions in ground truth is 288
[Result]:The total Functions in compared is 195
[Result]:False positive number is 23
[Result]:False negative number is 116
[Result]:Precision 0.882051
[Result]:Recall 0.597222
python3 ../compare/compareJmpTableArmMips.py -b ./coreutils-8.30/build_gcc_O2/bin/ls -c ./coreutils-8.30/build_gcc_O2/bin/BlockRadare_ls.pb -g ./coreutils-8.30/build_gcc_O2/bin/gtBlock_ls.pb
# the result is shown:
...
[Result]:The total jump table in ground truth is 18
[Result]:The total jump table in compared is 24
[Result]:False negative number is 0
[Result]:False positive number is 6
[Result]:Wrong successors number is 17
[Result]: Recall: 0.055556
[Result]: Precision: 0.041667