This repository contains the code and data to demonstrate the experiments and reproduce the results of the Privacy Enhancing Technologies Symposium (PETS) 2020 paper:
Tik-Tok: The Utility of Packet Timing in Website Fingerprinting Attacks (Read the Paper)
@article{rahman2020tik,
title={{Tik-Tok}: The utility of packet timing in website fingerprinting attacks},
author={Rahman, Mohammad Saidur and Sirinam, Payap and Mathews, Nate and Gangadhara, Kantha Girish and Wright, Matthew},
journal={Proceedings on Privacy Enhancing Technologies},
volume={2020},
number={3},
pages={5--24},
year={2020},
publisher={Sciendo}
}
In this paper, we use five datasets for our experiments. Four of them come from previous research; we collected the fifth, the Walkie-Talkie (Real) dataset, ourselves. The datasets are listed below with descriptions and references:
- Undefended [1]: Contains both closed-world (CW) and open-world (OW) data, collected in 2016. The CW data contains 95 sites with 1,000 instances each, and the OW data contains 40,716 sites with 1 instance each.
- WTF-PAD [1]: Contains both closed-world (CW) and open-world (OW) data, also collected in 2016. The CW data contains 95 sites with 1,000 instances each, and the OW data contains 40,716 sites with 1 instance each.
- Walkie-Talkie (Simulated) [1]: Contains only closed-world (CW) data, also collected in 2016. This dataset contains 100 sites with 900 instances each.
- Onion Sites [2]: Contains only closed-world (CW) data, also collected in 2016. This dataset contains 538 sites with 77 instances each.
- Walkie-Talkie (Real): Contains 100 sites with over 750 instances each. We collected this dataset in 2019 using our own implementation of a Walkie-Talkie prototype.
See the W-T_Experiments subdirectory for additional details.
[1] Payap Sirinam, Mohsen Imani, Marc Juarez, and Matthew Wright. 2018.
Deep Fingerprinting: Undermining Website Fingerprinting Defenses
with Deep Learning. In Proceedings of the 2018 ACM Conference on
Computer and Communications Security (CCS). ACM.
[2] Rebekah Overdorf, Marc Juarez, Gunes Acar, Rachel Greenstadt, and Claudia
Diaz. 2017. How Unique is Your .onion?: An Analysis of the Fingerprintability
of Tor Onion Services. In Proceedings of the 2017 ACM Conference on Computer
and Communications Security (CCS). ACM.
We experiment with four types of data representation, described below:
- Timing Features: Timing features consist of 160 feature values (20 feature values from each of 8 feature categories). In the model, timing features are represented as a 1-D array of [1 x 160].
- Direction (D): We represent the direction information of an instance as a sequence of +1 and -1, with +1 representing an outgoing packet and -1 representing an incoming packet. The sequences are trimmed or padded with 0’s as needed to reach a fixed length of 5,000 packets. Thus, the input forms a 1-D array of [1 x 5000].
- Raw Timing (RT): We represent the raw timing information as the sequence of raw timestamps of an instance. The sequences are trimmed or padded with 0’s as needed to reach a fixed length of 5,000 packets. Thus, the input forms a 1-D array of [1 x 5000].
- Directional Timing (DT): We represent the directional timing information as a sequence in which each packet's raw timestamp is multiplied by its direction (+1 for outgoing, -1 for incoming). The sequences are trimmed or padded with 0’s as needed to reach a fixed length of 5,000 packets. Thus, the input forms a 1-D array of [1 x 5000].
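The D, RT, and DT representations described above can be sketched in a few lines of plain Python. This is only an illustration and not the repository's preprocessing code: the function names, the input format (one list of timestamps and one list of directions per trace), and the choice to pad at the end of the sequence are assumptions.

```python
SEQ_LEN = 5000  # fixed sequence length stated above


def pad_or_trim(values, length=SEQ_LEN):
    """Trim a sequence to `length` items or right-pad it with zeros.

    Padding at the end is an assumption; the text only says "padded with 0's".
    """
    trimmed = list(values)[:length]
    return trimmed + [0.0] * (length - len(trimmed))


def build_representations(timestamps, directions, length=SEQ_LEN):
    """Return the D, RT, and DT sequences for a single trace.

    timestamps: per-packet times (hypothetical input format)
    directions: +1 for outgoing packets, -1 for incoming packets
    """
    if len(timestamps) != len(directions):
        raise ValueError("need exactly one direction per timestamp")
    directional = [float(t) * d for t, d in zip(timestamps, directions)]
    return {
        "D": pad_or_trim([float(d) for d in directions], length),
        "RT": pad_or_trim([float(t) for t in timestamps], length),
        "DT": pad_or_trim(directional, length),
    }


# Toy four-packet trace (made-up values, not taken from any dataset).
reps = build_representations([0.0, 0.12, 0.15, 0.40], [1, -1, -1, 1])
print(len(reps["D"]), reps["DT"][:5])
```

Each resulting list has 5,000 entries and could be converted to an array before being passed to a model.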
Please make sure you have all the dependencies available and installed before running the models.
- An NVIDIA GPU should be installed in the machine; running on a CPU will significantly increase training time.
- Ubuntu 16.04.5
- Python3-venv
- Keras version: 2.3.0
- TensorFlow version: 1.14.0
- CUDA Version: 10.2
- CuDNN Version: 7
- Python Version: 3.6.x
Please install the required packages using:
pip3 install -r requirements.txt
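After installation, it can be useful to confirm that the installed Keras and TensorFlow versions match the ones listed above. The following is a minimal sketch of such a check; the helper names are hypothetical and the script is not part of the repository.

```python
import importlib

# Versions taken from the dependency list above.
EXPECTED = {"keras": "2.3.0", "tensorflow": "1.14.0"}


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def check_environment(expected=EXPECTED):
    """Map each module name to a (found_version, matches_expected) pair."""
    report = {}
    for name, wanted in expected.items():
        found = installed_version(name)
        report[name] = (found, found == wanted)
    return report


if __name__ == "__main__":
    for name, (found, ok) in check_environment().items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name}: expected {EXPECTED[name]}, found {found} [{status}]")
```

A mismatch does not necessarily mean the models will fail, only that the environment differs from the one listed here.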
We explain how to reproduce each of the experimental results below:
- Traditional machine-learning (ML) classifiers: For the experiments with k-NN [3], SVM (CUMUL) [4], and k-FP [5], we refer to the classifiers in their respective repositories.
[3] Tao Wang, Xiang Cai, Rishab Nithyanand, Rob Johnson, and Ian Goldberg. 2014. Effective attacks and provable defenses for website fingerprinting. In Proceedings of the 23rd USENIX Conference on Security Symposium.
[4] Andriy Panchenko, Fabian Lanze, Jan Pennekamp, Thomas Engel, Andreas Zinnen, Martin Henze, and Klaus Wehrle. 2016. Website fingerprinting at Internet scale. In Proceedings of the 23rd Network and Distributed System Security Symposium (NDSS).
[5] Jamie Hayes and George Danezis. 2016. k-Fingerprinting: A robust scalable website fingerprinting technique. In Proceedings of the 25th USENIX Conference on Security Symposium.
- Timing Features in Deep Fingerprinting [1] model:
You can either:
i) process the raw data to get the features (Zenodo Cloud Data Repository)
- Raw Datasets files: Undefended.zip, Undefended_OW.zip, WTF_PAD.zip, WTF_PAD_OW.zip, W_T_Simulated.zip, Onion-Sites.zip
or, ii) use our processed data (Zenodo Cloud Data Repository)
- Processed Datasets files: processed_timing_features.zip, Processed_Congestion_Analysis_Fast_Circuits.7z, Processed_Congestion_Analysis_Slow_Circuits.7z
[Update June, 2024] We have moved the datasets from Google Drive to the Zenodo Cloud Data Repository.
If you are using our processed data, please download it and put it into the Timing_Features/save_data/ directory.
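As a rough sketch of this step, a downloaded zip archive can be unpacked into the target directory with Python's standard library. The function name is hypothetical, and the internal layout of the archive is not described here, so you may need to move files if the archive contains its own top-level folder.

```python
import zipfile
from pathlib import Path


def extract_processed_data(archive_path, target_dir="Timing_Features/save_data"):
    """Extract a downloaded zip archive into the target directory.

    Returns the sorted list of member names found in the archive.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)  # create save_data/ if missing
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
        return sorted(archive.namelist())


# Example call, assuming the archive was downloaded to the repository root:
# extract_processed_data("processed_timing_features.zip")
```

The .7z archives listed above are not zip files and need a separate 7-Zip-compatible tool.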
Please go to the Timing_Features directory and run the following command.
In place of dataset, write one of the following: Undefended, WTF-PAD, W-T-Simulated, Onion-Sites
python Tik_Tok_timing_features.py dataset
Optional: We have also added a Jupyter notebook (Tik_Tok_timing_features.ipynb) for a more interactive environment.
A snippet of output for Undefended data:
python Tik_Tok_timing_features.py Undefended
Using TensorFlow backend.
76000 train samples
9500 validation samples
9500 test samples
Train on 76000 samples, validate on 9500 samples
Epoch 1/100
- 11s - loss: 4.1017 - acc: 0.0593 - val_loss: 2.9626 - val_acc: 0.1926
Epoch 2/100
- 7s - loss: 2.9497 - acc: 0.1976 - val_loss: 2.4673 - val_acc: 0.3026
.....
Epoch 99/100
- 7s - loss: 0.3103 - acc: 0.9109 - val_loss: 0.7414 - val_acc: 0.8216
Epoch 100/100
- 7s - loss: 0.3096 - acc: 0.9104 - val_loss: 0.7639 - val_acc: 0.8239
Testing accuracy: 0.843284285
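The sample counts in the output above (76000 train, 9500 validation, 9500 test) add up to the 95 x 1,000 closed-world instances of the Undefended dataset, which corresponds to an 80/10/10 split. The sketch below reproduces only those counts; the shuffling method, the seed, and the function name are assumptions, and the actual script may split the data differently (for example, per class).

```python
import random


def split_indices(n_samples, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle sample indices and split them into train/validation/test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed for a repeatable split
    n_val = int(round(n_samples * val_fraction))
    n_test = int(round(n_samples * test_fraction))
    n_train = n_samples - n_val - n_test
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


# Undefended closed-world data: 95 sites with 1,000 instances each.
train, val, test = split_indices(95 * 1000)
print(len(train), "train samples")       # 76000 train samples
print(len(val), "validation samples")    # 9500 validation samples
print(len(test), "test samples")         # 9500 test samples
```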
See the DL_Experiments directory for the scripts used to perform the Direction, Raw Timing, and Directional Timing experiments.
Our W-T crawling software and instructions can be downloaded as a zip file from the following link: gdrive
The scripts used to evaluate the dataset and related instructions are found in the W-T_Experiments subdirectory.
For information leakage analysis, we refer to our re-implementation of WeFDE, available on GitHub: https://github.com/notem/reWeFDE.
See the Congestion_Analysis directory for the scripts used to perform the experiments with instances of slow circuits as the test set and instances of fast circuits as the test set.
We processed the data to feed into the model. Please create a sub-directory named datasets inside the Congestion_Analysis directory. Download the data from this Google Drive URL. Extract the downloaded files to the datasets sub-directory.
Parameters:
- --congestion: choices = ['slow', 'fast']
  - slow: Instances of slow circuits as test set.
  - fast: Instances of fast circuits as test set.
- --dataset: choices = ['Undefended', 'WTF-PAD', 'Onion-Sites']
- --data_rep: choices = ['D', 'RT', 'DT']
  - Type of data representation to be used.
  - D: Direction, RT: Raw Timing, and DT: Directional Timing
Example of Usage:
python Tik_Tok_Congestion.py --congestion slow --dataset Undefended --data_rep D
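For readers who want to see how such flags fit together, the following sketch parses the same three parameters with argparse. It mirrors the choices listed above but is not the code of Tik_Tok_Congestion.py; whether the real script marks the flags as required or provides defaults is an assumption.

```python
import argparse


def build_parser():
    """Build a parser with the three flags described above."""
    parser = argparse.ArgumentParser(description="Congestion analysis (sketch)")
    parser.add_argument("--congestion", choices=["slow", "fast"], required=True,
                        help="which circuits' instances form the test set")
    parser.add_argument("--dataset", required=True,
                        choices=["Undefended", "WTF-PAD", "Onion-Sites"],
                        help="dataset to evaluate")
    parser.add_argument("--data_rep", choices=["D", "RT", "DT"], required=True,
                        help="D: Direction, RT: Raw Timing, DT: Directional Timing")
    return parser


# Parse the same arguments as the usage example above.
args = build_parser().parse_args(
    ["--congestion", "slow", "--dataset", "Undefended", "--data_rep", "D"]
)
print(args.congestion, args.dataset, args.data_rep)  # slow Undefended D
```

With `choices` set, argparse rejects any value outside the listed options and prints a usage message.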
Please address any questions, comments, or feedback to the authors of the paper. The main developers of this code are:
- Mohammad Saidur Rahman
- Nate Mathews
- Payap Sirinam
- Kantha Girish Gangadhara
- Matthew Wright
We thank the anonymous reviewers for their helpful feedback. We give special thanks to Tao Wang for providing details about the technical implementation of the W-T defense, and to Marc Juarez for providing guidelines on developing the W-T prototype. This material is based upon work supported in part by the National Science Foundation (NSF) under Grants No. 1722743 and 1816851.