This repository contains all the scripts and files related to the blogforever-crawler-publication. It is organized as follows:
/tex
contains the LaTeX and gnuplot files, together with instructions on how to compile the paper from source,
/dataset
explains how to extract our test set from the Spinn3r Dataset,
/success-rates
has the scripts we used to obtain the "extraction success rates" data,
/running-times
contains the code we used for running time measurements.
Please keep in mind that the scripts you will find here were written as "single use code" and are anything but beautiful. If you have any issues compiling the paper or running the experiments, just let me know!