This repository provides the dataset published in the paper "Taxonomy of Mathematical Plagiarism" and the code for its experiments.
We curated a dataset of pairs of potentially plagiarised mathematical content spans from documents, each annotated with its obfuscation type (the way in which the content was modified). The dataset and a description of the accompanying files are available in data/.
We evaluated the best-performing approaches for detecting plagiarism and mathematical content similarity against the newly established taxonomy. The corresponding code is in code/experiments/.
License: CC-BY-SA 4.0. This license applies to the whole dataset, which contains non-copyrighted bibliographic metadata and reference data derived from I4OSC (CC0).