Authorship Obfuscation in Multilingual Machine-Generated Text Detection

Source code for replication of the experiments in the paper accepted to EMNLP 2024 Findings.

Cite

If you use the data, code, or information in this repository, please cite the following paper (also available on arXiv).

@misc{macko2024authorshipobfuscationmultilingualmachinegenerated,
      title={Authorship Obfuscation in Multilingual Machine-Generated Text Detection}, 
      author={Dominik Macko and Robert Moro and Adaku Uchendu and Ivan Srba and Jason Samuel Lucas and Michiharu Yamashita and Nafis Irtiza Tripto and Dongwon Lee and Jakub Simko and Maria Bielikova},
      year={2024},
      eprint={2401.07867},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2401.07867}, 
}

Install Dependencies

Each external obfuscator has its own requirements and dependencies, which need to be installed from the corresponding repository. At a minimum, install the dependencies of the original MULTITuDE benchmark, on which our study is built.

Data

First, download the original unobfuscated data from MULTITuDE and save it as 'dataset/multitude.csv.gz'. The obfuscated data can then be generated with the provided scripts:

For backtranslation, use the provided 01_backtranslation.py (to select the m2m100 or nllb-200 model, uncomment the corresponding model_name in the source code).
For paraphrasing by ChatGPT, use the provided 01_chatgpt_paraphrase.py.
For paraphrasing by PEGASUS-paraphrase, use the provided 01_pegasus_paraphrase.py.
For paraphrasing by DIPPER, use the source code provided in the original DIPPER repository, while applying the settings mentioned in the paper.
For text edits by GPTZzzs, use the source code provided in the original GPTZzzs repository.
For text edits by GPTZero-Bypasser, use the source code provided in the original GPTZero-Bypasser repository.
For text edits by a generic HomoglyphAttack, use the provided 01_homoglyphattack.py.
For text edits by ALISON, use the source code provided in the original ALISON repository, while applying the settings mentioned in the paper.
For text edits by DFTFooler, use the source code provided in the original DFTFooler repository, while applying the settings mentioned in the paper.
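To illustrate what a generic homoglyph attack does, here is a minimal sketch that swaps a few Latin letters for visually similar Cyrillic ones. It is not the code in 01_homoglyphattack.py: the function name homoglyph_attack, the rate and seed parameters, and the small character table are all assumptions made for illustration.

```python
import random

# Latin letters mapped to visually similar Cyrillic code points.
# This short table is an illustrative assumption, not the repository's mapping.
HOMOGLYPHS = {
    "a": "\u0430",  # CYRILLIC SMALL LETTER A
    "c": "\u0441",  # CYRILLIC SMALL LETTER ES
    "e": "\u0435",  # CYRILLIC SMALL LETTER IE
    "o": "\u043e",  # CYRILLIC SMALL LETTER O
    "p": "\u0440",  # CYRILLIC SMALL LETTER ER
    "x": "\u0445",  # CYRILLIC SMALL LETTER HA
    "y": "\u0443",  # CYRILLIC SMALL LETTER U
}


def homoglyph_attack(text, rate=1.0, seed=0):
    """Replace each mappable character with its look-alike with probability `rate`."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    out = []
    for ch in text:
        if ch in HOMOGLYPHS and rng.random() < rate:
            out.append(HOMOGLYPHS[ch])
        else:
            out.append(ch)
    return "".join(out)
```

The output looks the same to a human reader but consists of different code points, so a detector's tokenizer sees different input. With rate=0.0 the text is returned unchanged.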
After the obfuscated data are generated, run the provided 02_text_quality.ipynb Google Colab notebook to calculate and analyze various automated text similarity metrics between the original and obfuscated data.
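As a rough idea of what a similarity metric between an original and an obfuscated text can look like, the sketch below computes a normalized character-level edit-distance similarity. The metrics actually used in 02_text_quality.ipynb are not listed here, so this is only a stand-in; the function names levenshtein and similarity are hypothetical.

```python
def levenshtein(a, b):
    """Character-level edit distance (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity(original, obfuscated):
    """Return 1.0 for identical texts and 0.0 for texts with no characters in common."""
    longest = max(len(original), len(obfuscated))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(original, obfuscated) / longest
```

A character-level score like this stays high for light text edits but drops sharply for paraphrasing and backtranslation, which is one reason to look at several metrics rather than a single one.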

Fine-tuning

To fine-tune a base model for the machine-generated text detection task, use the provided 03_finetune_detector.py script.

Run Detection

To run statistical methods, use the IMGTB framework. To run pre-trained and fine-tuned methods, use the provided 04_test_detector.py script. The Longformer Detector requires a special pre-processing step, so use the provided 04_test_longformer.py for it instead.

Results Analysis

To analyze the results, use the provided 05_results_analysis.ipynb Google Colab notebook.
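One way to quantify the effect of an obfuscator is to compare a detector's ROC AUC on the original and on the obfuscated texts. The sketch below is an assumption about how such a comparison could be done, not a description of the notebook: the function name roc_auc is hypothetical, and labels are assumed to be 1 for machine-generated and 0 for human-written text.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

Calling roc_auc once with the detector scores for the original texts and once with the scores for the obfuscated texts, using the same labels, gives two values whose difference indicates how much the obfuscation degraded detection. The pairwise loop is quadratic in the number of samples, so a library implementation is preferable for large test sets.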
