Reddit Us vs. Them dataset (Populist Attitudes, Media Bias, Basic Emotions, Social Identity, UsVsThem)
This is the repository for our publication Us vs. Them: A Dataset of Populist Attitudes, News Bias and Emotions, which was published and presented at the EACL 2021 conference. Our Reddit dataset is publicly available to the research community under the CC BY-NC 4.0 licence. If you use it, please cite our paper:
Huguet-Cabot, P. L., Abadi, D., Fischer, A., & Shutova, E. (2021, April). Us vs. Them: A Dataset of Populist Attitudes, News Bias and Emotions. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume (pp. 1921–1945). http://dx.doi.org/10.18653/v1/2021.eacl-main.165
@inproceedings{huguet-cabot-etal-2021-us,
    title = "Us vs. Them: A Dataset of Populist Attitudes, News Bias and Emotions",
    author = "Huguet-Cabot, Pere-Llu{\'\i}s and
      Abadi, David and
      Fischer, Agneta and
      Shutova, Ekaterina",
    booktitle = "Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume",
    month = apr,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "http://dx.doi.org/10.18653/v1/2021.eacl-main.165",
    pages = "1921--1945"
}
The code to train and test models on the dataset can be found in src/.
References to local data have been removed.
Our models were trained and tested with an older version of PyTorch Lightning, with some manual fixes for DDP (Distributed Data Parallel) that are not provided here.
The branch three_task includes the code to train and test the three-task MTL model using emotions and group identification.
*** Update June 2024 *** The public version of our Reddit dataset is now available via figshare: https://figshare.com/s/a2b99428c8be6c936c63
To comply with the GDPR, the public version of our Us vs. Them dataset contains only the Reddit comment body and the labels.
For further information about our dataset or any original data, such as Reddit metadata or the news articles that prompted the comments, please contact us via email.
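Because the public release contains only comment bodies and labels, a first look at it can be as simple as reading the text and label columns and counting the labels. The following is a minimal sketch, not part of the code in src/. It assumes the download is a CSV file with a header row; the file name `us_vs_them.csv` and the column names `body` and `label` are hypothetical placeholders and should be replaced with the actual names in the figshare files.

```python
import csv
from collections import Counter
from pathlib import Path

# Hypothetical file and column names (assumptions, not taken from the release):
# replace them with the names used in the figshare download.
DATA_PATH = Path("us_vs_them.csv")
TEXT_COLUMN = "body"
LABEL_COLUMN = "label"


def load_comments(path, text_column=TEXT_COLUMN, label_column=LABEL_COLUMN):
    """Read (comment text, label) pairs from a CSV file with a header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        return [(row[text_column], row[label_column]) for row in reader]


def label_counts(pairs):
    """Count how many comments carry each label value."""
    return Counter(label for _, label in pairs)


if __name__ == "__main__":
    # Only runs when the (hypothetical) file is present in the working directory.
    if DATA_PATH.exists():
        pairs = load_comments(DATA_PATH)
        print(f"{len(pairs)} comments loaded")
        for label, count in label_counts(pairs).most_common():
            print(label, count)
```

Labels are returned as strings exactly as they appear in the file, so any conversion to numeric scores or class indices is left to the caller.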
The CC BY-NC 4.0 licence enables reusers to distribute, remix, adapt, and build upon the material in any medium or format for noncommercial purposes only, and only so long as attribution is given to the creator. CC BY-NC includes the following elements: BY: credit must be given to the creator. NC: only noncommercial uses of the work are permitted.
https://creativecommons.org/licenses/by-nc/4.0/
This research was funded by the Horizon 2020 project Democratic Efficacy and the Varieties of Populism in Europe (DEMOS) under H2020-EU.3.6.1.1. and H2020-EU.3.6.1.2. (grant agreement ID: 822590) and supported by the European Union’s H2020 Marie Skłodowska-Curie project Knowledge Graphs at Scale (KnowGraphs) under H2020-EU.1.3.1. (grant agreement ID: 860801).