A repository for aggregating web domain metrics, such as partisanship or veracity classifications, from peer-reviewed publications. All data gathering and aggregation can be replicated by running bash replicate.sh. If you're looking for the final product, see data/domains.tsv.
News classifications are available in the news_is_news column and are defined using:
- 488 domains identified as ‘hard news’ by Bakshy et al. (2015)
- 1,250 domains manually identified as news by Grinberg et al. (2019), and
- 6,288 domains aggregated from local news listings by Yin (2018)
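As a rough illustration of how the news_is_news column might be used, the sketch below reads tab-separated rows and keeps the domains flagged as news. It is not taken from the repository's own code. The domain column name, the set of values treated as "true", and the sample rows are all assumptions made for the example; check the header and encoding of data/domains.tsv before relying on it.

```python
import csv
import io


def load_news_domains(handle, domain_col="domain", flag_col="news_is_news"):
    """Return the domains whose news flag is set, reading tab-separated rows.

    `domain_col` and the accepted truthy spellings are assumptions; only the
    `news_is_news` column name comes from the README.
    """
    truthy = {"1", "1.0", "true", "t", "yes"}
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        row[domain_col]
        for row in reader
        if (row.get(flag_col) or "").strip().lower() in truthy
    ]


# Hypothetical rows standing in for data/domains.tsv.
sample = io.StringIO(
    "domain\tnews_is_news\n"
    "example-news.com\tTrue\n"
    "example-shop.com\tFalse\n"
    "example-local.org\t1\n"
)
news_domains = load_news_domains(sample)
print(news_domains)

# With the real file (path from the README), something like:
# with open("data/domains.tsv", newline="", encoding="utf-8") as f:
#     news_domains = load_news_domains(f)
```

Passing an open file handle rather than a path keeps the function easy to try on a few in-memory rows before pointing it at the full dataset.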
Currently includes data from:
- Grinberg, N., Joseph, K., Friedland, L., Swire-Thompson, B., & Lazer, D. (2019). Fake news on Twitter during the 2016 US presidential election. Science, 363(6425), 374-378. Download data
- Robertson, R. E., Jiang, S., Joseph, K., Friedland, L., Lazer, D., & Wilson, C. (2018). Auditing Partisan Audience Bias within Google Search. Proceedings of the ACM on Human-Computer Interaction, 2(CSCW), 148. Download data
- Yin, L. (2018). yinleon/LocalNewsDataset: Initial release (V1.0). Zenodo. https://doi.org/10.5281/zenodo.1345145
Robertson et al. (2018) includes data from:
- AllSides. 2018. Media Bias Ratings. AllSides. (2018). Download data
- Amy Mitchell, Jeffrey Gottfried, Jocelyn Kiley, and Katerina Eva Matsa. 2014. Political Polarization & Media Habits. Pew Research Center’s Journalism Project. (Oct. 2014). Download data
- Ceren Budak, Sharad Goel, and Justin M Rao. 2016. Fair and balanced? Quantifying media bias through crowdsourced content analysis. Public Opinion Quarterly 80, S1 (2016), 250–271. Download data
- Eytan Bakshy, Solomon Messing, and Lada A Adamic. 2015. Exposure to ideologically diverse news and opinion on Facebook. Science 348, 6239 (2015), 1130–1132. Download data