Web Usage Study 0.8.0

Summary

The goal of this data collection tool is to archive a user's behavior and the digital traces of their web experiences. We use a two-pronged approach that collects both ecological and controlled data. Ecological data includes a real user's behavior, their web account histories, and, in some cases, real-time snapshots of the websites they visit. Controlled data includes snapshots of websites that we take from a user's computer, including select keyword searches on Google Search or YouTube.

This extension was used to collect data for the following studies:

Robertson, R. E., Green, J., Ruck, D. J., Ognyanova, K., Wilson, C., & Lazer, D. (2023). Users choose to engage with more partisan news than they are exposed to on Google Search. Nature, 618, 342–348. DOI: 10.1038/s41586-023-06078-5
Chen, A. Y., Nyhan, B., Reifler, J., Robertson, R. E., & Wilson, C. (2023). Subscriptions and external links help drive resentful users to alternative and extremist YouTube channels. Science Advances, 9(35). DOI: 10.1126/sciadv.add8080
Gleason, J., Hu, D., Robertson, R. E., & Wilson, C. (2023). Google the gatekeeper: How search components affect clicks and attention. Proceedings of the International AAAI Conference on Web and Social Media (ICWSM 2023), 17, 245–256. DOI: 10.1609/icwsm.v17i1.22142

Data Collection Capacities

We used a browser extension to collect four types of data about users’ web browsing.

Ecological Data

Where people go (monitor API - passive)
Collection: tracking the metadata of users’ paths through the web, including the URLs they visit and when they visited them.
Scope: absolute – we record everything (i.e. independent browser history).
What people saw (snapshot API - passive)
Collection: saving a copy of the HTML that was rendered when a user visited a URL.
Scope: filtered to trigger on a set of pre-selected web domains, including Google Search, YouTube, Facebook Newsfeed, and Twitter Feed.
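The two passive scopes above can be sketched as follows: every visit is recorded, but a snapshot is only flagged when the URL falls under a pre-selected domain. This is a minimal sketch, not the extension's actual code. The SNAPSHOT_DOMAINS list, the function names, and the record fields are assumptions made for illustration, and the real filter may match specific pages (e.g. the Facebook Newsfeed) rather than whole domains.

```javascript
// Hypothetical allow-list; the study's actual list is not given in this README.
const SNAPSHOT_DOMAINS = ["google.com", "youtube.com", "facebook.com", "twitter.com"];

// Return true when the URL's hostname equals, or is a subdomain of, a listed domain.
function shouldSnapshot(url, domains = SNAPSHOT_DOMAINS) {
  let hostname;
  try {
    hostname = new URL(url).hostname.toLowerCase();
  } catch (err) {
    return false; // not a parseable absolute URL
  }
  return domains.some((d) => hostname === d || hostname.endsWith("." + d));
}

// Build a monitor-style visit record (field names are illustrative only).
// Every visit gets a record; only some are flagged for a snapshot.
function makeVisitRecord(url, timestamp = Date.now()) {
  return { url, timestamp, snapshot: shouldSnapshot(url) };
}
```

Matching on the parsed hostname rather than on a substring of the URL avoids false positives such as a listed domain appearing in a query string or as the tail of an unrelated hostname.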

Controlled Data

What people would have seen (snapshot API - active)
Collection: visiting URLs from users’ computers and saving a copy of the HTML that renders.
Scope: limited to pre-selected URLs (e.g. the YouTube homepage or a fixed Google Search) or dynamically discovered URLs (e.g. recursive algorithm interrogation). Snapshots can be collected in both a normal and a private window to measure personalization.
What websites know about people (history API - active)
Collection: automating the collection of data from various web services and accounts, including users’ browsing history, Google account, and ad preferences.
Scope: limited to pre-selected accounts and services, and by the kinds of data those services provide access to.
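As an illustration of how paired normal and private snapshots could be compared, the sketch below computes the Jaccard overlap between the links found in each. This README does not state which personalization metric the studies used, so the metric, the function name, and the example link lists are all assumptions.

```javascript
// Jaccard similarity between two collections of URLs: |A ∩ B| / |A ∪ B|.
// 1 means identical link sets; 0 means no links in common.
function jaccard(a, b) {
  const setA = new Set(a);
  const setB = new Set(b);
  if (setA.size === 0 && setB.size === 0) return 1; // two empty sets are identical
  let shared = 0;
  for (const item of setA) {
    if (setB.has(item)) shared += 1;
  }
  return shared / (setA.size + setB.size - shared);
}

// Hypothetical links extracted from a normal-window and a private-window snapshot.
const normalLinks = ["a.example/1", "b.example/2", "c.example/3"];
const privateLinks = ["a.example/1", "b.example/2", "d.example/4"];

// 2 shared links out of 4 distinct links overall
const overlap = jaccard(normalLinks, privateLinks);
```

A set-based measure ignores ranking; a rank-aware comparison would need a different measure.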

Technical Details

Code Overview

The browser extension is written in JavaScript, HTML, and CSS, and uses the WebExtensions API, which is compatible with Firefox and Chrome. It consists of APIs, Web Workers, and a system for communicating between them.
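One common way to structure communication between components like these is a small router that maps message types to handlers. The sketch below is an assumption about the general pattern, not a description of this extension's actual messaging code; createRouter, the message type names, and the payload fields are hypothetical.

```javascript
// Minimal message router: maps a message "type" to a handler function.
function createRouter() {
  const handlers = new Map();
  return {
    // Register the handler for one message type.
    on(type, handler) {
      handlers.set(type, handler);
    },
    // Look up the handler for message.type and call it with message.payload.
    dispatch(message) {
      const handler = handlers.get(message.type);
      if (!handler) {
        throw new Error(`No handler for message type: ${message.type}`);
      }
      return handler(message.payload);
    },
  };
}

const router = createRouter();
const visits = [];

// Hypothetical handlers standing in for the monitor and snapshot APIs.
router.on("monitor:visit", (payload) => {
  visits.push(payload);
  return visits.length;
});
router.on("snapshot:save", (payload) => payload.html.length);

// In an extension, a listener registered with browser.runtime.onMessage
// (or a Worker's onmessage callback) could forward messages to router.dispatch.
const count = router.dispatch({
  type: "monitor:visit",
  payload: { url: "https://example.com/", timestamp: 0 },
});
```

Keeping the routing table separate from the transport makes the handlers testable outside the browser.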

Documentation

Throughout the project, we use JSDoc to document the browser extension code. To rebuild the documentation, install Node.js and JSDoc, then run: bash ./scripts/make_docs.sh

Server

We used a Flask app to receive and store data sent from the extension in a MySQL database. The data storage format is specified in application/models.py and matches the format above.

To start the server for local testing:

Create a virtual environment (Python 3.6) and install requirements.txt
Open ./scripts/start_platform.sh and change FLASK_ENV to your virtualenv path
Run bash ./scripts/start_platform.sh
Flask will begin running on http://0.0.0.0:80/ and logging to the console
Data targeted at the server (e.g. http://x.y.z.a:80/save_data) will be saved to a SQL database specified in ./sqlplatform/config.py
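For local testing, a client-side request to the save_data endpoint mentioned above might be assembled like this. This is a sketch under assumptions: the JSON body, the Content-Type header, the record fields, and the buildSaveRequest helper are all hypothetical, and the real payload schema is whatever the server's models file defines.

```javascript
// Build the URL and fetch() options for sending one record to the server.
// Assumption: the server accepts a JSON body; check the server code before relying on this.
function buildSaveRequest(serverBase, record) {
  const url = new URL("/save_data", serverBase).toString();
  const options = {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(record),
  };
  return { url, options };
}

// Hypothetical record pointed at a locally running server.
const request = buildSaveRequest("http://localhost:80", {
  url: "https://example.com/",
  timestamp: 0,
});

// To actually send it: fetch(request.url, request.options)
```

Separating request construction from the network call lets the payload be inspected without a running server.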
