If you are a crisis response team who needs help working with this data — please contact [email protected].
VisitData.org needs volunteer programmers, data analysts, and crisis team liaisons — please contact [email protected] or visit https://github.com/VisitData-org/ca_visit_tracking.
More FAQs here: https://visitdata.org/faq
To run the app locally, in development mode:
- Obtain a Google Maps API key. If you do not have one, you can just set it to "" and the map will be disabled. Note that the API key should never be committed to the git repository.
- Set up a Python virtualenv, as specified below.
- Run the server, as specified below.
- Use Chrome or Firefox as your browser; Safari reportedly raises a CORS error in development.
To set up a virtual env:
$ python3 -m venv ~/myenv
$ source ~/myenv/bin/activate
$ pip install -r requirements.txt
To run the server in development mode:
$ export MAPS_API_KEY="..."
$ gcloud auth application-default login
$ make run
The development server will automatically refresh when files change.
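The API key behavior described above (an empty `MAPS_API_KEY` disables the map) can be sketched as follows. This is a minimal illustration, not the app's actual code: the function name `maps_config` is hypothetical, and how the server really reads the variable is an assumption.

```python
import os


def maps_config(environ=None):
    """Return (api_key, map_enabled) from an environment mapping.

    Hypothetical helper: an unset or empty MAPS_API_KEY is treated as
    "map disabled", mirroring the behavior described in this README.
    """
    env = os.environ if environ is None else environ
    api_key = env.get("MAPS_API_KEY", "")
    return api_key, bool(api_key)


if __name__ == "__main__":
    key, enabled = maps_config()
    print("map enabled" if enabled else "map disabled")
```

Reading the key from the environment, rather than from a file in the repository, is one way to keep it out of git.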
Data has moved out of the repository and into Google Cloud Storage in a public bucket. You can access the latest data at https://visitdata.org/data/
- Install the Google Cloud SDK
- Run:
gsutil ls gs://data.visitdata.org
The latest data snapshot is hosted on https://visitdata.org/data/
Historic data snapshots are also hosted on https://data.visitdata.org/
For example, you can retrieve https://data.visitdata.org/processed/vendor/foursquare/asof/20200403-v0/taxonomy.json
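As a rough sketch, a snapshot file can also be fetched from Python. The URL layout below is inferred from the single example above; the helper names `snapshot_url` and `fetch_json` are hypothetical, and other vendors or filenames may not follow the same pattern.

```python
import json
import urllib.request

BASE_URL = "https://data.visitdata.org"


def snapshot_url(version, filename, vendor="foursquare"):
    """Build the URL of one file in a historic snapshot.

    The path layout is assumed from the taxonomy.json example in this README.
    """
    return f"{BASE_URL}/processed/vendor/{vendor}/asof/{version}/{filename}"


def fetch_json(url, timeout=30):
    """Download a URL and parse the response body as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

For example, `fetch_json(snapshot_url("20200403-v0", "taxonomy.json"))` would request the file named above; it needs network access.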
To import new data:
- Copy yesterday's data
$ gsutil -m cp -r gs://data.visitdata.org/processed/vendor/foursquare/asof/20200402-v0 /tmp
- Process the new day's data by pointing to the previous day's data, the new download file, and the name of the build directory to be created by the script.
$ python3 bin/foursquare_cube.py --prevdir /tmp/build2/20200807-v20200807-v0/ /Users/david/Downloads/drive-download-20200810T211412Z-001/data-cube2-2020-08-07.tar v20200808-v0 /tmp/build3
- Load the processed data to the bucket
$ bin/foursquare_load.sh /tmp/build2/20200807-v20200807-v0/ 20200807-v0
- Modify app.yaml to point to the new data version
$ vi app.yaml
...
env_variables:
  FOURSQUARE_DATA_VERSION: "20200403-v0"
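The final step, editing app.yaml, could be scripted instead of done by hand. The sketch below assumes the file contains exactly one quoted `FOURSQUARE_DATA_VERSION` line as in the example; `set_data_version` is a hypothetical helper, not part of the repository.

```python
import re

# Matches the key plus its quoted value, keeping the indentation in group 1.
VERSION_LINE = re.compile(
    r'^([ \t]*FOURSQUARE_DATA_VERSION:[ \t]*)"[^"]*"', re.MULTILINE
)


def set_data_version(yaml_text, new_version):
    """Return yaml_text with FOURSQUARE_DATA_VERSION set to new_version.

    Raises ValueError unless exactly one matching line is found, so a
    malformed app.yaml is not silently left unchanged.
    """
    updated, count = VERSION_LINE.subn(
        lambda m: f'{m.group(1)}"{new_version}"', yaml_text
    )
    if count != 1:
        raise ValueError(
            f"expected exactly one FOURSQUARE_DATA_VERSION line, found {count}"
        )
    return updated
```

A plain text substitution is used here rather than a YAML parser so that comments and formatting in the rest of the file are left untouched.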
I have something that notifies me via text message when a new file is uploaded to the Google Drive between 5am and 10pm. That saves me from having to ask Nate to post in Slack. Data doesn't arrive reliably at the same time every day, and sometimes 2-3 days of data arrive in one batch.
- Download the new data from the Data Cube v20201120 Google Drive.
- Move the files to my scratch directory:
ca_visit_tracking/datascratch on master on ☁️ [email protected](emailstats)
❯ pwd
/Users/andrewjanian/covid/ca_visit_tracking/datascratch
ca_visit_tracking/datascratch on master on ☁️ [email protected](emailstats)
❯ mv /Users/andrewjanian/Downloads/data-cube-2021-02-16.tar /Users/andrewjanian/Downloads/data-cube-2021-02-17.tar /Users/andrewjanian/Downloads/data-cube-2021-02-18.tar .
- Cat the tars together
ca_visit_tracking/datascratch on master on ☁️ [email protected](emailstats)
❯ rm data.tar && find . -type f -name "*.tar" -exec tar Af data.tar {} \;
- Change the date in app.yaml, then commit and push that change. If you don't commit before step 5 (processing and uploading), the upload will fail.
- In a single command process the files and upload them
ca_visit_tracking on master via 🐍 v3.8.5 on ☁️ [email protected](emailstats)
❯ pwd
/Users/andrewjanian/covid/ca_visit_tracking
ca_visit_tracking on master via 🐍 v3.8.5 on ☁️ [email protected](emailstats)
❯ python ./bin/foursquare_cube.py --prevdir=datascratch/20210215/20210215-v0 datascratch/data.tar v0 datascratch/20210218 && ./bin/foursquare_load.sh datascratch/20210218/20210218-v0 20210218-v0 && iphone "visitdata uploaded" "20210218" && make deploy-prod-quiet && make deploy-beta-quiet && iphone "visitdata deployed" "20210218"
- This depends on you having the prior day's data in your scratch directory
- This uploads ~6GB of data so it takes a while
- My command to process and upload the files calls a function named iphone, which sends me a notification via pushover.net. It has no effect on the processing or upload, but it lets me know when each stage is done so I can check.
- Because I've done this so many times, I use the quiet versions of the deployment make targets so they don't ask me questions. If you're just starting, it may be better to use the regular (non-quiet) targets.
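In the command above, the version label (20210218-v0) appears to be the date of the newest tar in the batch followed by -v0. That reading is inferred from the example, not documented. Under that assumption, a hypothetical helper to derive the label from the downloaded file names might look like this:

```python
import re
from datetime import date

# Matches names such as data-cube-2021-02-18.tar or data-cube2-2020-08-07.tar.
TAR_NAME = re.compile(r"data-cube\d*-(\d{4})-(\d{2})-(\d{2})\.tar$")


def latest_version(tar_names, suffix="v0"):
    """Return the version label for the newest dated tar in tar_names.

    Files that do not match the dated naming pattern (for example the
    combined data.tar) are ignored.
    """
    dates = []
    for name in tar_names:
        match = TAR_NAME.search(name)
        if match:
            year, month, day = (int(part) for part in match.groups())
            dates.append(date(year, month, day))
    if not dates:
        raise ValueError("no dated data-cube tar files found")
    return f"{max(dates):%Y%m%d}-{suffix}"
```

With the three files moved in step 2, `latest_version` would give the label to use in app.yaml and in the load step.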
To deploy the app to visitdata.org:
- Install the latest gcloud SDK: https://cloud.google.com/sdk/docs
- Login using a Google account:
$ gcloud auth login
- Set the default project:
$ gcloud config set project os-covid
To deploy to beta, run:
$ cd .../ca_visit_tracking
$ make deploy-beta
To deploy to production, run:
$ cd .../ca_visit_tracking
$ make deploy-prod
Extract, Transform and Load (ETL) Airflow operators and DAGs are in this repository under etl/. These can be run locally or on the production server. See etl/README.md for details.