Reverse ETL: Extract data from BigQuery tables.
This is a Singer tap that produces JSON-formatted data following the Singer spec.
This tap:
- Pulls table data from Google BigQuery
- Outputs the schema for each resource
- Incrementally pulls data based on the input state
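For readers new to the Singer spec, the sketch below illustrates the three message types a tap writes to standard output, one JSON object per line: SCHEMA, RECORD, and STATE. The stream name, fields, and bookmark layout are hypothetical and are not taken from tap-bigquery's actual output.

```python
import json


def singer_messages(stream, schema, key_properties, rows, bookmark):
    """Yield one SCHEMA line, one RECORD line per row, then one STATE line."""
    yield json.dumps({
        "type": "SCHEMA",
        "stream": stream,
        "schema": schema,
        "key_properties": key_properties,
    })
    for row in rows:
        yield json.dumps({"type": "RECORD", "stream": stream, "record": row})
    # The STATE value is free-form; this bookmark layout is an assumption.
    yield json.dumps({"type": "STATE", "value": bookmark})


# Hypothetical stream and rows, for illustration only.
example_schema = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "updated_at": {"type": "string", "format": "date-time"},
    },
}
example_rows = [
    {"id": 1, "updated_at": "2020-05-01T00:10:00Z"},
    {"id": 2, "updated_at": "2020-05-01T00:20:00Z"},
]
example_lines = list(
    singer_messages(
        "example_stream",
        example_schema,
        ["id"],
        example_rows,
        {"example_stream": {"last_update": "2020-05-01T00:20:00Z"}},
    )
)

for line in example_lines:
    print(line)
```

A target reads these lines in order; the final STATE value is what a later run would receive as its input state to resume incrementally.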
(originally found in the Google API docs)
- Use this wizard to create or select a project in the Google Developers Console and activate the BigQuery API. Click Continue, then Go to credentials.
- On the Add credentials to your project page, click the Cancel button.
- At the top of the page, select the OAuth consent screen tab. Select an Email address, enter a Product name if not already set, and click the Save button.
- Select the Credentials tab, click the Create credentials button and select OAuth client ID.
- Select the application type Other, enter the name "Singer BigQuery Tap", and click the Create button.
- Click OK to dismiss the resulting dialog.
- Click the Download button to the right of the client ID.
- Move this file to your working directory and rename it client_secrets.json.
Create a file called config.json in your working directory, following config.sample.json. The required parameters are the start_datetime and at least one stream (one bigquery table) to copy.
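As a rough illustration, the following Python sketch builds a configuration with the two required parts named above: a start_datetime and at least one stream. Only those two requirements come from this README; the key names `streams`, `name`, `table`, `columns`, and `datetime_key`, and the helper `build_config`, are assumptions for illustration, so check config.sample.json for the actual layout.

```python
import json


def build_config(start_datetime, streams):
    """Return a config dict, rejecting inputs that miss a required part."""
    if not start_datetime:
        raise ValueError("start_datetime is required")
    if not streams:
        raise ValueError("at least one stream (one BigQuery table) is required")
    return {"start_datetime": start_datetime, "streams": list(streams)}


# Hypothetical stream definition; the key names below are assumptions.
example_config = build_config(
    "2020-05-01T00:00:00Z",
    [
        {
            "name": "example_stream",
            "table": "`my-project.my_dataset.my_table`",
            "columns": ["id", "created_at"],
            "datetime_key": "created_at",
        }
    ],
)

# Serialize as JSON; this text could be saved as config.json.
config_text = json.dumps(example_config, indent=2)
print(config_text)
```

Failing early on a missing start_datetime or an empty stream list is a design choice of this sketch, not documented behavior of the tap.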
First, make sure Python 3 is installed on your system or follow these installation instructions for Mac or Ubuntu.
tap-bigquery can be run with any Singer target. As an example, let's use target-redshift.
The commands below install tap-bigquery and target-redshift with pip, export the path to the Google client secrets file so the tap can authenticate with Google Cloud, run tap-bigquery in discovery mode to generate the JSON schema (catalog) file, and then run the tap with that catalog; its output is meant to be piped to target-redshift:
> pip install tap-bigquery pipelinewise-target-redshift
> export GOOGLE_APPLICATION_CREDENTIALS="./client_secrets.json"
> tap_bigquery -c config.json -d > catalog.json
> tap_bigquery -c config.json --catalog catalog.json --start_datetime '2020-05-01T00:00:00Z' --end_datetime '2020-05-01T01:00:00Z'
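Before wiring in a target, it can help to inspect what the tap emits. The sketch below assumes the output follows the Singer spec (one JSON message per line) and counts RECORD messages per stream while keeping the last STATE value. The helper name `summarize_singer_output` and the sample lines are hypothetical.

```python
import json
from collections import Counter


def summarize_singer_output(lines):
    """Return (record counts per stream, last STATE value) for Singer lines."""
    record_counts = Counter()
    last_state = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        message = json.loads(line)
        if message.get("type") == "RECORD":
            record_counts[message["stream"]] += 1
        elif message.get("type") == "STATE":
            last_state = message.get("value")
    return dict(record_counts), last_state


# Hypothetical sample of tap output, for illustration only.
sample_output = [
    '{"type": "SCHEMA", "stream": "example_stream", "schema": {"type": "object"}, "key_properties": []}',
    '{"type": "RECORD", "stream": "example_stream", "record": {"id": 1}}',
    '{"type": "RECORD", "stream": "example_stream", "record": {"id": 2}}',
    '{"type": "STATE", "value": {"example_stream": "2020-05-01T01:00:00Z"}}',
]
counts, state = summarize_singer_output(sample_output)
print(counts, state)
```

To use it on a real run, the same function could be given `sys.stdin` in a small script that receives the tap's piped output.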
It is recommended to use tap-bigquery with a service account.
- Download the client_secrets.json file for your service account, and place it on the machine where tap-bigquery will be executed.
- Set a GOOGLE_APPLICATION_CREDENTIALS environment variable on the machine, where the value is the fully qualified path to client_secrets.json.
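A quick pre-flight check can catch a missing or mistyped credentials path before the tap runs. The following is a minimal sketch, not part of tap-bigquery; the helper name `credentials_path` is hypothetical, and it only verifies that the variable is set and points at an existing file.

```python
import os


def credentials_path(environ=None):
    """Return the absolute path in GOOGLE_APPLICATION_CREDENTIALS, or raise."""
    environ = os.environ if environ is None else environ
    path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    # Resolve "~" and relative paths to a fully qualified path.
    path = os.path.abspath(os.path.expanduser(path))
    if not os.path.isfile(path):
        raise FileNotFoundError(path)
    return path


if __name__ == "__main__":
    try:
        print("Using credentials file:", credentials_path())
    except (RuntimeError, FileNotFoundError) as error:
        print("Credentials check failed:", error)
```

The check does not validate the file's contents; it only confirms that the path resolves to a file on this machine.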
It should be possible to use the OAuth flow to authenticate to GCP as well:
- tap-bigquery will attempt to open a new window or tab in your default browser. If this fails, copy the URL from the console and manually open it in your browser.
- If you are not already logged into your Google account, you will be prompted to log in.
- If you are logged into multiple Google accounts, you will be asked to select one account to use for the authorization.
- Click the Accept button to allow tap-bigquery to access your Google BigQuery table.
- You can close the tab after the signup flow is complete.
The data will be written to the table specified in your config.json.
https://github.com/anelendata/tap_bigquery
Copyright © 2019 Anelen Data