Design Documentation
Gabe Sawhney edited this page Apr 8, 2023
STATUS: FOR DISCUSSION
When the scraper has finished running:
- For any new Ariba record, or any record whose ZIP file size differs from the size recorded the last time the script ran:
  - Ariba files (unzipped) have been downloaded and saved to Google Drive in a folder following the format specified below
  - The Ariba page is saved as an HTML file to the same folder
  - The XML fragment from the OCDS data is saved to the same folder
  - Data is sent to the DB API (INSERT OR UPDATE IF EXISTS): OCDS data, list of public URLs for file attachments, and parsed text from each file
- For any new OCDS records which haven't already been sent to the DB API: send the data to the DB API
- For any OCDS records which don't match the data we have in our DB, send the data to the DB API
- A single CSV(?) file containing the ZIP file size for each Ariba record is updated on Google Drive
- Output a log to a new Slack channel, including:
  - Date/time, "ID" of host running the scraper
  - Any errors encountered
    - Including records found in Ariba and not OCDS, or vice versa
  - New records added (UID)
  - Records/files updated (UID and old + new ZIP file sizes)
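
The change detection and log summary described above could be sketched roughly as follows. This is only an illustration for discussion, not the scraper's actual code: the CSV column names (`uid`, `zip_size`), the function names, and the log layout are all assumptions, and reading or writing the file on Google Drive and posting to Slack are left out.

```python
import csv
import io
from datetime import datetime, timezone


def load_sizes(csv_text):
    """Parse the size file (assumed columns: uid, zip_size) into {uid: size}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["uid"]: int(row["zip_size"]) for row in reader}


def dump_sizes(sizes):
    """Serialize {uid: size} back to CSV text, sorted by UID."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["uid", "zip_size"])
    for uid in sorted(sizes):
        writer.writerow([uid, sizes[uid]])
    return buf.getvalue()


def diff_sizes(previous, current):
    """Return (new UIDs, [(uid, old_size, new_size)] for changed ZIP sizes)."""
    new = sorted(uid for uid in current if uid not in previous)
    updated = sorted(
        (uid, previous[uid], current[uid])
        for uid in current
        if uid in previous and previous[uid] != current[uid]
    )
    return new, updated


def find_mismatches(ariba_uids, ocds_uids):
    """Return error strings for records present in only one of the two sources."""
    errors = [f"{uid}: in Ariba but not OCDS" for uid in sorted(ariba_uids - ocds_uids)]
    errors += [f"{uid}: in OCDS but not Ariba" for uid in sorted(ocds_uids - ariba_uids)]
    return errors


def format_log(host_id, new, updated, errors, now=None):
    """Build the plain-text summary that would be posted to the Slack channel."""
    now = now or datetime.now(timezone.utc)
    lines = [f"{now.isoformat()} host={host_id}", f"Errors: {len(errors)}"]
    lines += [f"  {err}" for err in errors]
    lines.append("New records: " + (", ".join(new) or "none"))
    lines.append(f"Updated records: {len(updated)}")
    lines += [f"  {uid}: {old} -> {size} bytes" for uid, old, size in updated]
    return "\n".join(lines)
```

A run would load the previous sizes, compare them with the sizes seen in this run, process only the UIDs returned by `diff_sizes`, then write `dump_sizes(current)` back and post the `format_log` output. Comparing sizes alone will miss a ZIP whose contents changed without its size changing, which is a limitation of the approach as described.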