Streaming exports #1826

Merged
merged 14 commits into edge from streaming-exports on Oct 22, 2024
Conversation

colinmegill
Member

Add streaming exports for direct download of the participant votes matrix.

Add logging to the callback handler to investigate the missing `rid` error seen on older reports in preproduction.

samskivert and others added 14 commits September 19, 2024 13:27
And add a "streaming" API for making database queries, which streams the
results from the database to Node as they are generated by Postgres.

This allows Node to process the rows one by one (and garbage collect in
between), which is much easier on the VM when we need to do big queries that
summarize data (or just format it and incrementally spit it out over the HTTP
response).
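
A minimal sketch of what such a streaming query helper might look like, assuming the `pg` and `pg-query-stream` packages; the helper name and exact signature here are illustrative, not necessarily what this PR adds:

```ts
import { Pool } from "pg";
import QueryStream from "pg-query-stream";

// Shared connection pool; connection settings come from the usual PG* env vars.
const pool = new Pool();

// Hypothetical helper: run a query and hand each row to a callback as soon as
// Postgres produces it, rather than buffering the whole result set in memory.
export async function streamQuery(
  sql: string,
  params: unknown[],
  onRow: (row: Record<string, unknown>) => void
): Promise<void> {
  const client = await pool.connect();
  try {
    const stream = client.query(new QueryStream(sql, params));
    for await (const row of stream) {
      onRow(row); // each row can be processed (and garbage collected) independently
    }
  } finally {
    client.release();
  }
}
```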
This moves the handle_GET_reportExport route into its own file, which
necessitated refactoring some other things (zinvite and pca) out of server.ts
as well. Chipping away at the monolith.
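
As a rough sketch of the module shape described above (file name and wiring are assumptions, not this PR's actual layout):

```ts
// routes/reportExport.ts (hypothetical file name)
import type { Request, Response } from "express";

// The handler lives in its own module, so server.ts only imports and registers
// it on the report-export path instead of carrying the implementation inline.
export async function handle_GET_reportExport(
  req: Request,
  res: Response
): Promise<void> {
  // ... look up the report, run the streaming query, write out the CSV ...
  res.end();
}
```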

This also converts the votes.csv report to use the streaming query from
Postgres, which is mostly a smoke test. It seems to work, so next I'll convert
it to stream the results incrementally to the HTTP response as well.
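
A minimal sketch of that incremental-streaming step, reusing the hypothetical `streamQuery` helper from the earlier sketch; the column names (`zid`, `pid`, `tid`, `vote`, `created`) are assumptions about the votes table:

```ts
import type { Response } from "express";

// Hypothetical: write votes.csv row by row instead of assembling it in memory.
export async function streamVotesCsv(res: Response, zid: number): Promise<void> {
  res.setHeader("Content-Type", "text/csv");
  res.write("timestamp,voter-id,comment-id,vote\n");
  await streamQuery(
    "SELECT created, pid, tid, vote FROM votes WHERE zid = $1 ORDER BY created",
    [zid],
    (row) => {
      // Each row is formatted and flushed as soon as Postgres delivers it.
      res.write(`${row.created},${row.pid},${row.tid},${row.vote}\n`);
    }
  );
  res.end();
}
```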
There was actually a bug in the old SQL that aggregated votes from _all_
conversations instead of just the conversation in question, which is why it
took 30 seconds to run. With that bug fixed, even the super slow "do a full
subquery for each comment row" approach was actually quite fast. But this is way
cheaper/faster.
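
Sketching the shape of that fix (table and column names are assumptions; the real query in this PR will differ), the key change is restricting the aggregation to one conversation before grouping, rather than grouping every conversation's votes:

```ts
// Hypothetical fixed aggregation: a single pass over one conversation's votes,
// grouped per comment, instead of a per-comment subquery or a GROUP BY that
// accidentally spans every conversation.
const commentVoteCountsSql = `
  SELECT tid,
         COUNT(*) FILTER (WHERE vote < 0) AS agrees,     -- raw -1 = agree
         COUNT(*) FILTER (WHERE vote > 0) AS disagrees,  -- raw  1 = disagree
         COUNT(*) FILTER (WHERE vote = 0) AS passes
    FROM votes
   WHERE zid = $1  -- zid = conversation id (assumed column name)
   GROUP BY tid`;
```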
In the raw votes table, -1 means agree and 1 means disagree, so the counts need to
respect that convention. And when exporting votes in the participant votes matrix,
we flip the sign so that 1 means agree and -1 means disagree.
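
A small sketch of the two conventions side by side (type and function names are illustrative, not this PR's actual code):

```ts
// Raw votes table convention: -1 = agree, 1 = disagree, 0 = pass.
type RawVote = -1 | 0 | 1;

// Counting from raw rows: negative values are agrees, positive are disagrees.
export function tallyRawVotes(votes: RawVote[]) {
  return {
    agrees: votes.filter((v) => v < 0).length,
    disagrees: votes.filter((v) => v > 0).length,
    passes: votes.filter((v) => v === 0).length,
  };
}

// Participant votes matrix export convention: flip the sign so that the
// exported CSV uses 1 = agree and -1 = disagree.
export function toExportedVote(raw: RawVote): number {
  return -raw;
}
```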
@colinmegill colinmegill added the ⚒️ infrastructure and 🔩 p:client-report labels Oct 22, 2024
@colinmegill colinmegill merged commit 61d2940 into edge Oct 22, 2024
4 checks passed
@colinmegill colinmegill deleted the streaming-exports branch October 22, 2024 20:32
ballPointPenguin added a commit that referenced this pull request Oct 24, 2024
@ballPointPenguin ballPointPenguin restored the streaming-exports branch October 24, 2024 04:25
@ballPointPenguin ballPointPenguin deleted the streaming-exports branch November 15, 2024 07:00