Streaming exports #1828

ballPointPenguin · 2024-10-24T04:27:13Z

I reverted this PR and re-opened it.
Somehow these changes caused the heroku deploy to be unable to connect to postgres.

And add a "streaming" API for making database queries, which streams the results from the database to Node as they are generated by Postgres. This allows Node to process the rows one by one (and garbage collect in between), which is much easier on the VM when we need to do big queries that summarize data (or just format it and incrementally spit it out an HTTP response).

This moves the handle_GET_reportExport route into its own file, which necessitated refactoring some other things (zinvite and pca) out of server.ts as well. Chipping away at the monolith. This also converts the votes.csv report to use the streaming query from Postgres, which is mostly a smoke test. It seems to work, so next I'll convert it to stream the results incrementally to the HTTP response as well.

There was actually a bug in the old SQL that aggregated votes from _all_ conversations instead of just the conversation in question, which is why it took 30 seconds to run. With that bug fixed, even the super slow "do a full subquery for each comment row" was actually quite fast. But this is way cheaper/faster.

And add a "streaming" API for making database queries, which streams the results from the database to Node as they are generated by Postgres. This allows Node to process the rows one by one (and garbage collect in between), which is much easier on the VM when we need to do big queries that summarize data (or just format it and incrementally spit it out an HTTP response).

This moves the handle_GET_reportExport route into its own file, which necessitated refactoring some other things (zinvite and pca) out of server.ts as well. Chipping away at the monolith. This also converts the votes.csv report to use the streaming query from Postgres, which is mostly a smoke test. It seems to work, so next I'll convert it to stream the results incrementally to the HTTP response as well.

There was actually a bug in the old SQL that aggregated votes from _all_ conversations instead of just the conversation in question, which is why it took 30 seconds to run. With that bug fixed, even the super slow "do a full subquery for each comment row" was actually quite fast. But this is way cheaper/faster.

In the raw votes table, -1 means agree and 1 means disagree, so we need to count things correctly. And when exporting votes in participant votes, we flip the sign so that 1 means agree and -1 means disagree.

…g-exports

ballPointPenguin · 2024-10-24T04:28:03Z

from heroku log:

2024-10-24T03:51:12.271040+00:00 app[web.1]: �[31merror�[39m: pg_connect_pool_fail no pg_hba.conf entry for host "3.80.159.156", user "[REDACTED]", database "[REDACTED]", no encryption {"code":"28000","file":"auth.c","length":172,"line":"559","name":"error","routine":"ClientAuthentication","service":"polis-api-server","severity":"FATAL","stack":"error: no pg_hba.conf entry for host "3.80.159.156", user "[REDACTED]", database "[REDACTED]", no encryption\n at Parser.parseErrorMessage (/app/node_modules/pg-protocol/src/parser.ts:369:69)\n at Parser.handlePacket (/app/node_modules/pg-protocol/src/parser.ts:188:21)\n at Parser.parse (/app/node_modules/pg-protocol/src/parser.ts:103:30)\n at Socket. (/app/node_modules/pg-protocol/src/index.ts:7:48)\n at Socket.emit (node:events:517:28)\n at Socket.emit (node:domain:489:12)\n at addChunk (node:internal/streams/readable:368:12)\n at readableAddChunk (node:internal/streams/readable:341:9)\n at Socket.Readable.push (node:internal/streams/readable:278:10)\n at TCP.onStreamRead (node:internal/stream_base_commons:190:23)\n at TCP.callbackTrampoline (node:internal/async_hooks:128:17)","timestamp":"2024-10-24 03:51:12 +00:00"}

ballPointPenguin · 2024-10-25T01:24:40Z

Closing in favor of #1829

samskivert and others added 14 commits September 19, 2024 13:27

Split each report into separate function.

cf0faa4

Add participant-votes.csv export.

9063469

Split each report into separate function.

0c81603

Add participant-votes.csv export.

9563246

Flip vote polarity.

9b68710

In the raw votes table, -1 means agree and 1 means disagree, so we need to count things correctly. And when exporting votes in participant votes, we flip the sign so that 1 means agree and -1 means disagree.

Properly escape comment text.

8e627dd

add votes matrix, show data license preprod, logging.

297471e

Merge remote-tracking branch 'origin/streaming-exports' into streamin…

30ef85e

…g-exports

ballPointPenguin closed this Oct 25, 2024

ballPointPenguin deleted the streaming-exports branch November 15, 2024 07:00

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Streaming exports #1828

Streaming exports #1828

ballPointPenguin commented Oct 24, 2024

ballPointPenguin commented Oct 24, 2024 •

edited

Loading

ballPointPenguin commented Oct 25, 2024

Streaming exports #1828

Streaming exports #1828

Conversation

ballPointPenguin commented Oct 24, 2024

ballPointPenguin commented Oct 24, 2024 • edited Loading

ballPointPenguin commented Oct 25, 2024

ballPointPenguin commented Oct 24, 2024 •

edited

Loading