Streaming exports (compdemocracy#1826)
* Switch to non-native Postgres client.

And add a "streaming" API for making database queries, which streams the
results from the database to Node as they are generated by Postgres.

This allows Node to process the rows one by one (and garbage collect in
between), which is much easier on the VM when we need to do big queries that
summarize data (or just format it and incrementally spit it out to an HTTP
response).
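
A minimal sketch of the row-at-a-time pattern described above, not the code from this commit. It assumes the query result is exposed as an `AsyncIterable` of rows (the object-mode stream that `pg-query-stream` produces is a Node `Readable`, which is async-iterable). `ExportRow`, `fakeQuery`, and `writeVotesCsv` are hypothetical names, and the async generator only stands in for a real database cursor.

```typescript
// Hypothetical row shape; the real votes table has more columns.
interface ExportRow {
  pid: number; // participant id (assumed column name)
  tid: number; // comment id (assumed column name)
  vote: number;
}

// Stand-in for a streaming query: yields rows one at a time instead of
// materializing the whole result set in memory.
async function* fakeQuery(count: number): AsyncGenerator<ExportRow> {
  for (let i = 0; i < count; i++) {
    yield { pid: i % 3, tid: i, vote: (i % 3) - 1 };
  }
}

// Formats each row as it arrives and hands it to `write` (for example the
// `write` method of an HTTP response). Only one row is held at a time, so
// earlier rows can be garbage collected while later ones are still arriving.
async function writeVotesCsv(
  rows: AsyncIterable<ExportRow>,
  write: (chunk: string) => void
): Promise<number> {
  write("participant,comment,vote\n");
  let rowCount = 0;
  for await (const row of rows) {
    write(`${row.pid},${row.tid},${row.vote}\n`);
    rowCount++;
  }
  return rowCount;
}
```

Because the consumer only depends on `AsyncIterable`, the same function could be fed by a real query stream or by a test fixture like `fakeQuery`.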

* Mostly refactoring.

This moves the handle_GET_reportExport route into its own file, which
necessitated refactoring some other things (zinvite and pca) out of server.ts
as well. Chipping away at the monolith.

This also converts the votes.csv report to use the streaming query from
Postgres, which is mostly a smoke test. It seems to work, so next I'll convert
it to stream the results incrementally to the HTTP response as well.

* Split each report into separate function.

* Count up comment votes in single pass over votes table.

There was actually a bug in the old SQL that aggregated votes from _all_
conversations instead of just the conversation in question, which is why it
took 30 seconds to run. With that bug fixed, even the super slow "do a full
subquery for each comment row" approach was actually quite fast. But a single
pass over the votes table is way cheaper/faster.
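
To illustrate the single-pass idea, here is a sketch in TypeScript rather than the SQL the commit actually changed. The names `RawVote`, `CommentTally`, and `tallyVotes` are hypothetical, the `zid`/`tid` column names are assumptions, and treating any value other than -1 or 1 as a pass is also an assumption. The raw polarity (-1 agree, 1 disagree) follows the commit message.

```typescript
// Hypothetical shape of a row in the raw votes table.
interface RawVote {
  zid: number;  // conversation id (assumed column name)
  tid: number;  // comment id (assumed column name)
  vote: number; // raw value: -1 = agree, 1 = disagree
}

interface CommentTally {
  agrees: number;
  disagrees: number;
  passes: number;
}

// One pass over the votes: keep only the conversation in question and bump
// a per-comment counter, instead of re-scanning the votes for every comment.
function tallyVotes(
  votes: readonly RawVote[],
  zid: number
): Map<number, CommentTally> {
  const tallies = new Map<number, CommentTally>();
  for (const v of votes) {
    // Restrict to one conversation; aggregating across all conversations
    // is the kind of bug the commit message describes.
    if (v.zid !== zid) continue;
    let tally = tallies.get(v.tid);
    if (!tally) {
      tally = { agrees: 0, disagrees: 0, passes: 0 };
      tallies.set(v.tid, tally);
    }
    if (v.vote === -1) tally.agrees++;
    else if (v.vote === 1) tally.disagrees++;
    else tally.passes++; // assumption: anything else counts as a pass
  }
  return tallies;
}
```

The cost is proportional to the number of votes, whereas a subquery per comment row scales with comments times votes unless an index helps.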

* Add participant-votes.csv export.

* Flip vote polarity.

In the raw votes table, -1 means agree and 1 means disagree, so we need to
count agrees and disagrees accordingly. And when exporting votes in
participant-votes.csv, we flip the sign so that 1 means agree and -1 means
disagree.
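
The sign flip can be sketched as a one-line helper. `exportVote` is a hypothetical name, and leaving a zero vote unchanged is an assumption.

```typescript
// Converts a raw vote (-1 = agree, 1 = disagree) to the exported
// convention (1 = agree, -1 = disagree). Zero is returned as-is so the
// result is never negative zero.
function exportVote(rawVote: number): number {
  return rawVote === 0 ? 0 : -rawVote;
}
```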

* Properly escape comment text.
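
The escaping code itself is not shown in this view. One common approach, sketched here as an assumption rather than as what the commit does, is RFC 4180 style quoting: wrap a field in double quotes when it contains a comma, quote, or line break, and double any embedded quotes. `escapeCsvField` is a hypothetical name.

```typescript
// Quotes a CSV field only when needed, doubling embedded double quotes.
function escapeCsvField(text: string): string {
  if (/[",\r\n]/.test(text)) {
    return '"' + text.replace(/"/g, '""') + '"';
  }
  return text;
}
```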

* Add votes matrix, show data license on preprod, add logging.

---------

Co-authored-by: Michael Bayne <[email protected]>
colinmegill and samskivert authored Oct 22, 2024
1 parent c60752b commit 61d2940
Showing 9 changed files with 976 additions and 647 deletions.
67 changes: 40 additions & 27 deletions client-report/src/components/overview.js
@@ -27,6 +27,9 @@ const Number = ({ number, label }) => (

 const pathname = window.location.pathname; // "/report/2arcefpshi"
 const report_id = pathname.split("/")[2];
+const doShowDataLicenseTerms = ["pol.is", "preprod.pol.is", "localhost"].includes(
+  window.location.hostname
+);

 const getCurrentTimestamp = () => {
   const now = new Date();
@@ -147,6 +150,16 @@ const Overview = ({
           </a>
           {` (as event log)`}
         </p>
+        <p style={{ fontFamily: "monospace" }}>
+          {`---Votes matrix: `}
+          <a
+            download={getDownloadFilename("participant-votes", conversation)}
+            href={`http://${window.location.hostname}/api/v3/reportExport/${report_id}/participant-votes.csv`}
+          >
+            {getDownloadFilename("participant-votes", conversation)}
+          </a>
+          {` (as comments x participants matrix)`}
+        </p>
         <div style={{ marginTop: "3em" }}>
           <p style={{ fontFamily: "monospace" }}>
             <strong>Public API endpoints (read only, Jupyter notebook friendly)</strong>
@@ -160,36 +173,36 @@ const Overview = ({
           <p style={{ fontFamily: "monospace" }}>
             {`$ curl http://${window.location.hostname}/api/v3/reportExport/${report_id}/votes.csv`}
           </p>
+          <p style={{ fontFamily: "monospace" }}>
+            {`$ curl http://${window.location.hostname}/api/v3/reportExport/${report_id}/participant-votes.csv`}
+          </p>
         </div>
-        {window.location.hostname === "pol.is" ||
-          (window.location.hostname === "localhost" && (
-            <div style={{ marginTop: "3em" }}>
-              <p style={{ fontFamily: "monospace" }}>
-                <strong>Attribution of Polis Data</strong>
-              </p>
-
-              <p style={{ fontFamily: "monospace" }}>
-                All Polis data is licensed under a Creative Commons Attribution 4.0 International
-                license: https://creativecommons.org/licenses/by/4.0/
-              </p>
-              <p style={{ fontFamily: "monospace" }}>
-                --------------- BEGIN STATEMENT ---------------
-              </p>
-              <p
-                style={{ fontFamily: "monospace" }}
-              >{`Data was gathered using the Polis software (see: compdemocracy.org/polis and github.com/compdemocracy/polis) and is sub-licensed
+        {doShowDataLicenseTerms && (
+          <div style={{ marginTop: "3em" }}>
+            <p style={{ fontFamily: "monospace" }}>
+              <strong>Attribution of Polis Data</strong>
+            </p>
+
+            <p style={{ fontFamily: "monospace" }}>
+              All Polis data is licensed under a Creative Commons Attribution 4.0 International
+              license: https://creativecommons.org/licenses/by/4.0/
+            </p>
+            <p style={{ fontFamily: "monospace" }}>
+              --------------- BEGIN STATEMENT ---------------
+            </p>
+            <p
+              style={{ fontFamily: "monospace" }}
+            >{`Data was gathered using the Polis software (see: compdemocracy.org/polis and github.com/compdemocracy/polis) and is sub-licensed
 under CC BY 4.0 with Attribution to The Computational Democracy Project. The data and more
 information about how the data was collected can be found at the following link: ${window.location.href}`}</p>
-              <p style={{ fontFamily: "monospace" }}>
-                --------------- END STATEMENT---------------
-              </p>
-              <p style={{ fontFamily: "monospace" }}>
-                For further information on best practices for Attribution of CC 4.0 licensed content
-                Please see:
-                https://wiki.creativecommons.org/wiki/Best_practices_for_attribution#Title.2C_Author.2C_Source.2C_License
-              </p>
-            </div>
-          ))}
+            <p style={{ fontFamily: "monospace" }}>--------------- END STATEMENT---------------</p>
+            <p style={{ fontFamily: "monospace" }}>
+              For further information on best practices for Attribution of CC 4.0 licensed content
+              Please see:
+              https://wiki.creativecommons.org/wiki/Best_practices_for_attribution#Title.2C_Author.2C_Source.2C_License
+            </p>
+          </div>
+        )}
       </div>
     </div>
   );
113 changes: 99 additions & 14 deletions server/package-lock.json

2 changes: 1 addition & 1 deletion server/package.json
@@ -51,7 +51,7 @@
     "p3p": "~0.0.2",
     "pg": "~8.8.0",
     "pg-connection-string": "~2.5.0",
-    "pg-native": "~3.0.1",
+    "pg-query-stream": "^4.6.0",
     "replacestream": "~4.0.0",
     "request": "~2.88.2",
     "request-promise": "~4.2.6",