feat: index daily_stations by day #277

Merged on Jul 3, 2024 (2 commits)

@bajtos (Member) commented Jul 1, 2024

Speed up spark-stats queries, which always filter by a date range.

Before - 7.2 seconds:

 ->  Parallel Seq Scan on daily_stations  (cost=0.00..899791.13 rows=2746355 width=144) (actual time=273.312..7516.967 rows=2091389 loops=3)
       Filter: (day = date((now() - '1 day'::interval)))
       Rows Removed by Filter: 8091296

After - 0.8 seconds:

 ->  Parallel Index Scan using daily_stations_day on daily_stations  (cost=0.45..283087.78 rows=2746355 width=144) (actual time=143.526..988.537 rows=2091389 loops=3)
       Index Cond: (day = date((now() - '1 day'::interval)))
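Below is a minimal sketch for reproducing a before/after comparison like the one above. It assumes a simplified query that selects only yesterday's rows, using the same filter expression that appears in the plans. The real spark-stats queries do more work. `EXPLAIN (ANALYZE, BUFFERS)` actually executes the query and reports real timings, so run it once before and once after creating the index.

```sql
-- Hypothetical, simplified reproduction query (not taken from spark-stats):
-- it uses the same filter as the plans above, i.e. rows for yesterday.
-- EXPLAIN ANALYZE executes the query; BUFFERS adds shared-buffer hit/read counts.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM daily_stations
WHERE day = date(now() - interval '1 day');
```

If the index is used, the plan should show an index scan on `daily_stations_day` with an `Index Cond` in place of the `Filter` line.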

I tested various combinations of indexed and included columns. The simplest option, an index on the `day` column only, performs best.
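The PR does not list which column combinations were tried. The sketch below shows the general shapes such a comparison can take. `station_id` is an assumed column name used only for illustration, and the extra index names are hypothetical. `INCLUDE` requires PostgreSQL 11 or later.

```sql
-- Option A (the one merged in this PR): index on the day column only.
CREATE INDEX daily_stations_day ON daily_stations (day);

-- Option B (hypothetical): covering index; station_id is stored in the index
-- leaf pages as a non-key payload, which may allow index-only scans.
CREATE INDEX daily_stations_day_incl ON daily_stations (day) INCLUDE (station_id);

-- Option C (hypothetical): composite index with a second key column.
CREATE INDEX daily_stations_day_station ON daily_stations (day, station_id);

-- Drop the variants you are not keeping before comparing plans again.
DROP INDEX daily_stations_day_incl;
DROP INDEX daily_stations_day_station;
```

A narrower index is smaller and cheaper to maintain on every insert. That is one plausible reason the single-column variant can win when the query needs most of the row anyway.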

Signed-off-by: Miroslav Bajtoš

@juliangruber (Member) left a comment

👏 massive speed up!

@bajtos enabled auto-merge (squash) July 3, 2024 12:18
@bajtos merged commit 976de6a into main Jul 3, 2024
6 checks passed
@bajtos deleted the index-daily-stations branch July 3, 2024 12:19

@bajtos (Member, Author) commented Jul 3, 2024

> 👏 massive speed up!

Thank you 😊

To put this into context: the entire query takes more than 30 seconds to complete, and this index reduces the time from 36s to 30s.

@@ -0,0 +1 @@
CREATE INDEX daily_stations_day ON daily_stations (day);

@bajtos (Member, Author) commented:

Oops, I should have added CONCURRENTLY to let Postgres build the index in the background without blocking the startup of the spark-evaluate service 🙈
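Here is a minimal sketch of the same migration using `CONCURRENTLY`. A plain `CREATE INDEX` holds a lock that blocks writes (but not reads) on the table for the whole build. `CREATE INDEX CONCURRENTLY` avoids blocking writes, but the statement itself still runs until the build finishes. It also cannot run inside a transaction block. The PR does not say whether the project's migration runner wraps each migration in a transaction; if it does, this statement would fail there.

```sql
-- Build the index without blocking INSERT/UPDATE/DELETE on daily_stations.
-- Caveats (PostgreSQL):
--   * must not run inside BEGIN ... COMMIT;
--   * a failed concurrent build leaves an INVALID index behind, which must be
--     dropped (e.g. DROP INDEX CONCURRENTLY daily_stations_day) before retrying.
CREATE INDEX CONCURRENTLY daily_stations_day
  ON daily_stations (day);
```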

@bajtos (Member, Author) commented:

The index is live; the migration finished in ~29 seconds 😅

2024-07-03T12:22:23Z app[e2867541be3e68] cdg [info]Migrating DB schema from version 10 to version 11
2024-07-03T12:22:52Z app[e2867541be3e68] cdg [info]Migrated DB schema to version 11
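The following is a minimal sketch for confirming from the database side that an index like `daily_stations_day` exists and is usable. It queries the standard `pg_index` and `pg_class` catalogs. `indisvalid = false` would indicate a broken (e.g. failed concurrent) build.

```sql
-- List all indexes on daily_stations with their validity flags and on-disk size.
SELECT c.relname                                     AS index_name,
       i.indisvalid                                  AS is_valid,  -- usable by queries
       i.indisready                                  AS is_ready,  -- maintained on writes
       pg_size_pretty(pg_relation_size(i.indexrelid)) AS index_size
FROM pg_index AS i
JOIN pg_class AS c ON c.oid = i.indexrelid
WHERE i.indrelid = 'daily_stations'::regclass;
```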
