feat: add endpoint for participants with highest measurement count #164
Conversation
Have not added tests yet. This would go hand-in-hand with a change in spark-evaluate to add this index:
What do we think about this approach? Because we're using an aggregate function, the index will not be perfectly helpful: the query will still need to run the full summation and at least a partial sort, I think. Alternatively, we could create a new table that just keeps a running total per station_id. However, I lean towards thinking that would be overkill, considering we already cache queries for 24h, so we rarely run this query.
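For illustration, the kind of query under discussion looks roughly like the sketch below. This is a minimal sketch only; the table and column names (`daily_stations`, `accepted_measurement_count`) are assumptions, not the actual spark-evaluate schema. It shows why an index can only partially help: the `SUM` over all rows per `station_id` and the `ORDER BY` on the aggregate still have to run regardless.

```javascript
// Hypothetical query builder for the top-measurements endpoint.
// Table and column names are placeholders, not the real schema.
const buildTopMeasurementsQuery = (limit = 10) => `
  SELECT station_id, SUM(accepted_measurement_count) AS total
  FROM daily_stations
  GROUP BY station_id
  ORDER BY total DESC
  LIMIT ${Number(limit)}
`
```

An index on `station_id` lets Postgres group without a full sort of the raw rows, but the summation and the final ordering by the aggregate still touch every group.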
Depends on filecoin-station/spark-evaluate#274
The new version looks pretty good.
I'd like to run the SQL query once we have a full day's worth of data to see how expensive it is.
Preliminary plan:
- Land the spark-evaluate PR tomorrow and start collecting the data
- Wait until Monday
- On Monday, test the SQL query to see how it works in practice
You are going in the right direction 👍🏻
stats/test/platform-routes.test.js
Outdated
// We don't care about the date range for this query
const res = await fetch(
  new URL(
    '/participants/top-measurements?from=2024-01-11&to=2024-01-11',
We don't care about the date range for this query
As I am arguing in my other comment above, we need to care about the date range.
I realised that since we don't use the filter at all, maybe this endpoint shouldn't be implemented via respond(pgPools.evaluate, fetchParticipantsWithTopMeasurements).
Instead, we could write a custom request handler and configure a caching strategy that makes more sense in this particular case - e.g., we can set TTL until midnight UTC.
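The TTL-until-midnight-UTC idea above can be sketched as follows. This is a minimal sketch, not the actual spark-stats caching code; the helper name is hypothetical. The returned number of seconds could feed a `Cache-Control: max-age=...` header so every cached copy expires exactly when a new day's data becomes available.

```javascript
// Hypothetical helper: seconds remaining until the next midnight UTC.
// Date.UTC normalizes day overflow, so getUTCDate() + 1 is safe at month end.
const secondsUntilMidnightUtc = (now = new Date()) => {
  const nextMidnight = new Date(Date.UTC(
    now.getUTCFullYear(),
    now.getUTCMonth(),
    now.getUTCDate() + 1
  ))
  return Math.round((nextMidnight.getTime() - now.getTime()) / 1000)
}

// Example: set the cache lifetime in a response handler (sketch).
// res.setHeader('cache-control', `public, max-age=${secondsUntilMidnightUtc()}`)
```

Compared to a fixed 24h TTL, this keeps all clients aligned on the daily data boundary instead of each cache entry expiring at a different wall-clock time.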
Anyhow, it's a relatively minor thing; I'm fine with landing this pull request as it is now.
The CI build is failing in the dry-run step because this PR does not have access to the GLIF_TOKEN secret.
Link: space-meridian/roadmap#118