feat: add endpoint for participants with highest measurement count #164

Merged

Conversation


@PatrickNercessian PatrickNercessian commented Jun 24, 2024

@PatrickNercessian
Collaborator Author

Have not added tests yet

This would go hand-in-hand with a change in spark-evaluate to add this index:

CREATE INDEX stations_day_accepted_count ON daily_stations (day, accepted_count DESC);

What do we think about this approach? Because we're using an aggregate function, the index will not be perfectly helpful. The query will still need to run the full summation and at least partial sorting, I think.

Alternatively, we could create a new table that just updates the total sum for each station_id. However, I lean towards thinking that would be overkill, considering we already cache queries for 24h, so we are rarely running this query.
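The shape of the query under discussion can be sketched against an in-memory SQLite database (a stand-in for the real Postgres setup; the `station_id` column name and the exact query are assumptions for illustration, not the PR's actual code):

```python
import sqlite3

# In-memory stand-in for the spark-evaluate Postgres database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE daily_stations (day TEXT, station_id TEXT, accepted_count INTEGER)"
)
# The index proposed in the comment above.
conn.execute(
    "CREATE INDEX stations_day_accepted_count"
    " ON daily_stations (day, accepted_count DESC)"
)
conn.executemany(
    "INSERT INTO daily_stations VALUES (?, ?, ?)",
    [
        ("2024-07-01", "a", 10),
        ("2024-07-01", "b", 5),
        ("2024-07-02", "a", 7),
        ("2024-07-02", "b", 20),
    ],
)

# The aggregate still has to sum every matching row and sort the totals;
# the index can only narrow the scan to the requested date range.
top = conn.execute(
    """
    SELECT station_id, SUM(accepted_count) AS total
    FROM daily_stations
    WHERE day BETWEEN '2024-07-01' AND '2024-07-02'
    GROUP BY station_id
    ORDER BY total DESC
    LIMIT 10
    """
).fetchall()
print(top)  # [('b', 25), ('a', 17)]
```

This illustrates the point made above: the `(day, accepted_count DESC)` index helps the `WHERE` filter, but the `GROUP BY`/`SUM`/`ORDER BY` still have to process every row in the range.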

stats/lib/platform-stats-fetchers.js (outdated, resolved)
stats/lib/platform-routes.js (outdated, resolved)
stats/lib/platform-stats-fetchers.js (outdated, resolved)
stats/lib/platform-stats-fetchers.js (outdated, resolved)
@PatrickNercessian
Collaborator Author

Depends on filecoin-station/spark-evaluate#274

@bajtos bajtos (Member) left a comment


The new version looks pretty good.

I'd like to run the SQL query once we have a full day's worth of data, to see how expensive it is.

Preliminary plan:

  • Land the spark-evaluate PR tomorrow and start collecting the data
  • Wait until Monday
  • On Monday, test the SQL query to see how it works in practice

stats/lib/platform-stats-fetchers.js (outdated, resolved)
stats/lib/platform-stats-fetchers.js (outdated, resolved)
stats/test/platform-routes.test.js (outdated, resolved)
stats/test/platform-routes.test.js (outdated, resolved)
@PatrickNercessian PatrickNercessian marked this pull request as draft July 1, 2024 14:56
stats/bin/spark-stats.js (outdated, resolved)
@bajtos bajtos changed the title feat: add endpoint for stations with highest measurement count feat: add endpoint for participants with highest measurement count Jul 3, 2024
@bajtos bajtos (Member) left a comment


You are going in the right direction 👍🏻

stats/lib/platform-stats-fetchers.js (resolved)
Comment on lines 178 to 181:

  // We don't care about the date range for this query
  const res = await fetch(
    new URL(
      '/participants/top-measurements?from=2024-01-11&to=2024-01-11',
A Member replied:

> We don't care about the date range for this query

As I am arguing in my other comment above, we need to care about the date range.

@PatrickNercessian PatrickNercessian marked this pull request as ready for review July 3, 2024 15:20
stats/lib/platform-routes.js (outdated, resolved)
stats/lib/platform-stats-fetchers.js (resolved)
stats/lib/platform-stats-fetchers.js (resolved)
@bajtos bajtos (Member) left a comment


I realised that since we don't use the filter at all, maybe this endpoint shouldn't be implemented via respond(pgPools.evaluate, fetchParticipantsWithTopMeasurements).

Instead, we could write a custom request handler and configure a caching strategy that makes more sense in this particular case - e.g., we can set TTL until midnight UTC.

Anyhow, it's a relatively minor thing, I am fine to land this pull request as it is now.

:shipit:
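The "TTL until midnight UTC" idea could be computed roughly as follows (a sketch only; `seconds_until_midnight_utc` is a hypothetical helper, not part of spark-stats, which caches a day's results until the next day of `daily_stations` data exists):

```python
from datetime import datetime, timedelta, timezone

def seconds_until_midnight_utc(now=None):
    """Cache TTL, in seconds, expiring at the next UTC midnight."""
    now = now or datetime.now(timezone.utc)
    next_day = (now + timedelta(days=1)).date()
    midnight = datetime.combine(next_day, datetime.min.time(), tzinfo=timezone.utc)
    return int((midnight - now).total_seconds())

# At 23:00 UTC the cached entry would live for one more hour.
ttl = seconds_until_midnight_utc(datetime(2024, 7, 4, 23, 0, tzinfo=timezone.utc))
print(ttl)  # 3600
```

A custom request handler could set this value as the `max-age` of its caching headers instead of the fixed 24h TTL used by the generic `respond()` path.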

@bajtos bajtos (Member) commented Jul 4, 2024

The CI build is failing in the dry-run step because this PR does not have access to the GLIF_TOKEN secret.

Error: server response 401 Unauthorized (
  request={ },
  response={ },
  error=null,
  info={ "requestUrl": "https://api.node.glif.io/rpc/v0", "responseBody": "<html>\r\n<head><title>401 Authorization Required</title></head>\r\n<body>\r\n<center><h1>401 Authorization Required</h1></center>\r\n</body>\r\n</html>\r\n", "responseStatus": "401 Unauthorized" },
  code=SERVER_ERROR,
  version=6.13.1
)

@bajtos bajtos merged commit 0b56565 into filecoin-station:main Jul 4, 2024
9 of 10 checks passed
3 participants