Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add top earnings participant endpoint #170

Conversation

PatrickNercessian
Copy link
Collaborator

@PatrickNercessian PatrickNercessian marked this pull request as ready for review July 2, 2024 14:51
observer/test/observer.test.js Outdated Show resolved Hide resolved
Comment on lines 61 to 79
export const fetchTopEarningParticipants = async (pgPool, filter) => {
const { rows } = await pgPool.query(`
WITH latest_scheduled_rewards AS (
SELECT DISTINCT ON (participant_address) participant_address, scheduled_rewards
FROM daily_scheduled_rewards
ORDER BY participant_address, day DESC
)
SELECT
COALESCE(drt.to_address, lsr.participant_address) as participant_address,
COALESCE(SUM(drt.amount), 0) + COALESCE(lsr.scheduled_rewards, 0) as total_rewards
FROM daily_reward_transfers drt
FULL OUTER JOIN latest_scheduled_rewards lsr
ON drt.to_address = lsr.participant_address
WHERE drt.day >= $1 AND drt.day <= $2 OR drt.day IS NULL
GROUP BY COALESCE(drt.to_address, lsr.participant_address), lsr.scheduled_rewards
ORDER BY total_rewards DESC
`, [filter.from, filter.to])
return rows
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another set of eyes here would be nice for this complex query.

And also if we can run it on production data to see latency expectations.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spark_stats=# EXPLAIN WITH latest_scheduled_rewards AS (
      SELECT DISTINCT ON (participant_address) participant_address, scheduled_rewards
      FROM daily_scheduled_rewards
      ORDER BY participant_address, day DESC
    )
    SELECT
      COALESCE(drt.to_address, lsr.participant_address) as participant_address,
      COALESCE(SUM(drt.amount), 0) + COALESCE(lsr.scheduled_rewards, 0) as total_rewards
    FROM daily_reward_transfers drt
    FULL OUTER JOIN latest_scheduled_rewards lsr
      ON drt.to_address = lsr.participant_address
    WHERE drt.day >= '2024-06-26' AND drt.day <= '2024-07-03' OR drt.day IS NULL
    GROUP BY COALESCE(drt.to_address, lsr.participant_address), lsr.scheduled_rewards
    ORDER BY total_rewards DESC;
                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=14296.01..14304.06 rows=3220 width=75)
   Sort Key: ((COALESCE(sum(drt.amount), '0'::numeric) + COALESCE(daily_scheduled_rewards.scheduled_rewards, '0'::numeric))) DESC
   ->  HashAggregate  (cost=14060.10..14108.40 rows=3220 width=75)
         Group Key: COALESCE(drt.to_address, daily_scheduled_rewards.participant_address), daily_scheduled_rewards.scheduled_rewards
         ->  Hash Full Join  (cost=13345.69..14035.95 rows=3220 width=56)
               Hash Cond: (daily_scheduled_rewards.participant_address = drt.to_address)
               Filter: (((drt.day >= '2024-06-26'::date) AND (drt.day <= '2024-07-03'::date)) OR (drt.day IS NULL))
               ->  Unique  (cost=13295.05..13816.97 rows=8980 width=58)
                     ->  Sort  (cost=13295.05..13556.01 rows=104383 width=58)
                           Sort Key: daily_scheduled_rewards.participant_address, daily_scheduled_rewards.day DESC
                           ->  Seq Scan on daily_scheduled_rewards  (cost=0.00..2293.83 rows=104383 width=58)
               ->  Hash  (cost=31.95..31.95 rows=1495 width=60)
                     ->  Seq Scan on daily_reward_transfers drt  (cost=0.00..31.95 rows=1495 width=60)
(13 rows)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the last 7 days, and returning 11k rows, this query took

  Time: 309.534 ms

which is ok for me

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can shave off ~40ms from the query by adding an index ON daily_scheduled_rewards (participant_address, day), but I guess the improvement is small.

I agree that the query duration in the 300-400ms range is acceptable.

EXPLAIN ANALYZE WITH latest_scheduled_rewards AS (
      SELECT DISTINCT ON (participant_address) participant_address, scheduled_rewards
      FROM daily_scheduled_rewards
      ORDER BY participant_address, day DESC
    )
    SELECT
      COALESCE(drt.to_address, lsr.participant_address) as participant_address,
      COALESCE(SUM(drt.amount), 0) + COALESCE(lsr.scheduled_rewards, 0) as total_rewards
    FROM daily_reward_transfers drt
    FULL OUTER JOIN latest_scheduled_rewards lsr
      ON drt.to_address = lsr.participant_address
    WHERE drt.day >= '2024-06-26' AND drt.day <= '2024-07-03' OR drt.day IS NULL
    GROUP BY COALESCE(drt.to_address, lsr.participant_address), lsr.scheduled_rewards
    ORDER BY total_rewards DESC;
                                                                          QUERY PLAN                                                                          
--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=14693.79..14701.93 rows=3256 width=75) (actual time=380.119..381.786 rows=11118 loops=1)
   Sort Key: ((COALESCE(sum(drt.amount), '0'::numeric) + COALESCE(daily_scheduled_rewards.scheduled_rewards, '0'::numeric))) DESC
   Sort Method: quicksort  Memory: 1948kB
   ->  HashAggregate  (cost=14454.98..14503.82 rows=3256 width=75) (actual time=370.194..376.246 rows=11118 loops=1)
         Group Key: COALESCE(drt.to_address, daily_scheduled_rewards.participant_address), daily_scheduled_rewards.scheduled_rewards
         Batches: 5  Memory Usage: 4273kB  Disk Usage: 248kB
         ->  Hash Full Join  (cost=13724.09..14430.56 rows=3256 width=56) (actual time=309.549..364.272 rows=11118 loops=1)
               Hash Cond: (daily_scheduled_rewards.participant_address = drt.to_address)
               Filter: (((drt.day >= '2024-06-26'::date) AND (drt.day <= '2024-07-03'::date)) OR (drt.day IS NULL))
               Rows Removed by Filter: 959
               ->  Unique  (cost=13673.45..14209.91 rows=9082 width=58) (actual time=308.787..359.491 rows=11644 loops=1)
                     ->  Sort  (cost=13673.45..13941.68 rows=107292 width=58) (actual time=308.783..348.916 rows=107429 loops=1)
                           Sort Key: daily_scheduled_rewards.participant_address, daily_scheduled_rewards.day DESC
                           Sort Method: external merge  Disk: 7368kB
                           ->  Seq Scan on daily_scheduled_rewards  (cost=0.00..2344.92 rows=107292 width=58) (actual time=0.012..23.680 rows=107429 loops=1)
               ->  Hash  (cost=31.95..31.95 rows=1495 width=60) (actual time=0.747..0.749 rows=1495 loops=1)
                     Buckets: 2048  Batches: 1  Memory Usage: 151kB
                     ->  Seq Scan on daily_reward_transfers drt  (cost=0.00..31.95 rows=1495 width=60) (actual time=0.040..0.383 rows=1495 loops=1)
 Planning Time: 0.377 ms
 Execution Time: 384.109 ms
(20 rows)

@@ -32,6 +33,8 @@ export const handlePlatformRoutes = async (req, res, pgPools) => {
await respond(pgPools.evaluate, fetchDailyStationAcceptedMeasurementCount)
} else if (req.method === 'GET' && url === '/transfers/daily') {
await respond(pgPools.stats, fetchDailyRewardTransfers)
} else if (req.method === 'GET' && url === '/participants/top-earnings') {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
} else if (req.method === 'GET' && url === '/participants/top-earnings') {
} else if (req.method === 'GET' && url === '/participants/top-earning') {

This route is about the top earning participants, and not the top earnings among participants (subtle but I think important)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add this route to the README as well

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, addressed in 633f194

stats/lib/platform-stats-fetchers.js Show resolved Hide resolved
Comment on lines 61 to 79
export const fetchTopEarningParticipants = async (pgPool, filter) => {
const { rows } = await pgPool.query(`
WITH latest_scheduled_rewards AS (
SELECT DISTINCT ON (participant_address) participant_address, scheduled_rewards
FROM daily_scheduled_rewards
ORDER BY participant_address, day DESC
)
SELECT
COALESCE(drt.to_address, lsr.participant_address) as participant_address,
COALESCE(SUM(drt.amount), 0) + COALESCE(lsr.scheduled_rewards, 0) as total_rewards
FROM daily_reward_transfers drt
FULL OUTER JOIN latest_scheduled_rewards lsr
ON drt.to_address = lsr.participant_address
WHERE drt.day >= $1 AND drt.day <= $2 OR drt.day IS NULL
GROUP BY COALESCE(drt.to_address, lsr.participant_address), lsr.scheduled_rewards
ORDER BY total_rewards DESC
`, [filter.from, filter.to])
return rows
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spark_stats=# EXPLAIN WITH latest_scheduled_rewards AS (
      SELECT DISTINCT ON (participant_address) participant_address, scheduled_rewards
      FROM daily_scheduled_rewards
      ORDER BY participant_address, day DESC
    )
    SELECT
      COALESCE(drt.to_address, lsr.participant_address) as participant_address,
      COALESCE(SUM(drt.amount), 0) + COALESCE(lsr.scheduled_rewards, 0) as total_rewards
    FROM daily_reward_transfers drt
    FULL OUTER JOIN latest_scheduled_rewards lsr
      ON drt.to_address = lsr.participant_address
    WHERE drt.day >= '2024-06-26' AND drt.day <= '2024-07-03' OR drt.day IS NULL
    GROUP BY COALESCE(drt.to_address, lsr.participant_address), lsr.scheduled_rewards
    ORDER BY total_rewards DESC;
                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=14296.01..14304.06 rows=3220 width=75)
   Sort Key: ((COALESCE(sum(drt.amount), '0'::numeric) + COALESCE(daily_scheduled_rewards.scheduled_rewards, '0'::numeric))) DESC
   ->  HashAggregate  (cost=14060.10..14108.40 rows=3220 width=75)
         Group Key: COALESCE(drt.to_address, daily_scheduled_rewards.participant_address), daily_scheduled_rewards.scheduled_rewards
         ->  Hash Full Join  (cost=13345.69..14035.95 rows=3220 width=56)
               Hash Cond: (daily_scheduled_rewards.participant_address = drt.to_address)
               Filter: (((drt.day >= '2024-06-26'::date) AND (drt.day <= '2024-07-03'::date)) OR (drt.day IS NULL))
               ->  Unique  (cost=13295.05..13816.97 rows=8980 width=58)
                     ->  Sort  (cost=13295.05..13556.01 rows=104383 width=58)
                           Sort Key: daily_scheduled_rewards.participant_address, daily_scheduled_rewards.day DESC
                           ->  Seq Scan on daily_scheduled_rewards  (cost=0.00..2293.83 rows=104383 width=58)
               ->  Hash  (cost=31.95..31.95 rows=1495 width=60)
                     ->  Seq Scan on daily_reward_transfers drt  (cost=0.00..31.95 rows=1495 width=60)
(13 rows)

Comment on lines 61 to 79
export const fetchTopEarningParticipants = async (pgPool, filter) => {
const { rows } = await pgPool.query(`
WITH latest_scheduled_rewards AS (
SELECT DISTINCT ON (participant_address) participant_address, scheduled_rewards
FROM daily_scheduled_rewards
ORDER BY participant_address, day DESC
)
SELECT
COALESCE(drt.to_address, lsr.participant_address) as participant_address,
COALESCE(SUM(drt.amount), 0) + COALESCE(lsr.scheduled_rewards, 0) as total_rewards
FROM daily_reward_transfers drt
FULL OUTER JOIN latest_scheduled_rewards lsr
ON drt.to_address = lsr.participant_address
WHERE drt.day >= $1 AND drt.day <= $2 OR drt.day IS NULL
GROUP BY COALESCE(drt.to_address, lsr.participant_address), lsr.scheduled_rewards
ORDER BY total_rewards DESC
`, [filter.from, filter.to])
return rows
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the last 7 days, and returning 11k rows, this query took

  Time: 309.534 ms

which is ok for me

FROM daily_reward_transfers drt
FULL OUTER JOIN latest_scheduled_rewards lsr
ON drt.to_address = lsr.participant_address
WHERE drt.day >= $1 AND drt.day <= $2 OR drt.day IS NULL
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I never remember the precedence of AND & OR operators. Could you please add parenthesis to make it explicit?

For example:

WHERE (drt.day >= $1 AND drt.day <= $2) OR drt.day IS NULL

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure! 633f194

stats/lib/platform-stats-fetchers.js Show resolved Hide resolved
Comment on lines 61 to 79
export const fetchTopEarningParticipants = async (pgPool, filter) => {
const { rows } = await pgPool.query(`
WITH latest_scheduled_rewards AS (
SELECT DISTINCT ON (participant_address) participant_address, scheduled_rewards
FROM daily_scheduled_rewards
ORDER BY participant_address, day DESC
)
SELECT
COALESCE(drt.to_address, lsr.participant_address) as participant_address,
COALESCE(SUM(drt.amount), 0) + COALESCE(lsr.scheduled_rewards, 0) as total_rewards
FROM daily_reward_transfers drt
FULL OUTER JOIN latest_scheduled_rewards lsr
ON drt.to_address = lsr.participant_address
WHERE drt.day >= $1 AND drt.day <= $2 OR drt.day IS NULL
GROUP BY COALESCE(drt.to_address, lsr.participant_address), lsr.scheduled_rewards
ORDER BY total_rewards DESC
`, [filter.from, filter.to])
return rows
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can shave off ~40ms from the query by adding an index ON daily_scheduled_rewards (participant_address, day), but I guess the improvement is small.

I agree that the query duration in the 300-400ms range is acceptable.

EXPLAIN ANALYZE WITH latest_scheduled_rewards AS (
      SELECT DISTINCT ON (participant_address) participant_address, scheduled_rewards
      FROM daily_scheduled_rewards
      ORDER BY participant_address, day DESC
    )
    SELECT
      COALESCE(drt.to_address, lsr.participant_address) as participant_address,
      COALESCE(SUM(drt.amount), 0) + COALESCE(lsr.scheduled_rewards, 0) as total_rewards
    FROM daily_reward_transfers drt
    FULL OUTER JOIN latest_scheduled_rewards lsr
      ON drt.to_address = lsr.participant_address
    WHERE drt.day >= '2024-06-26' AND drt.day <= '2024-07-03' OR drt.day IS NULL
    GROUP BY COALESCE(drt.to_address, lsr.participant_address), lsr.scheduled_rewards
    ORDER BY total_rewards DESC;
                                                                          QUERY PLAN                                                                          
--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=14693.79..14701.93 rows=3256 width=75) (actual time=380.119..381.786 rows=11118 loops=1)
   Sort Key: ((COALESCE(sum(drt.amount), '0'::numeric) + COALESCE(daily_scheduled_rewards.scheduled_rewards, '0'::numeric))) DESC
   Sort Method: quicksort  Memory: 1948kB
   ->  HashAggregate  (cost=14454.98..14503.82 rows=3256 width=75) (actual time=370.194..376.246 rows=11118 loops=1)
         Group Key: COALESCE(drt.to_address, daily_scheduled_rewards.participant_address), daily_scheduled_rewards.scheduled_rewards
         Batches: 5  Memory Usage: 4273kB  Disk Usage: 248kB
         ->  Hash Full Join  (cost=13724.09..14430.56 rows=3256 width=56) (actual time=309.549..364.272 rows=11118 loops=1)
               Hash Cond: (daily_scheduled_rewards.participant_address = drt.to_address)
               Filter: (((drt.day >= '2024-06-26'::date) AND (drt.day <= '2024-07-03'::date)) OR (drt.day IS NULL))
               Rows Removed by Filter: 959
               ->  Unique  (cost=13673.45..14209.91 rows=9082 width=58) (actual time=308.787..359.491 rows=11644 loops=1)
                     ->  Sort  (cost=13673.45..13941.68 rows=107292 width=58) (actual time=308.783..348.916 rows=107429 loops=1)
                           Sort Key: daily_scheduled_rewards.participant_address, daily_scheduled_rewards.day DESC
                           Sort Method: external merge  Disk: 7368kB
                           ->  Seq Scan on daily_scheduled_rewards  (cost=0.00..2344.92 rows=107292 width=58) (actual time=0.012..23.680 rows=107429 loops=1)
               ->  Hash  (cost=31.95..31.95 rows=1495 width=60) (actual time=0.747..0.749 rows=1495 loops=1)
                     Buckets: 2048  Batches: 1  Memory Usage: 151kB
                     ->  Seq Scan on daily_reward_transfers drt  (cost=0.00..31.95 rows=1495 width=60) (actual time=0.040..0.383 rows=1495 loops=1)
 Planning Time: 0.377 ms
 Execution Time: 384.109 ms
(20 rows)

@bajtos bajtos enabled auto-merge (squash) July 4, 2024 07:41
@bajtos bajtos merged commit cc5336d into filecoin-station:main Jul 4, 2024
8 of 9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants