I am writing a query to calculate hiscore rankings for a game. It performs a GROUP BY to aggregate hiscore statistics for each player, whose stats are stored across 23 different rows (one per "skill").
The query I came up with below works:
SELECT * FROM (
    SELECT
        username,
        SUM(level) AS total_level,
        SUM(experience) AS total_experience,
        (SELECT rank FROM overall_hiscore_rankings WHERE username = skill_hiscore.username) AS rank
    FROM skill_hiscore
    GROUP BY username
    ORDER BY rank
) a
WHERE rank >= 1 AND rank <= 25;
Note: overall_hiscore_rankings is a materialized view and the filter at the end is my implementation of the seek method for pagination to prevent expensive LIMIT OFFSET pagination.
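One alternative I have considered is replacing the correlated subquery with a join against the materialized view — a sketch, assuming overall_hiscore_rankings exposes exactly the (username, rank) pair used above:

```sql
SELECT
    s.username,
    SUM(s.level)      AS total_level,
    SUM(s.experience) AS total_experience,
    r.rank
FROM skill_hiscore s
JOIN overall_hiscore_rankings r ON r.username = s.username
WHERE r.rank BETWEEN 1 AND 25   -- seek-method page filter applied before aggregation
GROUP BY s.username, r.rank
ORDER BY r.rank;
```

I am not certain this is equivalent in all edge cases (e.g. players missing from the materialized view would be dropped by the inner join), so I have kept the original form above.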
I did not notice any problems with this until I seeded roughly 30,000 players' worth of data into my hiscores schema. Because each player has 23 rows as mentioned above, the table now contains 690,000 rows.
Performance is now atrocious — the query literally takes my database almost 100 seconds to execute:
Sort  (cost=35926899.17..35926907.50 rows=3330 width=35) (actual time=98599.592..98599.595 rows=25 loops=1)
  Sort Key: ((SubPlan 1))
  Sort Method: quicksort  Memory: 26kB
  ->  GroupAggregate  (cost=0.42..35926704.35 rows=3330 width=35) (actual time=149.023..98599.514 rows=25 loops=1)
        Group Key: skill_hiscore.username
        Filter: (((SubPlan 2) >= 1) AND ((SubPlan 3) <= 25))
        Rows Removed by Filter: 29978
        ->  Index Scan using skill_hiscore_username on skill_hiscore  (cost=0.42..46884.88 rows=690063 width=19) (actual time=144.644..293.859 rows=690063 loops=1)
        SubPlan 1
          ->  Seq Scan on overall_hiscore_rankings  (cost=0.00..567.04 rows=1 width=8) (actual time=0.002..1.592 rows=1 loops=25)
                Filter: ((username)::text = (skill_hiscore.username)::text)
                Rows Removed by Filter: 30002
        SubPlan 2
          ->  Seq Scan on overall_hiscore_rankings overall_hiscore_rankings_1  (cost=0.00..567.04 rows=1 width=8) (actual time=0.817..1.633 rows=1 loops=30003)
                Filter: ((username)::text = (skill_hiscore.username)::text)
                Rows Removed by Filter: 30002
        SubPlan 3
          ->  Seq Scan on overall_hiscore_rankings overall_hiscore_rankings_2  (cost=0.00..567.04 rows=1 width=8) (actual time=0.816..1.633 rows=1 loops=30003)
                Filter: ((username)::text = (skill_hiscore.username)::text)
                Rows Removed by Filter: 30002
Planning Time: 2.820 ms
JIT:
  Functions: 21
  Options: Inlining true, Optimization true, Expressions true, Deforming true
  Timing: Generation 1.248 ms, Inlining 44.936 ms, Optimization 60.995 ms, Emission 38.520 ms, Total 145.700 ms
Execution Time: 98623.550 ms
Clearly, the ORDER BY and GROUP BY are causing the largest performance drain, but I am not sure how to rewrite this.
Nothing pops out at me that indicates this query should take 100 seconds — I have never seen a query perform so poorly, and frankly optimizing this is beyond my knowledge.
This query is not backed by any indexes.
My main question is this: Is the lack of an index on this query for the GROUP BY and ORDER BY data enough to make the runtime of this query 100 seconds?
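If an index is the answer, I assume it would be on the username column of the materialized view, since that is what the subquery filters on. Something like this sketch (index names are just placeholders I made up):

```sql
-- Index the lookup column the correlated subquery filters on.
-- Indexes can be created on materialized views in PostgreSQL.
CREATE INDEX overall_hiscore_rankings_username_idx
    ON overall_hiscore_rankings (username);

-- Or, covering the rank column too so the lookup can be
-- satisfied from the index alone (PostgreSQL 11+):
CREATE INDEX overall_hiscore_rankings_username_rank_idx
    ON overall_hiscore_rankings (username) INCLUDE (rank);
```

I have not yet tried this, so I don't know whether it would fix the runtime or whether the query itself also needs restructuring.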