I have a large table of locations and would like to paginate through it efficiently. I had previously been using an OFFSET approach, but the size of the table made that unusable, so I am now trying a cursor approach using the location id.
To ensure consistent ordering in cases where a user has 2 rows with an identical timestamp, I am also ordering by id.
SELECT *
FROM locations
WHERE
user_id = 1
ORDER BY timestamp desc, id
LIMIT 100;
However, after adding id to the ORDER BY, the query has become slow. It now does a Seq Scan, which takes ~20 seconds.
QUERY PLAN
Limit (cost=502534.86..502535.11 rows=100 width=152) (actual time=22822.113..22822.142 rows=100 loops=1)
  -> Sort (cost=502534.86..515512.80 rows=5191175 width=152) (actual time=22822.110..22822.133 rows=100 loops=1)
        Sort Key: "timestamp" DESC, id
        Sort Method: top-N heapsort Memory: 51kB
        -> Seq Scan on locations (cost=0.00..304131.89 rows=5191175 width=152) (actual time=1.603..21284.908 rows=5169237 loops=1)
              Filter: (user_id = 1)
              Rows Removed by Filter: 3048468
Planning time: 0.204 ms
Execution time: 22822.194 ms
Timestamp collisions are edge cases and id is a primary key. So why does the execution plan require a Seq Scan?
For context, running
SELECT indexdef
FROM pg_indexes
WHERE tablename = 'locations';
returns:
CREATE UNIQUE INDEX locations_pkey ON locations USING btree (id)
CREATE INDEX index_locations_on_user_id_and_timestamp ON locations USING btree (user_id, "timestamp")
CREATE INDEX index_locations_on_user_id_and_point ON locations USING gist (user_id, point)
CREATE INDEX index_locations_on_user_id ON locations USING btree (user_id)
CREATE INDEX index_locations_on_user_id_and_timestamp_and_id ON locations USING btree (user_id, "timestamp", id)
[...] SELECT *? – mustaccio Nov 17 '18 at 15:33
[...] ORDER BY timestamp desc but super slow with ORDER BY timestamp desc, id as it requires a Seq Scan. I'm not sure why it requires a Seq Scan when id is a primary key. I even tried adding an extra index which includes (user_id, timestamp, id). Any ideas would be appreciated – Gregology Nov 17 '18 at 22:54
[...] EXPLAIN of the fast version of the query, with only the ORDER BY timestamp desc? I assume that one is using one of your indexes which the slow query is not, not sure exactly why yet. – AdamKG Nov 17 '18 at 23:03
[...] (user_id asc, timestamp asc, id asc) (all are asc implicitly if not specified) can't handle a WHERE user_id=1 and then an ORDER BY timestamp desc, id; for that you need (user_id asc, timestamp desc, id asc). I'm doing some checking to confirm, but try creating the index with desc on the timestamp column. – AdamKG Nov 17 '18 at 23:07
[...] EXPLAIN, which is changing the query plan relative to what I'd normally expect. – AdamKG Nov 17 '18 at 23:17
[...] DESC to the id order and it finished in 43ms :D Thank you! – Gregology Nov 18 '18 at 03:22
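
For later readers, here is a rough sketch of the two fixes the comments above point toward. It is not a tested solution for this exact table. The new index name and the cursor values in the last query are made up for illustration, and the cursor literal assumes "timestamp" is a timestamp-typed column, which the question does not state. The idea is that a btree index on (user_id, "timestamp", id) stores rows ascending on every column, so scanning it backward yields "timestamp" DESC, id DESC, not the mixed "timestamp" DESC, id ASC order the query asks for.

```sql
-- Option 1: keep ORDER BY "timestamp" DESC, id and build an index whose
-- column directions match that ordering (index name is hypothetical).
CREATE INDEX index_locations_on_user_id_timestamp_desc_id
    ON locations USING btree (user_id, "timestamp" DESC, id);

-- Option 2: keep the existing (user_id, "timestamp", id) index and sort both
-- columns in the same direction, so a backward scan of that index can
-- return rows already in order.
SELECT *
FROM locations
WHERE user_id = 1
ORDER BY "timestamp" DESC, id DESC
LIMIT 100;

-- Next page with a keyset cursor: the two literals are placeholders for the
-- "timestamp" and id of the last row of the previous page.
SELECT *
FROM locations
WHERE user_id = 1
  AND ("timestamp", id) < ('2018-11-17 12:00:00', 12345)
ORDER BY "timestamp" DESC, id DESC
LIMIT 100;
```

Option 2 appears to be what resolved it in the last comment (adding DESC to the id order). The row comparison in the final query relies on PostgreSQL's row-wise comparison, which only lines up with the sort order because both columns sort in the same direction; with the mixed ordering from Option 1 the cursor predicate would have to be written out with explicit OR conditions instead.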