I've got a simple example books table with an integer id primary key ("books_pkey" PRIMARY KEY, btree (id)) and 100,000,000 random rows.
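For reproduction purposes, a table like this can be set up roughly as follows; the title column and its md5 payload are placeholders, since only the integer primary key and the row count matter here:

CREATE TABLE books (
    id    integer PRIMARY KEY,  -- backed by the "books_pkey" btree index on (id)
    title text                  -- placeholder for the table's other columns
);

-- Placeholder data: 100,000,000 rows with an arbitrary text payload.
INSERT INTO books (id, title)
SELECT g, md5(g::text)
FROM generate_series(1, 100000000) AS g;

If I run: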
EXPLAIN
SELECT *
FROM books
ORDER BY id
OFFSET 99999999
LIMIT 1;
I see a query plan like:
Limit  (cost=3137296.54..3137296.57 rows=1 width=14)
  ->  Index Scan using books_pkey on books  (cost=0.57..3137296.57 rows=100000000 width=14)
Do I understand correctly that PostgreSQL is loading all 100,000,000 rows into memory, only for the OFFSET to discard all but one? If so, why can't it do the "load and discard" step using the index and load only one row into memory?
I understand that the typical solution to this is keyset pagination, i.e. WHERE id > x. I'm just trying to understand why an index alone doesn't solve the problem. Adding another index that is explicitly sorted the same way as the query (CREATE INDEX books_id_ordered ON books (id ASC)) makes no difference to the EXPLAIN output.
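For completeness, the keyset-pagination form I have in mind looks like this; the literal value is just a placeholder for the last id seen on the previous page:

SELECT *
FROM books
WHERE id > 99999999  -- "x": the last id fetched on the previous page
ORDER BY id
LIMIT 1;

This lets the planner descend the btree directly to the first matching id instead of walking through and discarding all the preceding entries.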
Comments:
Use EXPLAIN (ANALYZE, BUFFERS) to see a clearer picture. – mustaccio Aug 13 '21 at 19:29
LIMIT/OFFSET typically does not work correctly at all under concurrent write load. Related: https://dba.stackexchange.com/a/205286/3684 – Erwin Brandstetter Aug 14 '21 at 16:43
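For reference, mustaccio's suggestion applied to the query in the question would be run like this (output omitted, since the actual timings and buffer counts depend on the machine and cache state):

EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM books
ORDER BY id
OFFSET 99999999
LIMIT 1;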