2

I am doing a pretty complex query in Postgres 9.3.4:

SELECT p.*
FROM   unnest('{19082, 19075, 20705, 18328, 19110, 24965, 18329, 27600
              , 17804, 20717, 27598, 27599}'::int[]) s(source_id)
     , LATERAL (
   SELECT *
   FROM   posts
   WHERE  source_id = s.source_id
   AND    deleted_at IS NULL
   ORDER  BY external_created_at DESC
   LIMIT  100
   ) p
ORDER  BY p.external_created_at DESC
LIMIT  100;

See here for more details on my Postgres db, and settings, etc.

Which is great, unless I try to do some pagination.
To see what I mean. When I do a count, with no limit set:

SELECT p.*
FROM   unnest('{19082, 19075, 20705, 18328, 19110, 24965, 18329, 27600
              , 17804, 20717, 27598, 27599}'::int[]) s(source_id)
     , LATERAL (
   SELECT *
   FROM   posts
   WHERE  source_id = s.source_id
   AND    deleted_at IS NULL
   ORDER  BY external_created_at DESC
   ) p
ORDER  BY p.external_created_at DESC;

I get (107 rows), which is the correct number of results.

If I want to paginate, I'd use OFFSET on the subquery. If the first page has 100 posts, the second page should have 7 posts.

SELECT p.*
FROM   unnest('{19082, 19075, 20705, 18328, 19110, 24965, 18329, 27600
              , 17804, 20717, 27598, 27599}'::int[]) s(source_id)
     , LATERAL (
   SELECT *
   FROM   posts
   WHERE  source_id = s.source_id
   AND    deleted_at IS NULL
   ORDER  BY external_created_at DESC
   LIMIT  100
   OFFSET 100
   ) p
ORDER  BY p.external_created_at DESC
LIMIT  100;

but instead it returns (0 rows).

If I do the pagination on the outer query:

SELECT p.*
FROM   unnest('{19082, 19075, 20705, 18328, 19110, 24965, 18329, 27600
              , 17804, 20717, 27598, 27599}'::int[]) s(source_id)
     , LATERAL (
   SELECT *
   FROM   posts
   WHERE  source_id = s.source_id
   AND    deleted_at IS NULL
   ORDER  BY external_created_at DESC
   ) p
ORDER  BY p.external_created_at DESC
LIMIT  100
OFFSET 100;

I get (7 rows) which is correct, but then the query is insanely slow (see this question)

Is there a way to paginate this query properly, not slow it down, and get the correct results back?

goddamnyouryan
  • 239
  • 2
  • 9

1 Answers1

2

For just the two pages, a compromise could be:

SELECT p.*
FROM   unnest('{19082, 19075, 20705, 18328, 19110, 24965, 18329, 27600
              , 17804, 20717, 27598, 27599}'::int[]) s(source_id)
     , LATERAL (
   SELECT *
   FROM   posts
   WHERE  source_id = s.source_id
   AND    deleted_at IS NULL
   ORDER  BY external_created_at DESC
   LIMIT  200   -- 100 + 100 (outer limit + offset) per source_id
   ) p
ORDER  BY p.external_created_at DESC
OFFSET 100
LIMIT  100;

Which will be considerably slower than the first 100 rows, but still much faster than your original query in your previous question.

For just 107 rows total in your example, performance will be almost identical.

Performance decreases with every later page, obviously. OFFSET is a pain for performance, generally. But it becomes a PITA when combined with your difficult query.

I would consider a materialized view with pre-calculated row numbers or the more complex (but also more potent) solution here:

Also related:

Erwin Brandstetter
  • 175,982
  • 27
  • 439
  • 600
  • For this line LIMIT 200 -- 100 + 100 per source_id to cover all cases. wouldn't it be 100 * page_number, so a limit of 200 for page 2, a limit of 300 for page 3, etc? Otherwise, in this example wouldn't it be a LIMIT of 1200 since there are 12 source_ids in the query? – goddamnyouryan Jan 12 '16 at 03:56
  • 1
    @goddamnyouryan: I clarified a bit. The inner LIMIT is per source_id, that's the nature of a LATERAL subquery. – Erwin Brandstetter Jan 12 '16 at 04:01
  • By per source_id you don't mean multiply by the number of source id's, but rather the limit is on the source_id query, correct? Sorry! – goddamnyouryan Jan 12 '16 at 04:03
  • 1
    @goddamnyouryan: Read this to understand LATERAL: http://stackoverflow.com/a/28557803/939860 – Erwin Brandstetter Jan 12 '16 at 04:07