0

I have a query that look like this:

SELECT
  DISTINCT "articles"."id",
  "articles"."company_uuid",
  "articles"."status",
  "articles"."discount_reference",
  "articles"."created_at"
  --- more columns
FROM
  "articles"
  INNER JOIN "customers" ON "customers"."rec" = 'f'
  AND "customers"."article_id" = "articles"."id"
WHERE
  "articles"."type" = 'sending_request'
  AND "articles"."hidden" = 'f'
  AND "articles"."discount_reference" = 'Discount/737251623'
  AND "articles"."api_domain" = 'company'
  AND "articles"."status" IN ('completed', 'active')
  AND (customers.search_text ILIKE unaccent('%verb%'))
ORDER BY
  authored_on DESC NULLS LAST
LIMIT
  20 OFFSET 0;

The first query is slow, but when I reran it is always faster. I do not see a clear answer when I compare plans. I have a gin trigram index for the customers search text.

https://explain.dalibo.com/plan/b11657f576699fa8

And second run

https://explain.dalibo.com/plan/g81h74b9g521g5e7

Is the difference in the IO & Buffers the source of the difference?
I am running on PostgreSQL 14 on RDS.

Mio
  • 641
  • 1
  • 10
  • 22

1 Answers1

3

TL/DR: Cold cache.

If only the first execution is slow it's typically cold cache. And yours is just such a case.

As you can see under IO & Buffers in your image (already made you wonder), the fast query execution has only hits, no reads. Meaning, later executions benefit from the now populated cache.

Typically, more RAM (and more available cache for Postgres accordingly), faster storage, less wasteful DB design, and/or optimized queries help.

If it's just a matter of the very first query being slow, pre-warming the cache might do it. See:

Aside, this query might do better, avoiding the unnecessary and expensive DISTINCT. Does not remove the general issue of cold cache, though.

SELECT a.id, a.status
FROM   articles a
WHERE  a.status IN ('completed', 'active')
AND    EXISTS (
   SELECT FROM customers c
   WHERE  c.article_id = a.id
   AND    c.ehr = 'f'
   AND    c.search_text ILIKE unaccent('%nam%')
   )
ORDER  BY a.created_at DESC NULLS LAST
LIMIT  20
OFFSET 0;

That's assuming created_at is really articles.created_at, and articles.id is the PK.

Erwin Brandstetter
  • 175,982
  • 27
  • 439
  • 600
  • It's probably more than just cold cache; 431 reads totalling just 3.4 MB can hardly account for the approximately 5.5 second difference, unless there's also some "cold storage" effect, this being RDS... – mustaccio May 17 '23 at 00:01
  • @mustaccio: You may have missed the 97 MB of reads in the bitmap heap scan. That's substantial. But yes, the query itself seems problematic, too. Reading 100 MB for just 20 rows of output ... Anyway, I am not digging deeper without proper information for performance questions. – Erwin Brandstetter May 17 '23 at 01:26
  • If DISTINCT is over the primary key of one table and no columns of other tables then in theory it's not expensive, it should just get transformed into the above EXISTS code automatically (because of functional dependencies). Why that didn't happen I'm not sure. Possibly the final sort for the ORDER BY is anyway necessary so it wouldn't have helped. – Charlieface May 17 '23 at 13:24