We run a database on

PostgreSQL 12.6 on x86_64-pc-linux-gnu, compiled by Debian clang version 10.0.1, 64-bit

We are trying to get the top rows of a result set where the most recent entry has a status of 'Failed'.

Main table is product_tracking:

                                          Table "public.product_tracking"
    Column     |           Type           | Collation | Nullable | Default | Storage  | Stats target | Description
---------------+--------------------------+-----------+----------+---------+----------+--------------+-------------
 product_id    | character varying(512)   |           | not null |         | extended |              |
 delivery_name | character varying(512)   |           | not null |         | extended |              |
 feed_gid      | uuid                     |           | not null |         | plain    |              |
 status        | status                   |           | not null |         | plain    |              |
 updated_at    | timestamp with time zone |           |          | now()   | plain    |              |
 errors        | text                     |           |          |         | extended |              |
 created_at    | timestamp with time zone |           |          |         | plain    |              |
Indexes:
    "product_tracking_pkey" PRIMARY KEY, btree (product_id, delivery_name, feed_gid)
    "product_tracking_created_at_idx" btree (created_at)
    "product_tracking_created_at_product_idx" btree (created_at, product_id, delivery_name, feed_gid)
    "product_tracking_delivery_name_idx" btree (delivery_name)
    "product_tracking_feed_gid_idx" btree (feed_gid)
    "product_tracking_product_id_idx" btree (product_id)

We most recently added the index product_tracking_created_at_product_idx in the hope that it would speed up the ordering of the new query.

Original query and plan:

explain analyze
WITH required_feeds AS (
    SELECT gid, name feed_name
    FROM feeds
)
SELECT product_id, feed_gid, errors
FROM product_tracking pt1
         INNER JOIN required_feeds f ON (pt1.feed_gid = f.gid)
WHERE updated_at = (SELECT MAX(updated_at)
                    FROM product_tracking pt2
                    WHERE pt1.product_id = pt2.product_id
                      AND pt1.feed_gid = pt2.feed_gid)
  AND status = 'Failed'
ORDER BY created_at desc, product_id desc, delivery_name desc, feed_gid desc
LIMIT 100
Limit  (cost=0.56..23433928.19 rows=100 width=138) (actual time=0.245..4.107 rows=100 loops=1)
  ->  Nested Loop  (cost=0.56..111311156.82 rows=475 width=138) (actual time=0.244..4.092 rows=100 loops=1)
        Join Filter: (pt1.feed_gid = feeds.gid)
        Rows Removed by Join Filter: 4888
        ->  Index Scan Backward using product_tracking_created_at_product_idx on product_tracking pt1  (cost=0.56..111309426.64 rows=475 width=138) (actual time=0.210..3.190 rows=100 loops=1)
              Filter: ((status = 'Failed'::status) AND (updated_at = (SubPlan 1)))
              Rows Removed by Filter: 1515
              SubPlan 1
                ->  Aggregate  (cost=8.65..8.66 rows=1 width=8) (actual time=0.017..0.017 rows=1 loops=100)
                      ->  Index Scan using product_tracking_pkey on product_tracking pt2  (cost=0.56..8.65 rows=1 width=8) (actual time=0.013..0.016 rows=1 loops=100)
                            Index Cond: (((product_id)::text = (pt1.product_id)::text) AND (feed_gid = pt1.feed_gid))
        ->  Materialize  (cost=0.00..6.64 rows=243 width=16) (actual time=0.000..0.004 rows=50 loops=100)
              ->  Seq Scan on feeds  (cost=0.00..5.43 rows=243 width=16) (actual time=0.009..0.054 rows=234 loops=1)
Planning Time: 1.855 ms
Execution Time: 4.238 ms

With a filter on required_feeds we get the following execution plan. I'm using feed1 and feed2 as examples; in reality the feeds change from call to call.

explain analyze
WITH required_feeds AS (
    SELECT gid, name feed_name
    FROM feeds
    WHERE name in ('feed1', 'feed2')
)
SELECT product_id, feed_gid, errors
FROM product_tracking pt1
         INNER JOIN required_feeds f ON (pt1.feed_gid = f.gid)
WHERE updated_at = (SELECT MAX(updated_at)
                    FROM product_tracking pt2
                    WHERE pt1.product_id = pt2.product_id
                      AND pt1.feed_gid = pt2.feed_gid)
  AND status = 'Failed'
ORDER BY created_at desc, product_id desc, delivery_name desc, feed_gid desc
LIMIT 100
Limit  (cost=1485702.73..1485702.74 rows=4 width=138) (actual time=2265.019..2265.047 rows=100 loops=1)
  ->  Sort  (cost=1485702.73..1485702.74 rows=4 width=138) (actual time=2265.018..2265.034 rows=100 loops=1)
        Sort Key: pt1.created_at DESC, pt1.product_id DESC, pt1.delivery_name DESC, pt1.feed_gid DESC
        Sort Method: top-N heapsort  Memory: 72kB
        ->  Nested Loop  (cost=2183.11..1485702.69 rows=4 width=138) (actual time=269.623..2260.551 rows=13814 loops=1)
              ->  Seq Scan on feeds  (cost=0.00..6.04 rows=2 width=16) (actual time=0.013..0.069 rows=2 loops=1)
                    Filter: (name = ANY ('{pkinteractive,iok}'::text[]))
                    Rows Removed by Filter: 243
              ->  Bitmap Heap Scan on product_tracking pt1  (cost=2183.11..742848.30 rows=3 width=138) (actual time=142.994..1127.950 rows=6907 loops=2)
                    Recheck Cond: (feed_gid = feeds.gid)
                    Rows Removed by Index Recheck: 2814781
                    Filter: ((status = 'Failed'::status) AND (updated_at = (SubPlan 1)))
                    Rows Removed by Filter: 1751784
                    Heap Blocks: exact=99595 lossy=98823
                    ->  Bitmap Index Scan on product_tracking_feed_gid_idx  (cost=0.00..2183.11 rows=71806 width=0) (actual time=124.666..124.666 rows=1799676 loops=2)
                          Index Cond: (feed_gid = feeds.gid)
                    SubPlan 1
                      ->  Aggregate  (cost=8.65..8.66 rows=1 width=8) (actual time=0.013..0.013 rows=1 loops=16149)
                            ->  Index Scan using product_tracking_pkey on product_tracking pt2  (cost=0.56..8.65 rows=1 width=8) (actual time=0.012..0.012 rows=2 loops=16149)
                                  Index Cond: (((product_id)::text = (pt1.product_id)::text) AND (feed_gid = pt1.feed_gid))
Planning Time: 1.840 ms
Execution Time: 2265.242 ms

Possibly useful statistics:

SELECT count(*) AS row_count,
       avg(length(product_id)) AS avg_prod_len,
       avg(length(delivery_name)) AS avg_delivery_len,
       count(*) FILTER (WHERE status = 'Failed') AS ct_failed,
       count(DISTINCT (product_id)) AS distinct_product_id,
       count(DISTINCT (delivery_name)) AS distinct_delivery_name,
       count(DISTINCT (feed_gid)) AS distinct_feed_gid,
       count(DISTINCT (status)) AS distinct_status,
       count(DISTINCT (updated_at)) AS distinct_updated_at,
       count(DISTINCT (errors)) AS distinct_errors,
       count(DISTINCT (created_at)) AS distinct_created_at
FROM product_tracking;

row_count              = 11601030
avg_prod_len           = 12.48
avg_delivery_len       = 17.298
ct_failed              = 74881
distinct_product_id    = 8638613
distinct_delivery_name = 7315794
distinct_feed_gid      = 245
distinct_status        = 3
distinct_updated_at    = 9096954
distinct_errors        = 16664
distinct_created_at    = 8772269

Table feeds has only 245 rows with all values currently unique.

What's causing the slow down and what suggestions do you have?

Additional comments:

feed_gid sadly does have to be a UUID, this is a mapping from a downstream system we do not own. status can be 'Failed', 'Pending' or 'Succeeded', so a boolean is not quite possible. There are several transactions against this table every minute throughout the day so reordering or adapting the table design is out of the question for the short term.


Thanks, Erwin. Due to your input I managed to change my query to:

SELECT * FROM product_tracking pt1
JOIN feeds f ON (f.gid = feed_gid)
WHERE updated_at = (SELECT MAX(updated_at)
                    FROM product_tracking pt2
                    WHERE pt1.feed_gid IN (
                        SELECT gid
                        from feeds
                        where name IN ('feed1')
                    )
                      AND pt1.product_id = pt2.product_id
                      AND pt1.feed_gid = pt2.feed_gid
)
  AND status = 'Failed'
ORDER BY created_at desc, product_id desc, delivery_name desc, feed_gid desc
LIMIT 100

Limit  (cost=0.56..23431662.26 rows=100 width=176) (actual time=1599.557..4797.615 rows=4 loops=1)
  Buffers: shared hit=12319324
  ->  Nested Loop  (cost=0.56..112003343.48 rows=478 width=176) (actual time=1599.556..4797.610 rows=4 loops=1)
        Join Filter: (pt1.feed_gid = f.gid)
        Rows Removed by Join Filter: 432
        Buffers: shared hit=12319324
        ->  Index Scan Backward using product_tracking_created_at_product_idx on product_tracking pt1  (cost=0.56..112001602.36 rows=478 width=150) (actual time=1599.483..4797.469 rows=4 loops=1)
              Filter: ((status = 'Failed'::status) AND (updated_at = (SubPlan 2)))
              Rows Removed by Filter: 12781499
              Buffers: shared hit=12319323
              SubPlan 2
                ->  Aggregate  (cost=8.65..8.66 rows=1 width=8) (actual time=0.000..0.001 rows=1 loops=96451)
                      Buffers: shared hit=227
                      ->  Result  (cost=0.56..8.65 rows=1 width=8) (actual time=0.000..0.000 rows=0 loops=96451)
                            One-Time Filter: (hashed SubPlan 1)
                            Buffers: shared hit=227
                            ->  Index Scan using product_tracking_pkey on product_tracking pt2  (cost=0.56..8.65 rows=1 width=8) (actual time=0.027..0.030 rows=3 loops=31)
                                  Index Cond: (((product_id)::text = (pt1.product_id)::text) AND (feed_gid = pt1.feed_gid))
                                  Buffers: shared hit=224
              SubPlan 1
                ->  Seq Scan on feeds  (cost=0.00..6.04 rows=1 width=16) (actual time=0.018..0.035 rows=1 loops=1)
                      Filter: (name = 'absolute'::text)
                      Rows Removed by Filter: 244
                      Buffers: shared hit=3
        ->  Materialize  (cost=0.00..6.64 rows=243 width=26) (actual time=0.007..0.020 rows=109 loops=4)
              Buffers: shared hit=1
              ->  Seq Scan on feeds f  (cost=0.00..5.43 rows=243 width=26) (actual time=0.021..0.031 rows=109 loops=1)
                    Buffers: shared hit=1
Planning Time: 0.583 ms
Execution Time: 4797.695 ms

This makes queries super fast when the index scan doesn't have to walk too far down product_tracking_created_at_product_idx. For a feed with relatively recent failures, the query takes about half a second. For a feed whose most recent failures are over a year old, it can take up to 5 seconds (as in the plan shared above).

    I adapted my suggested query to the statistics you delivered. – Erwin Brandstetter Jun 17 '21 at 13:03
  • You're an absolute hero. I was looking at a partial index over status, but the query you provided looks much cleaner. Lots of reading to do here, and loving those edits that keep coming in! thank you – Josh Laird Jun 17 '21 at 13:20
    Cool! Would be interested in resulting query performance. – Erwin Brandstetter Jun 17 '21 at 13:25
  • Interestingly, the first time I ran your updated query with the partial index and a WHERE clause it took 5s, immediately after it took 2s, and now it's taking 0.2s consistently. Is some kind of cache speeding up subsequent runs? It makes the query difficult to benchmark! Super fast, though.

    Perhaps it's me running EXPLAIN ANALYZE (which actually executes the query) warming the cache for subsequent queries against that table?

    – Josh Laird Jun 17 '21 at 16:08
  • If I wanted to keep this particular query's cache hot, could I hypothetically create a scheduled job that executes SELECT * FROM product_tracking WHERE status = 'Failed' to keep the cache fresh for this query? I have read other posts of yours and seen that there is no way to not query the cache. I suppose the best way to saturate the cache is to run a query in a few hours when this query hasn't been run? – Josh Laird Jun 17 '21 at 16:15
    Definitely caching effects (among other artifacts like the concurrent load you mentioned). Best to compare with a saturated cache. More RAM and/or increased resource settings can help keep more of your db cached. And optimized data types, column order, and indexes, like I already suggested! – Erwin Brandstetter Jun 17 '21 at 16:18
    That's an option. With high accumulated cost and potential adverse effects on other tasks. The system has its reasons it evicts your data pages from cache. What I suggested earlier might be a better cure. – Erwin Brandstetter Jun 17 '21 at 16:25
  • Learnt a lot these past 2 days, can't thank you enough for your time – Josh Laird Jun 17 '21 at 16:27

1 Answer

There can be many reasons for the switched query plan. A very bad plan typically indicates inaccurate column statistics and/or cost constants. It starts with the bad estimate for the index scan on product_tracking_feed_gid_idx:

(cost=0.00..2183.11 rows=71806 width=0) (actual time=124.666..124.666 rows=1799676 loops=2)

The scan produces many more rows than Postgres expected, and it does not stop there. There are countless related answers about ANALYZE, statistics, and cost settings.
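As a concrete starting point (standard PostgreSQL commands; the statistics target of 1000 is only an illustrative value):

```sql
-- Collect more detail for the column with the bad estimate
-- (the default statistics target is 100), then refresh the
-- planner's statistics for the table:
ALTER TABLE product_tracking ALTER COLUMN feed_gid SET STATISTICS 1000;
ANALYZE product_tracking;
```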

On top of that, the combination of a filter on one table, a join to another, further filtering (taking the max of one column), ordering by other columns, and then LIMIT is notoriously hard to plan. Depending on the details of the data distribution, very different query plans can be preferable. With inaccurate cost settings and column statistics (especially with uneven data distribution) this can go south quickly.

Beware of cached query plans for tricky queries like this one, as with prepared statements or queries nested in PL/pgSQL functions. Normally Postgres wouldn't fall for that, but if the first couple of calls work fine with the same query plan, it might. (Dynamic SQL with EXECUTE can be a workaround in PL/pgSQL.)
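A minimal sketch of that EXECUTE workaround (the function name and signature are hypothetical):

```sql
-- Hypothetical wrapper: EXECUTE plans the statement afresh on every
-- call, so the feed names passed in are visible to the planner.
CREATE OR REPLACE FUNCTION failed_products(_feed_names text[])
  RETURNS SETOF product_tracking
  LANGUAGE plpgsql AS
$func$
BEGIN
   RETURN QUERY EXECUTE
      'SELECT pt.*
       FROM   product_tracking pt
       WHERE  pt.status = ''Failed''
       AND    pt.feed_gid IN (SELECT gid FROM feeds WHERE name = ANY ($1))'
   USING _feed_names;
END
$func$;
```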

Faster query

After seeing the value frequencies you added, I expect this equivalent query to perform much better:

SELECT product_id, feed_gid, errors
FROM   product_tracking  pt1
WHERE  status = 'Failed'  -- very selective (< 1% qualify)
AND    feed_gid IN (SELECT gid FROM feeds WHERE name IN ('feed1', 'feed2'))  -- also selective
AND    NOT EXISTS (  -- no later update
   SELECT FROM product_tracking pt2
   WHERE  pt2.product_id = pt1.product_id
   AND    pt2.feed_gid   = pt1.feed_gid
   AND    pt2.updated_at > pt1.updated_at
   )
ORDER  BY created_at DESC, product_id DESC, delivery_name DESC, feed_gid DESC
LIMIT  100;

I removed the pointless CTE.

The NOT EXISTS is constructed to allow multiple rows with the same updated_at to qualify, like your original.
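For contrast, a DISTINCT ON variant keeps only a single row per (product_id, feed_gid) and silently drops such ties, so it is not strictly equivalent (sketch only):

```sql
-- Sketch: pick the single latest row per (product_id, feed_gid),
-- then keep only groups whose latest row failed. Ties on updated_at
-- are resolved arbitrarily, unlike with NOT EXISTS above.
SELECT product_id, feed_gid, errors
FROM  (SELECT DISTINCT ON (product_id, feed_gid) *
       FROM   product_tracking
       WHERE  feed_gid IN (SELECT gid FROM feeds WHERE name IN ('feed1', 'feed2'))
       ORDER  BY product_id, feed_gid, updated_at DESC) latest
WHERE  status = 'Failed'
ORDER  BY created_at DESC, product_id DESC, delivery_name DESC, feed_gid DESC
LIMIT  100;
```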

Crucially, add this (small in size!) multicolumn partial index to make it fast:

CREATE INDEX product_tracking_failed_idx ON product_tracking (created_at DESC, product_id DESC, delivery_name DESC, feed_gid DESC)
WHERE status = 'Failed';

It replaces your existing index product_tracking_created_at_product_idx for this query (it is likewise tailored to the outer ORDER BY clause), but it is much smaller and therefore faster.

Maybe this simpler index will do the job, too. (Test it!):

CREATE INDEX product_tracking_failed_idx ON product_tracking (created_at DESC)
WHERE status = 'Failed';

And this index to support NOT EXISTS:

CREATE INDEX product_tracking_feed_gid_product_id_updated_at_idx
ON product_tracking (feed_gid, product_id, updated_at DESC NULLS LAST);

I see additional potential in optimizing column types, column order and indexes of your table.
If updated_at and created_at are never NULL, they should be declared NOT NULL.
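For example (note that SET NOT NULL verifies every existing row, so run it at a quiet time):

```sql
-- Declare the columns NOT NULL once you have verified no NULLs exist:
ALTER TABLE product_tracking
   ALTER COLUMN updated_at SET NOT NULL
 , ALTER COLUMN created_at SET NOT NULL;
```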

As for:

status can be 'Failed', 'Pending' or 'Succeeded', so a boolean is not quite possible

Possible, too:

succeeded boolean
   true - 'Succeeded'
   false - 'Failed'
   NULL - 'Pending'

1 byte instead of 7 - 10 bytes, faster and simpler in queries.
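If the column were ever migrated this way, queries could still read the old labels back with a mapping expression (sketch; the column name succeeded is hypothetical):

```sql
-- Sketch: translate the hypothetical boolean back to the old labels.
-- A NULL in succeeded falls through to the ELSE branch ('Pending').
SELECT CASE succeeded
          WHEN true  THEN 'Succeeded'
          WHEN false THEN 'Failed'
          ELSE            'Pending'
       END AS status
FROM   product_tracking;
```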

But that's beyond the scope of this question - and not an option anyway, apparently.

Erwin Brandstetter