Optimize UPDATE query for single "next" row

Question

We had information from new relic that one our query has a bad performance (almost 16 seconds for execution). We use PostgreSQL 11.8 on Debian Linux.

Table:

postgres=# SELECT table_name, column_name, data_type, character_maximum_length, character_octet_length, numeric_precision, numeric_precision_radix, numeric_scale, is_nullable, is_updatable FROM information_schema.columns WHERE table_name = 'store_codeinventory';
     table_name      |    column_name    |        data_type         | character_maximum_length | character_octet_length | numeric_precision | numeric_precision_radix | numeric_scale | is_nullable | is_updatable 
---------------------+-------------------+--------------------------+--------------------------+------------------------+-------------------+-------------------------+---------------+-------------+--------------
 store_codeinventory | id                | integer                  |                          |                        |                32 |                       2 |             0 | NO          | YES
 store_codeinventory | code              | character varying        |                      255 |                   1020 |                   |                         |               | NO          | YES
 store_codeinventory | limit_days        | integer                  |                          |                        |                32 |                       2 |             0 | NO          | YES
 store_codeinventory | deactivation_date | timestamp with time zone |                          |                        |                   |                         |               | YES         | YES
 store_codeinventory | cost              | numeric                  |                          |                        |                 8 |                      10 |             2 | NO          | YES
 store_codeinventory | price             | numeric                  |                          |                        |                 8 |                      10 |             2 | NO          | YES
 store_codeinventory | created_date      | timestamp with time zone |                          |                        |                   |                         |               | YES         | YES
 store_codeinventory | claimed_date      | timestamp with time zone |                          |                        |                   |                         |               | YES         | YES
 store_codeinventory | is_active         | boolean                  |                          |                        |                   |                         |               | NO          | YES
 store_codeinventory | book_id           | integer                  |                          |                        |                32 |                       2 |             0 | YES         | YES
 store_codeinventory | code_import_id    | integer                  |                          |                        |                32 |                       2 |             0 | NO          | YES
 store_codeinventory | creator_id        | integer                  |                          |                        |                32 |                       2 |             0 | NO          | YES
 store_codeinventory | inv_id            | integer                  |                          |                        |                32 |                       2 |             0 | YES         | YES
 store_codeinventory | label_id          | integer                  |                          |                        |                32 |                       2 |             0 | YES         | YES
 store_codeinventory | recipient_id      | integer                  |                          |                        |                32 |                       2 |             0 | YES         | YES
 store_codeinventory | purchase_id       | integer                  |                          |                        |                32 |                       2 |             0 | YES         | YES
(16 rows)

I don't have an access to production, but I tried to fill in local db and understand the problem. Query:

UPDATE store_codeinventory set recipient_id = 1168, claimed_date = NOW()
    where id = ANY((select array(select id from store_codeinventory
        where recipient_id is NULL and inv_id = 72 and is_active=true
        ORDER BY ID ASC LIMIT 1 FOR UPDATE)) ::integer[]) and recipient_id is NULL;

Query plan:

                                                                                              QUERY PLAN                                                                                               
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Update on store_codeinventory  (cost=0.64..47.43 rows=10 width=92) (actual time=0.291..0.291 rows=0 loops=1)
   InitPlan 2 (returns $3)
     ->  Result  (cost=0.34..0.35 rows=1 width=32) (actual time=0.060..0.060 rows=1 loops=1)
           InitPlan 1 (returns $2)
             ->  Limit  (cost=0.29..0.34 rows=1 width=10) (actual time=0.058..0.058 rows=1 loops=1)
                   ->  LockRows  (cost=0.29..1031.77 rows=19963 width=10) (actual time=0.057..0.057 rows=1 loops=1)
                         ->  Index Scan using store_codeinventory_pkey on store_codeinventory store_codeinventory_1  (cost=0.29..832.14 rows=19963 width=10) (actual time=0.053..0.053 rows=1 loops=1)
                               Filter: ((recipient_id IS NULL) AND is_active AND (inv_id = 72))
                               Rows Removed by Filter: 94
   ->  Index Scan using store_codeinventory_pkey on store_codeinventory  (cost=0.29..47.08 rows=10 width=92) (actual time=0.065..0.066 rows=1 loops=1)
         Index Cond: (id = ANY ($3))
         Filter: (recipient_id IS NULL)
 Planning Time: 0.365 ms
 Execution Time: 0.327 ms
(14 rows)

I am not so good at writing sql. Could someone give me advice where a bottleneck is? As I understood there might be two places: LockRows and Index Scan, right? I think that the first one was caused FOR UPDATE and the second one - select where ORDER BY ID ASC LIMIT? Is there a way to optimize them?

UPD. information about indexes(I removed columns which didn't use in query):

postgres=# SELECT indexname, indexdef FROM pg_indexes WHERE schemaname = 'public' and tablename = 'store_codeinventory' AND (indexdef LIKE '%inv_id%' OR indexdef LIKE '%pkey%' OR indexdef LIKE '%recipient_id%');
                           indexname                            |                                                                        indexdef                                                                         
----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------
 store_codeinventory_inv_id_recipient_id_is_active_822b92e5_idx | CREATE INDEX store_codeinventory_inv_id_recipient_id_is_active_822b92e5_idx ON public.store_codeinventory USING btree (inv_id, recipient_id, is_active)
 store_codeinventory_recipient_id_e32fbb18                      | CREATE INDEX store_codeinventory_recipient_id_e32fbb18 ON public.store_codeinventory USING btree (recipient_id)
 store_codeinventory_inv_id_21e20eb7                            | CREATE INDEX store_codeinventory_inv_id_21e20eb7 ON public.store_codeinventory USING btree (inv_id)
 store_codeinventory_pkey                                       | CREATE UNIQUE INDEX store_codeinventory_pkey ON public.store_codeinventory USING btree (id)
(4 rows)

UPD2 I am not so good at sql, but I decided to analyse that query and has concluded that we can simplify it. What you think about this variant:

UPDATE store_codeinventory set recipient_id = 1168, claimed_date = NOW()
    where id = (select id from store_codeinventory
        where recipient_id is NULL and inv_id = 72 and is_active=true 
        ORDER BY ID ASC LIMIT 1 FOR UPDATE) and recipient_id is NULL;

I thought that we don't need the array because we will get only one or null in sub-select. Then I thought that LIMIT 1 for ordered by ID ASC it is the same to get minimal value of id. What if we write:

UPDATE store_codeinventory set recipient_id = 1168, claimed_date = NOW()
    where id = (select MIN(id) from store_codeinventory
        where recipient_id is NULL and inv_id = 72 and is_active=true);

Is it the same or not?

The added table information is useful, but most of it is noise. A minimal, complete CREATE TABLE statement would be more useful. It's the canonical form, and it also includes constraints. To make the picture complete, disclose data distribution in involved columns and your concurrency situation. Either way, my answer should already cover typical cases. — Erwin Brandstetter, Aug 13 '21 at 22:40

score 2 · Accepted Answer · answered Aug 13 '21 at 01:45

Your first query is expensive nonsense - like you already found yourself. That array served no purpose.

The second query makes more sense.

The idea behind the third query is based on a misunderstanding. ORDER BY id LIMIT 1 is at least as efficient as min(id). Both can use an index. The first is more versatile. Among other things, you can add a locking clause like FOR UPDATE - which you need to be safe if there can be concurrent write access.

So:

UPDATE store_codeinventory
SET    recipient_id = 1168
     , claimed_date = now()
WHERE  id = (
   SELECT id
   FROM   store_codeinventory
   WHERE  recipient_id IS NULL
   AND    inv_id = 72
   AND    is_active
   ORDER  BY id
   LIMIT  1
   FOR    UPDATE SKIP LOCKED  -- !?
   );

This should be fast with index support. You need just the right index. Any additional index makes writes to the table more expensive. So add indexes you actually need and avoid others.

The best index depends on the respective characteristics of, and data distribution in, the involved columns:

recipient_id
inv_id
is_active
id

Example: if recipient_id has very few NULL values, a partial index would snap:

CREATE INDEX ON store_codeinventory (id)
WHERE recipient_id IS NULL;

Or, if predicates are always the same, this index would be perfect:

CREATE INDEX ON store_codeinventory (id)
WHERE  recipient_id IS NULL
AND    inv_id = 72
AND    is_active;

Add SKIP LOCKED to the locking clause only for multiple transactions competing for the next "free" id - which looks to be our use case.

See (closely related!):

Postgres UPDATE ... LIMIT 1

Optimize UPDATE query for single "next" row

1 Answers1