We had information from new relic that one our query has a bad performance (almost 16 seconds for execution). We use PostgreSQL 11.8 on Debian Linux.
Table:
postgres=# SELECT table_name, column_name, data_type, character_maximum_length, character_octet_length, numeric_precision, numeric_precision_radix, numeric_scale, is_nullable, is_updatable FROM information_schema.columns WHERE table_name = 'store_codeinventory';
table_name | column_name | data_type | character_maximum_length | character_octet_length | numeric_precision | numeric_precision_radix | numeric_scale | is_nullable | is_updatable
---------------------+-------------------+--------------------------+--------------------------+------------------------+-------------------+-------------------------+---------------+-------------+--------------
store_codeinventory | id | integer | | | 32 | 2 | 0 | NO | YES
store_codeinventory | code | character varying | 255 | 1020 | | | | NO | YES
store_codeinventory | limit_days | integer | | | 32 | 2 | 0 | NO | YES
store_codeinventory | deactivation_date | timestamp with time zone | | | | | | YES | YES
store_codeinventory | cost | numeric | | | 8 | 10 | 2 | NO | YES
store_codeinventory | price | numeric | | | 8 | 10 | 2 | NO | YES
store_codeinventory | created_date | timestamp with time zone | | | | | | YES | YES
store_codeinventory | claimed_date | timestamp with time zone | | | | | | YES | YES
store_codeinventory | is_active | boolean | | | | | | NO | YES
store_codeinventory | book_id | integer | | | 32 | 2 | 0 | YES | YES
store_codeinventory | code_import_id | integer | | | 32 | 2 | 0 | NO | YES
store_codeinventory | creator_id | integer | | | 32 | 2 | 0 | NO | YES
store_codeinventory | inv_id | integer | | | 32 | 2 | 0 | YES | YES
store_codeinventory | label_id | integer | | | 32 | 2 | 0 | YES | YES
store_codeinventory | recipient_id | integer | | | 32 | 2 | 0 | YES | YES
store_codeinventory | purchase_id | integer | | | 32 | 2 | 0 | YES | YES
(16 rows)
I don't have an access to production, but I tried to fill in local db and understand the problem. Query:
UPDATE store_codeinventory set recipient_id = 1168, claimed_date = NOW()
where id = ANY((select array(select id from store_codeinventory
where recipient_id is NULL and inv_id = 72 and is_active=true
ORDER BY ID ASC LIMIT 1 FOR UPDATE)) ::integer[]) and recipient_id is NULL;
Query plan:
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Update on store_codeinventory (cost=0.64..47.43 rows=10 width=92) (actual time=0.291..0.291 rows=0 loops=1)
InitPlan 2 (returns $3)
-> Result (cost=0.34..0.35 rows=1 width=32) (actual time=0.060..0.060 rows=1 loops=1)
InitPlan 1 (returns $2)
-> Limit (cost=0.29..0.34 rows=1 width=10) (actual time=0.058..0.058 rows=1 loops=1)
-> LockRows (cost=0.29..1031.77 rows=19963 width=10) (actual time=0.057..0.057 rows=1 loops=1)
-> Index Scan using store_codeinventory_pkey on store_codeinventory store_codeinventory_1 (cost=0.29..832.14 rows=19963 width=10) (actual time=0.053..0.053 rows=1 loops=1)
Filter: ((recipient_id IS NULL) AND is_active AND (inv_id = 72))
Rows Removed by Filter: 94
-> Index Scan using store_codeinventory_pkey on store_codeinventory (cost=0.29..47.08 rows=10 width=92) (actual time=0.065..0.066 rows=1 loops=1)
Index Cond: (id = ANY ($3))
Filter: (recipient_id IS NULL)
Planning Time: 0.365 ms
Execution Time: 0.327 ms
(14 rows)
I am not so good at writing sql. Could someone give me advice where a bottleneck is? As I understood there might be two places: LockRows and Index Scan, right? I think that the first one was caused FOR UPDATE and the second one - select where ORDER BY ID ASC LIMIT? Is there a way to optimize them?
UPD. information about indexes(I removed columns which didn't use in query):
postgres=# SELECT indexname, indexdef FROM pg_indexes WHERE schemaname = 'public' and tablename = 'store_codeinventory' AND (indexdef LIKE '%inv_id%' OR indexdef LIKE '%pkey%' OR indexdef LIKE '%recipient_id%');
indexname | indexdef
----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------
store_codeinventory_inv_id_recipient_id_is_active_822b92e5_idx | CREATE INDEX store_codeinventory_inv_id_recipient_id_is_active_822b92e5_idx ON public.store_codeinventory USING btree (inv_id, recipient_id, is_active)
store_codeinventory_recipient_id_e32fbb18 | CREATE INDEX store_codeinventory_recipient_id_e32fbb18 ON public.store_codeinventory USING btree (recipient_id)
store_codeinventory_inv_id_21e20eb7 | CREATE INDEX store_codeinventory_inv_id_21e20eb7 ON public.store_codeinventory USING btree (inv_id)
store_codeinventory_pkey | CREATE UNIQUE INDEX store_codeinventory_pkey ON public.store_codeinventory USING btree (id)
(4 rows)
UPD2 I am not so good at sql, but I decided to analyse that query and has concluded that we can simplify it. What you think about this variant:
UPDATE store_codeinventory set recipient_id = 1168, claimed_date = NOW()
where id = (select id from store_codeinventory
where recipient_id is NULL and inv_id = 72 and is_active=true
ORDER BY ID ASC LIMIT 1 FOR UPDATE) and recipient_id is NULL;
I thought that we don't need the array because we will get only one or null in sub-select. Then I thought that LIMIT 1 for ordered by ID ASC it is the same to get minimal value of id. What if we write:
UPDATE store_codeinventory set recipient_id = 1168, claimed_date = NOW()
where id = (select MIN(id) from store_codeinventory
where recipient_id is NULL and inv_id = 72 and is_active=true);
Is it the same or not?
CREATE TABLEstatement would be more useful. It's the canonical form, and it also includes constraints. To make the picture complete, disclose data distribution in involved columns and your concurrency situation. Either way, my answer should already cover typical cases. – Erwin Brandstetter Aug 13 '21 at 22:40