EXPLAIN shows that PostgreSQL chooses a bad plan for hash join. it should have first iterate by tags_name_value_height2 in the table tags and use it to find corresponding transactions.id, but instead Postgres does a stupid reverse: it first iterates using transactions_id on transactions.
It's despite I do have all needed (composite) indexes.
How to make it work the right way?
explain
select *
from "transactions"
join tags on transactions.id = tags.tx_id
where "tags"."name" = 'ZGF0YWJhc2U'
and "tags"."value" in ('c29sYXJ3ZWF2ZS10ZXN0bmV0LWFyY2hpdmU')
limit 2;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Limit (cost=246323.53..246710.80 rows=2 width=1964)
-> Hash Join (cost=246323.53..12515067.96 rows=63359 width=1964)
Hash Cond: ((transactions.id)::text = (tags.tx_id)::text)
-> Index Scan using transactions_id on transactions (cost=0.56..10146177.49 rows=3822241 width=1803)
-> Hash (cost=244045.99..244045.99 rows=63359 width=161)
-> Index Scan using tags_name_value_height2 on tags (cost=0.94..244045.99 rows=63359 width=161)
Index Cond: ((name = 'ZGF0YWJhc2U'::text) AND (value = 'c29sYXJ3ZWF2ZS10ZXN0bmV0LWFyY2hpdmU'::text))
JIT:
Functions: 13
Options: Inlining false, Optimization false, Expressions true, Deforming true
(10 rows)
Postgre 13. Maybe I should downngrade? Which version has good query hints?
arweave=# \d transactions
Table "arweave.transactions"
Column | Type | Collation | Nullable | Default
---------------+--------------------------+-----------+----------+-------------------------------------------
id | character varying(64) | | not null |
owner | text | | |
tags | jsonb | | |
target | character varying(64) | | |
quantity | text | | |
reward | text | | |
signature | text | | |
last_tx | text | | |
data_size | bigint | | |
content_type | character varying(255) | | |
format | integer | | |
height | integer | | |
owner_address | character varying(255) | | |
data_root | character varying(64) | | |
parent | character varying(64) | | |
created_at | timestamp with time zone | | | CURRENT_TIMESTAMP
app | character varying(64) | | |
domain | character varying(64) | | |
namespace | character varying(64) | | |
seq | integer | | not null | nextval('transactions_seq_seq'::regclass)
Indexes:
"transactions_pkey" PRIMARY KEY, btree (seq)
"id_key" UNIQUE CONSTRAINT, btree (id)
"index_app_transactions" btree (app)
"index_domain_transactions" btree (domain)
"index_namespace_transactions" btree (namespace)
"owner1" btree (seq, owner)
"owner2" btree (seq DESC, owner)
"transactions_height2" btree (height DESC, id DESC)
"transactions_id" btree (id)
"transactions_id_index" UNIQUE CONSTRAINT, btree (height, id)
Referenced by:
TABLE "tags" CONSTRAINT "tag_tx_id" FOREIGN KEY (height, tx_id) REFERENCES transactions(height, id)
TABLE "tags" CONSTRAINT "tags_tx_id_fkey" FOREIGN KEY (tx_id) REFERENCES transactions(id)
arweave=# \d tags
Table "arweave.tags"
Column | Type | Collation | Nullable | Default
---------------+--------------------------+-----------+----------+-----------------------------------
tx_id | character varying(64) | | not null |
owner_address | text | | |
height | integer | | |
index | integer | | not null |
name | text | | |
value | text | | |
created_at | timestamp with time zone | | | CURRENT_TIMESTAMP
seq | integer | | not null | nextval('tags_seq_seq'::regclass)
Indexes:
"tags_pkey" PRIMARY KEY, btree (seq)
"tags_created_at" btree (created_at)
"tags_height1" btree (height, tx_id)
"tags_height2" btree (height DESC, tx_id DESC)
"tags_name_height1" btree (name, height, tx_id)
"tags_name_height2" btree (name, height DESC, tx_id DESC)
"tags_name_value_height2" btree (name, value, height DESC, tx_id DESC)
"tags_tx_id" btree (tx_id)
Foreign-key constraints:
"tag_tx_id" FOREIGN KEY (height, tx_id) REFERENCES transactions(height, id)
"tags_tx_id_fkey" FOREIGN KEY (tx_id) REFERENCES transactions(id)
(Note that the above are not yet very well constructed, but this should not affect this question.)
transactions_idontransactions" -- what led you to this conclusion? By the way, have you tried an index ontags (name, value, tx_id)? – mustaccio Aug 06 '21 at 17:42LIMITwithoutORDER BY. Is that intended? And you haveSELECT *, returning all columns fromtransactionsandtags. Is that intended? Please start with proper table and index definitions (CREATE TABLE ...) and tell us what the query is supposed to achieve exactly. – Erwin Brandstetter Aug 06 '21 at 17:52"tags_name_height1" btree (name, height, tx_id). My conclusion is from to things: 1. it's irreasonable slowness; 2.Index Scan using transactions_id on transactionsbeing first in the explain. – porton Aug 06 '21 at 18:15*is for testing. I added the definitions. I try to make this mock query similar to real queries fast. I query a "document" with a certain name/value. – porton Aug 06 '21 at 18:18tagstotransactionsand many duplicated and rows intagsthat didn't refer to correspondingtransactions. This was the reason of slowness. Should I remove my question? – porton Aug 06 '21 at 18:20"tags_tx_id_fkey" FOREIGN KEY (tx_id) REFERENCES transactions(id)- and use the same as join condition. Both arecharacter varying(64)which makes for needlessly big indexes and is generally inefficient. Use the PKtransaction.seq(integer) as target of the FK and query instead. Much more efficient. Also,character varying(255)is typically a misunderstanding in Postgres. See https://dba.stackexchange.com/a/137949/3684 – Erwin Brandstetter Aug 06 '21 at 22:02