1

EXPLAIN shows that PostgreSQL chooses a bad plan for hash join. it should have first iterate by tags_name_value_height2 in the table tags and use it to find corresponding transactions.id, but instead Postgres does a stupid reverse: it first iterates using transactions_id on transactions.

It's despite I do have all needed (composite) indexes.

How to make it work the right way?

explain
select *
from "transactions" 
join tags on transactions.id = tags.tx_id
where "tags"."name" = 'ZGF0YWJhc2U'
and "tags"."value" in ('c29sYXJ3ZWF2ZS10ZXN0bmV0LWFyY2hpdmU')
limit 2;
                                                        QUERY PLAN                                                        
--------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=246323.53..246710.80 rows=2 width=1964)
   ->  Hash Join  (cost=246323.53..12515067.96 rows=63359 width=1964)
         Hash Cond: ((transactions.id)::text = (tags.tx_id)::text)
         ->  Index Scan using transactions_id on transactions  (cost=0.56..10146177.49 rows=3822241 width=1803)
         ->  Hash  (cost=244045.99..244045.99 rows=63359 width=161)
               ->  Index Scan using tags_name_value_height2 on tags  (cost=0.94..244045.99 rows=63359 width=161)
                     Index Cond: ((name = 'ZGF0YWJhc2U'::text) AND (value = 'c29sYXJ3ZWF2ZS10ZXN0bmV0LWFyY2hpdmU'::text))
 JIT:
   Functions: 13
   Options: Inlining false, Optimization false, Expressions true, Deforming true
(10 rows)

Postgre 13. Maybe I should downngrade? Which version has good query hints?

arweave=# \d transactions
                                        Table "arweave.transactions"
    Column     |           Type           | Collation | Nullable |                  Default                  
---------------+--------------------------+-----------+----------+-------------------------------------------
 id            | character varying(64)    |           | not null | 
 owner         | text                     |           |          | 
 tags          | jsonb                    |           |          | 
 target        | character varying(64)    |           |          | 
 quantity      | text                     |           |          | 
 reward        | text                     |           |          | 
 signature     | text                     |           |          | 
 last_tx       | text                     |           |          | 
 data_size     | bigint                   |           |          | 
 content_type  | character varying(255)   |           |          | 
 format        | integer                  |           |          | 
 height        | integer                  |           |          | 
 owner_address | character varying(255)   |           |          | 
 data_root     | character varying(64)    |           |          | 
 parent        | character varying(64)    |           |          | 
 created_at    | timestamp with time zone |           |          | CURRENT_TIMESTAMP
 app           | character varying(64)    |           |          | 
 domain        | character varying(64)    |           |          | 
 namespace     | character varying(64)    |           |          | 
 seq           | integer                  |           | not null | nextval('transactions_seq_seq'::regclass)
Indexes:
    "transactions_pkey" PRIMARY KEY, btree (seq)
    "id_key" UNIQUE CONSTRAINT, btree (id)
    "index_app_transactions" btree (app)
    "index_domain_transactions" btree (domain)
    "index_namespace_transactions" btree (namespace)
    "owner1" btree (seq, owner)
    "owner2" btree (seq DESC, owner)
    "transactions_height2" btree (height DESC, id DESC)
    "transactions_id" btree (id)
    "transactions_id_index" UNIQUE CONSTRAINT, btree (height, id)
Referenced by:
    TABLE "tags" CONSTRAINT "tag_tx_id" FOREIGN KEY (height, tx_id) REFERENCES transactions(height, id)
    TABLE "tags" CONSTRAINT "tags_tx_id_fkey" FOREIGN KEY (tx_id) REFERENCES transactions(id)

arweave=# \d tags Table "arweave.tags" Column | Type | Collation | Nullable | Default
---------------+--------------------------+-----------+----------+----------------------------------- tx_id | character varying(64) | | not null | owner_address | text | | | height | integer | | | index | integer | | not null | name | text | | | value | text | | | created_at | timestamp with time zone | | | CURRENT_TIMESTAMP seq | integer | | not null | nextval('tags_seq_seq'::regclass) Indexes: "tags_pkey" PRIMARY KEY, btree (seq) "tags_created_at" btree (created_at) "tags_height1" btree (height, tx_id) "tags_height2" btree (height DESC, tx_id DESC) "tags_name_height1" btree (name, height, tx_id) "tags_name_height2" btree (name, height DESC, tx_id DESC) "tags_name_value_height2" btree (name, value, height DESC, tx_id DESC) "tags_tx_id" btree (tx_id) Foreign-key constraints: "tag_tx_id" FOREIGN KEY (height, tx_id) REFERENCES transactions(height, id) "tags_tx_id_fkey" FOREIGN KEY (tx_id) REFERENCES transactions(id)

(Note that the above are not yet very well constructed, but this should not affect this question.)

porton
  • 745
  • 3
  • 11
  • 26
  • 1
    "it first iterates using transactions_id on transactions" -- what led you to this conclusion? By the way, have you tried an index on tags (name, value, tx_id)? – mustaccio Aug 06 '21 at 17:42
  • You have LIMIT without ORDER BY. Is that intended? And you have SELECT *, returning all columns from transactions and tags. Is that intended? Please start with proper table and index definitions (CREATE TABLE ...) and tell us what the query is supposed to achieve exactly. – Erwin Brandstetter Aug 06 '21 at 17:52
  • @mustaccio Yes, I told I have all the needed indexes including "tags_name_height1" btree (name, height, tx_id). My conclusion is from to things: 1. it's irreasonable slowness; 2. Index Scan using transactions_id on transactions being first in the explain. – porton Aug 06 '21 at 18:15
  • @ErwinBrandstetter This query is just for testing, so ORDER BY is not needed. Again * is for testing. I added the definitions. I try to make this mock query similar to real queries fast. I query a "document" with a certain name/value. – porton Aug 06 '21 at 18:18
  • I understood what was the trouble: There was no relevant FK from tags to transactions and many duplicated and rows in tags that didn't refer to corresponding transactions. This was the reason of slowness. Should I remove my question? – porton Aug 06 '21 at 18:20
  • You have "tags_tx_id_fkey" FOREIGN KEY (tx_id) REFERENCES transactions(id) - and use the same as join condition. Both are character varying(64) which makes for needlessly big indexes and is generally inefficient. Use the PK transaction.seq (integer) as target of the FK and query instead. Much more efficient. Also, character varying(255) is typically a misunderstanding in Postgres. See https://dba.stackexchange.com/a/137949/3684 – Erwin Brandstetter Aug 06 '21 at 22:02

1 Answers1

1

You are misreading the EXPLAIN output. Top nodes are executed last. In fact, the index tags_name_value_height2 on tags is read first.

Imagine a tree. The more each node is indented, the earlier it is executed. Details in the manual in the chapter Using EXPLAIN.

Your query might be optimized, but we'd need proper definitions to say more.

Erwin Brandstetter
  • 175,982
  • 27
  • 439
  • 600