
I've got an events table with over 40 million rows. It has a btree index on type: "index_events_on_type" btree (type).

Any equality lookup takes ages! I tried increasing work_mem to get rid of the lossy heap blocks, but it had little to no effect.
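For reference, the work_mem change was session-local, roughly like this (I went as high as 1GB):

-- session-local; does not persist across connections
SET work_mem = '1GB';
SHOW work_mem;  -- confirm the new value before re-running EXPLAIN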

dev=# EXPLAIN (ANALYZE, BUFFERS) SELECT "events".* FROM "events" WHERE "events"."type" = 'contact.card.added';
                                                                   QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on events  (cost=56176.20..1633783.13 rows=1614405 width=296) (actual time=252.846..13832.331 rows=1550851 loops=1)
   Recheck Cond: ((type)::text = 'contact.card.added'::text)
   Rows Removed by Index Recheck: 10414231
   Heap Blocks: exact=39335 lossy=758338
   Buffers: shared read=807020
   ->  Bitmap Index Scan on index_events_on_type  (cost=0.00..55772.60 rows=1614405 width=0) (actual time=243.096..243.096 rows=1550851 loops=1)
         Index Cond: ((type)::text = 'contact.card.added'::text)
         Buffers: shared read=9347
 Planning time: 0.100 ms
 Execution time: 13924.612 ms
(10 rows)

...and the table definition:

dev=# \d events
                                            Table "public.events"
      Column       |            Type             | Collation | Nullable |              Default
-------------------+-----------------------------+-----------+----------+------------------------------------
 id                | bigint                      |           | not null | nextval('events_id_seq'::regclass)
 type              | character varying           |           | not null |
 aggregate_root_id | uuid                        |           | not null |
 entity_id         | uuid                        |           |          |
 parent_id         | bigint                      |           |          |
 created_at        | timestamp without time zone |           | not null |
 body              | jsonb                       |           |          | '{}'::jsonb
Indexes:
    "events_pkey" PRIMARY KEY, btree (id)
    "index_events_on_aggregate_root_id" btree (aggregate_root_id)
    "index_events_on_created_at" btree (created_at)
    "index_events_on_entity_id" btree (entity_id)
    "index_events_on_parent_id" btree (parent_id)
    "index_events_on_type" btree (type)
Foreign-key constraints:
    "fk_rails_68f023eb25" FOREIGN KEY (parent_id) REFERENCES events(id)
Referenced by:
    TABLE "events" CONSTRAINT "fk_rails_68f023eb25" FOREIGN KEY (parent_id) REFERENCES events(id)

Any guidance would be appreciated.

Update

Looks like most of the time is spent on I/O, as the query returns 1.5M records:

dev=# EXPLAIN (ANALYZE, BUFFERS) SELECT "events"."type" FROM "events" WHERE "events"."type" = 'contact.card.added';
                                                                  QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on events  (cost=6324.92..520827.82 rows=182239 width=32) (actual time=283.119..15719.744 rows=1550851 loops=1)
   Recheck Cond: ((type)::text = 'contact.card.added'::text)
   Rows Removed by Index Recheck: 10414212
   Heap Blocks: exact=39336 lossy=758337
   Buffers: shared hit=75 read=806945
   I/O Timings: read=13783.176
   ->  Bitmap Index Scan on index_events_on_type  (cost=0.00..6279.36 rows=182239 width=0) (actual time=272.302..272.302 rows=1550851 loops=1)
         Index Cond: ((type)::text = 'contact.card.added'::text)
         Buffers: shared hit=75 read=9272
         I/O Timings: read=67.302
 Planning time: 0.069 ms
 Execution time: 15807.701 ms
(12 rows)

Rows returned:

dev=# SELECT COUNT(*) FROM "events" WHERE "events"."type" = 'contact.card.added';
  count
---------
 1550851
(1 row)

Total rows:

dev=# SELECT COUNT(*) FROM "events";
  count
----------
 36447779
(1 row)
EasyCo
  • I don't understand how you can get lossy blocks with a btree index on a text or varchar column. What is the exact data type of the column events.type? Can you post the DDL of the table and index? – pifor Apr 16 '20 at 06:39
  • Updated @pifor - it's not text, it's varchar, but there's type casting for some reason. – EasyCo Apr 16 '20 at 06:42
  • To which value did you set work_mem? –  Apr 16 '20 at 07:02
  • I still cannot understand why you have lossy blocks in this configuration. The documentation only mentions lossy blocks for non-btree indexes. I cannot reproduce this behaviour with a similar setup on PG 9.6 (I also have the conversion from character varying to text). Lossy blocks seem to explain why your query is slow. – pifor Apr 16 '20 at 07:02
  • @a_horse_with_no_name - I set it to 1GB – EasyCo Apr 16 '20 at 07:04
  • @pifor - I'm on 10.10 if that makes any difference... and what's your show work_mem; value? – EasyCo Apr 16 '20 at 07:04
  • I've tested on 10.10 with work_mem set to 1GB. I have a similar execution plan with Recheck Cond but without the steps Rows Removed by Index Recheck and Heap Blocks: exact=xxx lossy=yyy. – pifor Apr 16 '20 at 07:13
  • @a_horse_with_no_name: this is misleading because there is an equality test in the query and the execution plan also has an equality test: Index Cond: ((type)::text = 'contact.card.added'::text). I just don't understand this, because the btree can easily return key is equal or key is different. – pifor Apr 16 '20 at 07:20
  • @pifor: the Index Scan is efficient if you retrieve a small percentage of the rows. It would actually be slower to use an Index Scan for a substantial percentage (as we have here) because it would cause much more random I/O. (See here or here) –  Apr 16 '20 at 07:23
  • @EasyCo: you could test whether other strategies are faster, e.g. using set random_page_cost = 1.0 or turning off the bitmap index scan with set enable_bitmapscan = false –  Apr 16 '20 at 07:25
  • @EasyCo: if you always (or almost always) query for that specific condition, maybe a filtered index would help: create index on events(id) where type = 'contact.card.added'; (both ideas are sketched after these comments) –  Apr 16 '20 at 07:27
  • Thanks for the tips @a_horse_with_no_name. The lightbulb moment was in the links you provided. In this case there are about 1.5M rows returned, so the index scan isn't as efficient and it needs the bitmap scan. – EasyCo Apr 16 '20 at 07:32
  • @a_horse_with_no_name: thanks for the links. The explanation from Craig Ringer is very detailed, but there is no explanation for the lossy part. For that I found https://www.postgresql.org/message-id/464F3C5D.2000700@enterprisedb.com – pifor Apr 16 '20 at 07:44
  • "I tried tuning up the work_mem to get rid of the lossy value but it had little to no effect" Please show us the plan for that. If possible, turn set track_io_timing = on first. – jjanes Apr 16 '20 at 14:22
  • Is this faster on repeated execution? – jjanes Apr 16 '20 at 14:28
  • @pifor The bitmap scan always has to be prepared to recheck, in case the bitmap overflowed work_mem. Even if work_mem is large enough that it doesn't actually overflow, it can't know that for certain until after the fact, so a recheck is always listed even if it is not used. – jjanes Apr 16 '20 at 14:36
  • @jjanes - not faster on repetition, but it does look like it's spending a lot of time on I/O. I've updated my question with the details. Thanks for your help. – EasyCo Apr 17 '20 at 01:23
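For anyone following along, the experiments suggested in the comments above look roughly like this (the partial index name is made up here, and track_io_timing can only be changed by a superuser):

-- tell the planner random I/O is cheap (reasonable on SSDs); session-local
SET random_page_cost = 1.0;

-- include I/O timings in EXPLAIN (ANALYZE, BUFFERS) output; superuser only
SET track_io_timing = on;

-- a partial index for the hot predicate; the index name is hypothetical
CREATE INDEX index_events_on_contact_card_added
    ON events (id)
    WHERE type = 'contact.card.added';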

2 Answers


Do you really need to select every column? If not, then don't. It might be generating extra TOAST table lookups, or might be inhibiting index-only scans.

You are selecting 1.5 million rows. Indexes are awesome, but they are not magical. 1.5 million is a lot, especially when scattered randomly throughout a giant (how big is it?) table.

You can almost certainly speed this up by clustering the table on the "index_events_on_type" index. But this is a resource-intensive operation that will probably require a maintenance window, and the table won't stay clustered through future inserts/updates. Alternatively, you could partition the table by "type". That has more or less the same effect as clustering (in this context!), but it stays partitioned in the face of future activity. Both are sketched below.
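Roughly, the two options look like this. The partitioned table is sketched as a separate, hypothetical events_partitioned, since an existing table cannot be converted in place and the data would have to be migrated:

-- one-off rewrite of the table in index order; takes an exclusive lock
CLUSTER events USING index_events_on_type;

-- or: declarative list partitioning (PostgreSQL 10+); columns abbreviated
CREATE TABLE events_partitioned (
    id   bigint NOT NULL,
    type character varying NOT NULL
    -- ... remaining columns as in events ...
) PARTITION BY LIST (type);

CREATE TABLE events_contact_card_added
    PARTITION OF events_partitioned
    FOR VALUES IN ('contact.card.added');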

What version of PostgreSQL are you using? I would have expected you to get a parallel query here if it is a recent version (not that it would necessarily be effective, but I would still have expected one).

Given that you are retrieving ~4% of the table, a seq scan might be more effective than an index scan. You can do set enable_bitmapscan=off and see how that does.
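Something along these lines (session-local, so it only affects the current connection):

-- disable bitmap scans for this session and compare the resulting plan
SET enable_bitmapscan = off;
EXPLAIN (ANALYZE, BUFFERS)
SELECT "events".* FROM "events" WHERE "events"."type" = 'contact.card.added';
RESET enable_bitmapscan;  -- restore the default afterwards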

jjanes
  • Thanks @jjanes - I've updated my question; selecting only one column makes little to no difference. Looks like it's all I/O. – EasyCo Apr 17 '20 at 01:29
  • It seems weird that it didn't switch to an index-only scan. Is the table well vacuumed? (You can compare relpages to relallvisible in pg_class; see the query sketched below.) If you don't have to visit the table at all, just the index, that should drastically cut down on I/O. – jjanes Apr 17 '20 at 12:46
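The check jjanes describes, roughly:

-- if relallvisible is far below relpages, the visibility map is stale
-- and index-only scans will still have to visit the heap
SELECT relpages, relallvisible
FROM pg_class
WHERE relname = 'events';

VACUUM events;  -- refreshes the visibility map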

10 million rows were removed by the recheck, which were "false positives" owing to the lossy bitmap.

Increasing work_mem is the way to go, and I don't believe that it has no effect. Reading lots of rows just takes some time.

If you limit the columns you select and add those columns to the index (and VACUUM), you could get a much faster index-only scan.
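A sketch, assuming the query only needs created_at (swap in whatever columns you actually select; on PostgreSQL 11+ an INCLUDE clause would do the same job):

-- a multicolumn index that can satisfy the query from the index alone
CREATE INDEX index_events_on_type_created_at
    ON events (type, created_at);

VACUUM events;  -- keeps the visibility map current for index-only scans

-- this should now run as an index-only scan
EXPLAIN (ANALYZE, BUFFERS)
SELECT created_at FROM events WHERE type = 'contact.card.added';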

Laurenz Albe
  • If the workload is I/O bound, then getting rid of the lossy blocks (by converting them to exact) could easily have no meaningful effect, since each of the blocks is still visited (read from disk) once. – jjanes Apr 16 '20 at 14:25
  • @jjanes That makes sense. – Laurenz Albe Apr 16 '20 at 14:55