3

Introduction

I have a PostgreSQL table setup as a queue/event-source.

I would very much like to keep the "order" of the events (even after the queue item has been processed) as a source for e2e testing.

I starting to run into query performance slow-downs (probably because of table bloat) and I don't know how to effectively query a table on a changing key.

Initial Setup

Postgres: v15

Table DDL

CREATE TABLE eventsource.events (
    id serial4 NOT NULL,
    message jsonb NOT NULL,
    status varchar(50) NOT NULL,
    createdOn timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
    CONSTRAINT events_pkey PRIMARY KEY (id)
);
CREATE INDEX ON eventsource.events (createdOn)

Scrape Query (Pseudo Code)

BEGIN;  -- Start transaction

SELECT message, status FROM eventsource.events ee WHERE status = 'PENDING' ORDER BY ee.createdOn ASC FOR UPDATE SKIP LOCKED LIMIT 10; -- Get the OLDEST 10 events that are pending -- I found that having a batch of work items was more performant than taking 1 at a time.

... -- The application then uses the entries as tickets for doing work as in "I am working on these 10 items, no one else touch" ... UPDATE ONLY eventsource.events SET status = 'DONE' WHERE id = $id_1 UPDATE ONLY eventsource.events SET status = 'DONE' WHERE id = $id_2 UPDATE ONLY eventsource.events SET status = 'FAIL' WHERE id = $id_3 UPDATE ONLY eventsource.events SET status = 'DONE' WHERE id = $id_n ... END; -- finish transaction

Rough Worker outline

Multiple workers taking batches of work items form the queue then actioning them and reporting their statuses. I want to have as little overlap as possible.

rough work to queue sketch

Assessment

When looking at the execution plan it looks like the query has to traverse the entire table to get the records that are in 'PENDING' status.

I thought this might be because of the ORDER BY ee.createdOn ASC at first. But after reviewing the execution plan I saw that the query was traversing the entire table searching for the status, and only THEN ordering it.

Attempt

I saw partial indexes and hoped it could reduce the search space of the queries.

CREATE INDEX ON eventsource.events (status)
WHERE status = 'PENDING'

But I think I made it worse ...

Records are being inserted with the 'PENDING' status and then almost immediately changed to 'DONE' (or 'FAIL') as the application is consuming the queue. I think this might be destroying the index every time and then recreating it from scratch after the update to the status field (probably very expensive).

Question

What is the effect of updating a partial-index's key / predicate (and if significant) how do I effectively filter a big table on a changing key?

Index Approach

Is my index approach sound?

My first thought was Indexes but maybe partitions would be better suited here?
What happens if the partition key gets changed?
Is it just as destructive as destroying the index?

Index type

I know the default index type is a B-Tree, would a HASH index (or other) be better in this situation?

Under the hood, would changing the index key of a HASH index, result in destroying/recreating the index table the same way it does with a B-Tree?

Index creation

I am unsure what the effect is of the partial index's key vs predicate. What is the effective difference in indexing between:

CREATE INDEX ON eventsource.events (status)
WHERE status = 'PENDING'

and

CREATE INDEX ON eventsource.events (createdOn)
WHERE status = 'PENDING'

Here I am using createdOn because it is in my scrape query but I think id would work too.

Would moving the index key to a different field effect the index creation/recreation? In this instance I moved it from the status field (which will change) to the createdOn field, which won't. I don't quite understand what this SO implies.
And the Postgres docs are a little unclear to me about this type of partial index.

WesAtWork
  • 33
  • 3
  • You use createdOn and created interchangingly. Please fix. Always disclose your version of Postgres. (SELECT version(); helps.) Your query displays SELECT * Do you actually need the whole row back, or just a certain selection? LIMIT 10 is an arbitrary limit, I assume? You just want to process oldest entries first, right? ("FIFO"). Is there only ever a single writing session or can there be more? Give a rough estimate for the total number of rows, new rows per day and row size. – Erwin Brandstetter May 09 '23 at 22:53
  • Please show us the plans, not just your the interpretations of the plans. – jjanes May 10 '23 at 02:51
  • "Would moving the main key to a different (static) field..." What is the "main key"? If that is the primary key, then isn't it already static? – jjanes May 10 '23 at 03:37

3 Answers3

5

Don't use timestamp (without time zone)

Your whole setup is prone to failure:

CREATE TABLE eventsource.events (
    ...
    createdOn timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP -- !
    ..

CURRENT_TIMESTAMP (a.k.a. now()) returns timestamptz, not timestamp.

If by chance, accident or malicious intent any session ever sets a different timezone and then inserts a row relying on the column default, you get a different (wrong) local time, breaking sort order. And you'd have a hard time finding out why. Don't do this. Especially not with such a column default. (LOCALTIMESTAMP runs into the same problem: also depends on the current timezone setting.)

Related:

Better table definition

CREATE TABLE eventsource.event (
  event_id    integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY
, message     jsonb NOT NULL
, status      text NOT NULL CHECK (status = ANY ('{PENDING,DONE,FAIL}'::text[]))  -- more?
, created_on  timestamptz NOT NULL DEFAULT CURRENT_TIMESTAMP  -- !!!
);

Use legal, lower-case identifiers if at all possible. See:

Use text and add a CHECK constraint to enforce legal states.

IDENTITY is preferable over serial in modern Postgres. See:

Most importantly, use timestamptz as explained at the top. All other points are mere recommendations.

Better index

Use a partial index, as already suggested by Charlieface:

CREATE INDEX ON eventsource.event (created)
WHERE status = 'PENDING';

It's radically smaller for your use case and provides sorted rows. The small index is also cheaper to maintain. However, there will be a lot of churn, so the index will bloat quickly. See:

Consider aggressive autovacuum settings for the table. Like:

ALTER TABLE eventsource.event SET (autovacuum_vacuum_scale_factor = 0.03);

The global default for autovacuum_vacuum_scale_factor is 0.2. Meaning, autovacuum is triggered after 20 % of the table rows + autovacuum_vacuum_threshold (50 by default) have been changed. If the table is big, that may be too lazy for your purpose. Find your balance between increased maintenance cost and improved query performance.

You may or may not need a full index on (created_on) additionally for other purposes.

Better approach

Assuming:

  • Current Postgres 15.
  • There can be concurrent writes (and/or concurrent locks).
  • You want to process the oldest row that has not been processed, yet. (And is not being processed concurrently by another session.)
  • The application process succeeds in the majority of cases.
BEGIN;  -- !!!

UPDATE eventsource.event SET status = 'DONE' WHERE event_id = ( SELECT event_id FROM eventsource.event WHERE status = 'PENDING' ORDER BY created_on LIMIT 1 FOR UPDATE SKIP LOCKED -- !!! ) RETURNING *; -- or just what you need!

-- The application then processes the entries returned by the query and will then update them

-- ONLY in case of a failure !!! -- Else just skip this: UPDATE eventsource.event SET status = 'FAIL' WHERE event_id = $id_3; -- your failed ID

COMMIT;

This approach works reliably under concurrent write load, and never blocks. It only locks a single row per session, minimizing chances for complications. It locks and updates the row immediately, which is faster than locking and updating later. In the rare case of a failure, you need a second update. But that's cheap in comparison.

If you need to (or just want to) lock and process multiple rows, that works in similar fashion. See:

Erwin Brandstetter
  • 175,982
  • 27
  • 439
  • 600
  • You can also use Enumerated Type for status which will be hard coded. – Sahap Asci May 10 '23 at 09:22
  • Timestampz is the correct and safe way to do this. -- I tested the index section and it is faster as you say, my concern is with index overhead with recreation/destruction when the status updates, will vacuum'ing more often help?

    My spesific application relies on the result of the "work" units to be concrete, Curently it can't deal with a unit from this specific point in the process retroactively changing (working on it). I've adapted it to keep the lock but thanks for the optimization! When I figure it out I'll use this instead

    – WesAtWork May 10 '23 at 09:31
  • Also thanks for a comprehensive answer. Very Informative – WesAtWork May 10 '23 at 09:31
  • 1
    @WesAtWork: I added a bit to address index maintenance. About locking, just to be clear: updated rows stay locked until the end of the transaction. Effects are not visible to other transactions before that (with default READ COMMITTED isolation level). – Erwin Brandstetter May 11 '23 at 09:18
1
  • Your attempted index
    CREATE INDEX ON eventsource.events (status)
    WHERE status = 'PENDING'
    
    does not have createdon in it, so it can't help the query very much. Even if the index is used, the server would still need to sort all the PENDING rows in order of createdon.

  • My first thought was Indexes but maybe partitions would be better suited here?

    Partitioning is irrelevant here. The right index is what you want.


  • I know the default index type is a B-Tree, would a HASH index (or other) be better in this situation? No, you need the results sorted by createdon to get this query to perform. Hash indexes do not give results in order.


  • The final attempt
    CREATE INDEX ON eventsource.events (createdOn)
    WHERE status = 'PENDING'
    
    is a good start. But if you want it to cover the query, you need to add INCLUDE columns. Given the width of the table, it may be worth just forgoing that and relying on a bitmap scan. You may want to limit the columns in the SELECT to avoid this.

Having said all that, it's unclear what "processing" your application does, and whether the whole thing could be done in a single UPDATE statement anyway.

Charlieface
  • 12,780
  • 13
  • 35
  • thanks for the quick reply

    Application uses an external reference to "process" the work, this is the work queue.

    I don't think what you sayed about the creadedOn is correct.

    When I do the excution plan I can see the query explicitly using the eventsource_events_status_idx when it does a index scan.

    To be sure I deleted the createdOn index as well and it still used the eventsource_events_status_idx index.

    – WesAtWork May 09 '23 at 21:34
  • Using the createdOn version of the partial index is faster though because the execution plan can skip the sort step apparently! – WesAtWork May 09 '23 at 21:39
  • Yes it uses it, but it doesn't help it that much as it still needs to sort. Adding createdOn means a sort is not needed. – Charlieface May 09 '23 at 22:20
0
  • Add

    CREATE INDEX ON eventsource.events (status, createdOn)
    
  • If workers can dispatch the tasks faster than the the queueing mechanism can handle the query, abandon the Queue. "Don't queue it, just do it."

  • When you assign tasks in a batch, or if the batch is 1 and the task takes a long time:

  1. Have a transaction (with locking) to grab the task(s) and assign them to the worker. (This means adding a "worker_id" column.) Do not perform the task while in this transaction.
  2. When finishing one task, release the task from the worker. (Eg, set worker_id = NULL.)
  3. Have a job periodically checking for orphaned tasks. This can happen if a worker dies without releasing the task.
Rick James
  • 78,038
  • 5
  • 47
  • 113