
I need to update the "oldest" row in the database (for now it's LIMIT 1, but I also need to be able to set it to n rows).

I'm essentially doing a constant stream of "update the oldest rows and retrieve them at the same time".

The column is indexed, it's called last_sent_at and I'm calling it ~20K times a minute.

The thing is, last_sent_at can be NULL, and NULL rows should always be favored over ones with a filled-in time. Once all of them are filled in, it should choose the "oldest".

This works for me. The problem is I don't know if it's efficient, and it also doesn't work when I need to update 5 rows at once. I guess I could use IN, but I'm afraid that would be inefficient.

UPDATE subscribers
SET last_sent_at = '2018-11-17 00:02:27'
WHERE id = (
  SELECT id
  FROM subscribers
  ORDER BY last_sent_at NULLS FIRST
  LIMIT 1
)
RETURNING id;
Tallboy
1 Answer


If the table is big, the key to performance is a matching index on (last_sent_at NULLS FIRST).
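Note that a plain index on last_sent_at will not match, because the default sort order for an ascending index is NULLS LAST. A sketch of a matching index (the index name is my choice):

```sql
-- ASC NULLS FIRST matches ORDER BY last_sent_at NULLS FIRST,
-- so the planner can read the "oldest" rows straight off the index.
CREATE INDEX subscribers_last_sent_at_idx
ON subscribers (last_sent_at ASC NULLS FIRST);
```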

Since you are updating so much, use aggressive VACUUM settings; these can be adjusted per table. See:
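For illustration, per-table autovacuum storage parameters might look like this (the specific values are only a starting point, not a recommendation, and need tuning against your actual workload):

```sql
-- Trigger autovacuum after ~1% of rows are dead instead of the
-- default 20%, and remove the cost-based delay so it keeps up.
ALTER TABLE subscribers SET (
   autovacuum_vacuum_scale_factor = 0.01,
   autovacuum_vacuum_cost_delay   = 0
);
```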

For only a single row, UPDATE with a subquery like you have is fine.

For LIMIT N with N > 1 a CTE is your safe bet. Like:

WITH cte AS (
   SELECT id
   FROM   subscribers
   ORDER  BY last_sent_at NULLS FIRST
   LIMIT  5
   )
UPDATE subscribers s
SET    last_sent_at = '2018-11-17 00:02:27'
FROM   cte
WHERE  s.id = cte.id
RETURNING id;

If you do this under concurrent write load, add another expression to your ORDER BY to make the sort order deterministic, and add the same expression to your index to match! (There can be many rows with NULL, right?) Use the same order for all commands taking row locks on the same table; otherwise you might face deadlocks from multiple transactions taking locks in arbitrary order. Better yet, use FOR UPDATE SKIP LOCKED if you are not absolutely bound to the strict order. Closely related, with in-depth explanation:
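A sketch of the SKIP LOCKED variant of the CTE above: the FOR UPDATE SKIP LOCKED clause locks the selected rows and silently skips any row already locked by a concurrent transaction, so two workers running this concurrently never block each other or deadlock; each simply gets the next 5 unlocked rows.

```sql
WITH cte AS (
   SELECT id
   FROM   subscribers
   ORDER  BY last_sent_at NULLS FIRST
   LIMIT  5
   FOR    UPDATE SKIP LOCKED  -- skip rows locked by concurrent workers
   )
UPDATE subscribers s
SET    last_sent_at = now()
FROM   cte
WHERE  s.id = cte.id
RETURNING s.id;
```

The trade-off: a worker may not get the strictly oldest rows if they happen to be locked at that instant, but (as discussed in the comments below) those rows are simply caught on the next pass.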

Related:

Erwin Brandstetter
  • There will definitely be very heavy write load, but it will all be INSERT statements of new records, not modifications of existing records. Sorry, I'm not a super DB expert, but are deadlocks possible that "stop" the entire database, or is it a temporary, very quick issue? Thank you so much, I will read those – Tallboy Nov 18 '18 at 03:16
  • I guess this is really stupid, but I didn't think a deadlock was possible if I'm just writing "normal" queries. I thought Postgres handled all of that in a sort of synchronous way – Tallboy Nov 18 '18 at 03:17
  • 1
    Deadlocks in your case are resolved automatically by killing one or more of the contesting transactions. Details: https://dba.stackexchange.com/a/12984/3684 And, well, "a sort of synchronous way" is achieved by taking locks in a consistent order - the point above exactly. – Erwin Brandstetter Nov 18 '18 at 03:20
  • Ok, thank you, I will need to really read into this. I wasn't even aware this was a thing. By "if you are not absolutely bound to the strict order", do you mean it still selects the 5 oldest records, but they will be in a different order within the 5 results? (That's ok because they're all getting the same time anyway.) – Tallboy Nov 18 '18 at 03:22
  • 2
    @Tallboy: By that I honor the possibility that SKIP LOCKED can skip over the "oldest" rows because it's currently locked - for whatever reason, even a concurrent transaction trying to do the same update (which might still fail, we don't know until it's committed). – Erwin Brandstetter Nov 18 '18 at 03:25
  • Ok, I see... I think that's ok because they will just be caught on the next pass. Now I need to go read about all of this. Do you know of a 'beginner' place to read about what causes a deadlock? I will still read all those links you sent – Tallboy Nov 18 '18 at 03:26
  • 1
    This should be instrumental understand deadlocks: https://stackoverflow.com/a/22776994/939860 – Erwin Brandstetter Nov 18 '18 at 03:27
  • Thanks so much. Down the rabbit hole I descend – Tallboy Nov 18 '18 at 03:28