-1

I am trying to run the following query:

WITH upd AS (
  UPDATE fila_all SET id = id WHERE
  id = (select id from fila_all fa where fa.id not in (select msg_id from fila_consumers where group_name = 'group0') and fa.queue_name = 'myqueue' order by id asc limit 1 for update of fa skip locked)
  returning id
)
INSERT INTO fila_consumers select fa2.id, 'new-worker0', 'group0', 'Running'
FROM fila_all fa2 JOIN upd on upd.id = fa2.id
RETURNING msg_id

id is a primary key on fila_all, and (msg_id, group_name) is a primary key on fila_consumers.

fila_consumers only has four columns, id bigint, worker_name varchar, group varchar, status enum.

I would expect this query to never fail due to a conflict on fila_consumers_pk, because the rows are locked, and skipped, on the update, so two transactions cannot get the same fila_all.id for insertion in fila_consumers. But I get the following error:

duplicate key value violates unique constraint "fila_consumers_pkey"
    Detail: Key (msg_id, group_name)=[...]

I don't want to use ON CONFLICT DO NOTHING, I'd like to understand why this is happening.

All insertions on fila_consumers are based on this statement, multiple instances of this statement run concurrently, but no other statements are executed that insert on this table.

What am I missing here? Is there a way around this?

simao
  • 107
  • 1
  • 4
  • fila_consumers only has tree columns. But you are inserting 4 expressions: INSERT INTO fila_consumers select fa2.id, 'new-worker0', 'group0', 'Running'. And the quoted error msg. disagrees, too: Key (msg_id, group_name). Please clarify. – Erwin Brandstetter Jul 15 '21 at 11:02
  • Sorry, forgot about status, yes 4 columns – simao Jul 15 '21 at 11:52
  • Could someone downvoting please explain why? Yes I had some typos, but not sure why this is downvoted. – simao Jul 15 '21 at 13:38

1 Answers1

1

Assuming the INSERT targets the right columns, which we can't tell without table definition and target definition list.

The statement boils down to this:

WITH cte AS (
   SELECT fa.id, {GROUP}, {worker}, 'Running'
   FROM   fila_all fa
   WHERE  NOT EXISTS (
      SELECT FROM fila_consumers fc
      WHERE  fc.msg_id = fa.id
      AND    fc.group_name = {GROUP}
      )
   AND    fa.queue_name = {queue}
   ORDER  BY fa.id
   LIMIT  1
   FOR    UPDATE SKIP LOCKED
   )
                         -- target definition list!
INSERT INTO fila_consumers (msg_id, group_name, col3, col4)
TABLE  cte
RETURNING msg_id;

You are locking a row in table fila_all (twice over in the original). So multiple instances of the same query cannot insert duplicate rows in table fila_consumers, coming from the same source. Good.

But there is nothing to prevent other concurrent transactions from inserting / updating rows in fila_consumers in the time window between the SELECT and the INSERT.

The expensive empty UPDATE in your original makes this window much bigger than necessary. My version with only SELECT should minimize the problem. And NOT EXISTS is typically faster than NOT IN. See:

But the problem is still there.

Any particular reason you are trying to avoid INSERT ... ON CONFLICT?
That UPSERT functionality was added to fix your kind of problems.

Related:

Erwin Brandstetter
  • 175,982
  • 27
  • 439
  • 600
  • Could you elaborate a bit on "But there is nothing to prevent other concurrent transactions from inserting [...]" ? all other transactions would be using the same select/insert, or just a normal insert on fila_all. the purpose of for update and skip locked was for the statement to continue to the next available row in fila_all to be inserted into fila_consumers, and so two transactions would not conflict, as they would try to insert different rows, so I wouldn't need ON CONCLICT DO NOTHING – simao Jul 15 '21 at 08:44
  • Could you please clarify your question first? You saw my leading assumpions, right? Also put information on concurrent transactions there. – Erwin Brandstetter Jul 15 '21 at 10:00
  • I changed the question now. though I don't know what you mean by group <> group_name typo, {group} was a placeholder for a query param. – simao Jul 15 '21 at 10:48
  • @simao: (msg_id, group) is a primary key on fila_consumers. <> (msg_id, group_name) – Erwin Brandstetter Jul 15 '21 at 10:57
  • ah yes, typo, sorry. Thank you for you help. – simao Jul 15 '21 at 11:53