3

I'm trying to figure out the fastest way to:

  • INSERT a row if it doesn't exist
  • REPLACE (all columns at once) the row if it exists

I thought about DELETE + INSERT, but that looks like 2 scans and if I do an INSERT + ON CONFLICT, I get a huge query where everything has to be duplicated.

Is there a generally acknowledged method to do this?

Erwin Brandstetter
  • 175,982
  • 27
  • 439
  • 600
Thomas
  • 303
  • 2
  • 9
  • "that looks like 2 scans" - If your table is indexed, it ideally would be 2 seeks which is rather fast. I wouldn't go about complicating it otherwise. – J.D. Aug 06 '22 at 19:30

1 Answers1

2

That's a generally acknowledged case for UPSERT (INSERT ... ON CONFLICT .. DO UPDATE ...).

And the simple case only involves a single index scan. (Aside from possible locking and writing and index updates.)

For one, race conditions under concurrent write load are typically handled smarter. See:

And it's also typically cheaper. "REPLACE (all columns at once) the row if it exists" sounds mildly suspicious. If "the row exists", at least one or more columns must be identical. On the other hand, UPDATE writes a new row version in any case (if it writes at all). That's still cheaper than DELETE + INSERT if toasted fields can be carried over, or index entries with a H.O.T. UPDATE. See:

Your "huge query where everything has to be duplicated" indicates potential for optimization or possible misunderstandings. If that might be the case, show the query you have, together with your version of Postgres, a reproducible setup, and the desired result (in a new question!).

That said, if you don't have toasted columns, and no heavy concurrent write load, and few indexes (or just one), then DELETE + INSERT is only moderately more expensive.

When replacing large parts of a table without concurrent write load, it may even be cheaper. You might VACUUM the table between DELETE and INSERT ... But that's digressing from the use case in the question.

Erwin Brandstetter
  • 175,982
  • 27
  • 439
  • 600
  • In my specific case, the table is used to store the results of some simulations. Besides the few columns that identify the process, every other column has new numbers at each iteration, which means that ON conflict causes me to duplicate the 31 other columns in the query. There is nothing bad with it, but that makes quite a large query as everything has to be there twice. – Thomas Aug 08 '22 at 13:06
  • @Thomas: UPDATE requires you to spell out target columns. But shorter syntax may be available. You might post another question with the necessary details if that's relevant. – Erwin Brandstetter Aug 08 '22 at 16:00
  • I've posted another question to see if the syntax could be simplified. – Thomas Aug 08 '22 at 21:38