
While writing a query the other day, a thought came to me and has stuck in my mind.

Which is preferable: first checking whether a value for a unique column exists and then inserting, or inserting and letting the database raise a unique constraint error? Does it even matter?

Edit: As suggested in an answer below, this issue depends on the database, so I am adding the postgresql tag.

codecool

2 Answers


Let the DB raise an error.

Testing first isn't safe under concurrency: you'll get a collision eventually, because two threads can both pass the "NOT EXISTS" check and then both try to write. This applies to both "READ COMMITTED" and MVCC/snapshot locking strategies.

You can use lock hints to force isolation, but you reduce performance.
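To illustrate, here is the race in miniature, plus the lock-hint variant, as a rough T-SQL sketch (the dbo.Users table and Email column are hypothetical):

    -- Hypothetical schema for these sketches:
    -- CREATE TABLE dbo.Users (Email NVARCHAR(255) PRIMARY KEY);
    DECLARE @Email NVARCHAR(255) = N'someone@example.com';

    -- Unsafe: another session can insert the same Email between the
    -- EXISTS check and the INSERT, so the unique key can still fire.
    IF NOT EXISTS (SELECT 1 FROM dbo.Users WHERE Email = @Email)
        INSERT INTO dbo.Users (Email) VALUES (@Email);

    -- Lock hints close the gap but serialize writers, costing throughput.
    -- The hints only help if both statements share one transaction.
    BEGIN TRANSACTION;
    IF NOT EXISTS (SELECT 1 FROM dbo.Users WITH (UPDLOCK, HOLDLOCK)
                   WHERE Email = @Email)
        INSERT INTO dbo.Users (Email) VALUES (@Email);
    COMMIT;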

I call this the JFDI pattern (SO link). For "update if exists", see Need Help Troubleshooting Sql Server 2005 Deadlock Scenario. These are SQL Server examples; MySQL has INSERT IGNORE, which handles this gracefully. I'm not sure about the rest.
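A minimal sketch of the JFDI pattern, assuming the same hypothetical dbo.Users table as above (2627 and 2601 are SQL Server's duplicate-key error numbers; THROW needs SQL Server 2012 or later):

    DECLARE @Email NVARCHAR(255) = N'someone@example.com';
    BEGIN TRY
        INSERT INTO dbo.Users (Email) VALUES (@Email);
    END TRY
    BEGIN CATCH
        -- Swallow only duplicate-key errors; re-raise everything else.
        IF ERROR_NUMBER() NOT IN (2627, 2601)
            THROW;
    END CATCH;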

gbn
  • Okay, I did not think about that! Under concurrency even the check can fail. :) – codecool Feb 09 '12 at 15:58
  • Lesson 4 here: http://sqlblog.com/blogs/paul_nielsen/archive/2007/12/12/10-lessons-from-35k-tps.aspx – gbn Feb 09 '12 at 15:59
  • I think it is useful to both perform the check (and fail gracefully) and follow the "trust but verify" adage to catch the failure. The reason? Error handling is expensive. I'm working on some tests today, actually, that should reveal whether this comment makes sense. If this comment disappears, you know the result. :-) – Aaron Bertrand Feb 09 '12 at 16:11
  • Also, in 2008 there is MERGE, which may have resolved Paul's observations, but I haven't tested it at high tx rates. Mostly because I have yet to train my brain to understand that syntax. – Aaron Bertrand Feb 09 '12 at 16:13
  • @AaronBertrand: ditto haven't used it enough. Also, for straight INSERTs, it may be slower. – gbn Feb 09 '12 at 16:18
  • @AaronBertrand It would be good if you could share info about those tests, whatever the results, instead of removing the comment. :) – codecool Feb 09 '12 at 16:25
  • @gbn SQL Server also has IGNORE_DUP_KEY. It can be slow if you have many duplicates: http://sqlblog.com/blogs/alexander_kuznetsov/archive/2008/07/30/performance-of-inserts-and-ignore-dup-key.aspx – A-K Feb 09 '12 at 16:57
  • Mea culpa. I've made some observations in my testing today. Keep in mind these are very isolated tests with no concurrency. For straight inserts, checking first and avoiding the error is about 10x faster than letting the error happen. But in other cases where you are going to do more complex error handling (TRY/CATCH or IF @@ERROR <> 0, with ROLLBACK or THROW, etc.) it's about twice as slow if you check first. Again I'll stress these are just my initial observations and there are many factors that can influence the impact it can have. It may be some time before I can blog about it thoroughly. – Aaron Bertrand Feb 11 '12 at 04:26
  • Please note: my comments relate only to SQL Server 2012 RC0. I think I posted my original comments long before the question was re-tagged for postgresql. I will post back here when I get around to blogging about this; I'm trying some interesting things. – Aaron Bertrand Feb 11 '12 at 05:27
  • Further testing on a larger scale has shown that checking first is worth it. In cases where you always succeed and go ahead and perform the insert anyway, the overhead is 175 microseconds per insert, whereas durations went through the roof when I bypassed the check and just let SQL raise an error, explicitly rolled back, or caught it and returned gracefully. The worst was an explicit BEGIN TRAN and ROLLBACK; the best was catch. (In this test I tried to insert 40,000 rows against an NVARCHAR(255) PK: all succeed, all fail, or half fail.) I'll let you know when I blog more details. – Aaron Bertrand Feb 14 '12 at 20:21

I do not think your question is really database-agnostic. The right answer can depend on implementation details, which vary from vendor to vendor and may change with the next version. I would test under concurrency before choosing any approach on any RDBMS.

Right now, on SQL Server 2008 R2, I am using the following:

  1. Low concurrency and a low volume of modifications. To save a single row, I serialize using sp_getapplock and use MERGE (see the sketch after this list). I stress test under high concurrency to verify that it works.

  2. Higher concurrency and/or volume. To reduce contention and boost performance, I do not save one row at a time; I accumulate changes on the app server and use TVPs to save batches. Still, to avoid concurrency-related issues, I serialize using sp_getapplock before the MERGE. Again, I stress test under high concurrency to verify that it works.
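A hedged sketch of the serialized upsert from point 1, assuming a hypothetical dbo.Accounts table keyed on AccountName (names and types are made up for illustration):

    -- Hypothetical schema:
    -- CREATE TABLE dbo.Accounts (AccountName NVARCHAR(255) PRIMARY KEY,
    --                            Balance MONEY NOT NULL);
    CREATE PROCEDURE dbo.SaveAccount
        @AccountName NVARCHAR(255),
        @Balance MONEY
    AS
    BEGIN
        SET NOCOUNT, XACT_ABORT ON;
        BEGIN TRANSACTION;
        -- Serialize all writers on a single app lock; it is released
        -- automatically when the transaction commits or rolls back.
        EXEC sp_getapplock @Resource = 'SaveAccount', @LockMode = 'Exclusive';
        MERGE dbo.Accounts AS t
        USING (SELECT @AccountName, @Balance) AS s (AccountName, Balance)
            ON t.AccountName = s.AccountName
        WHEN MATCHED THEN
            UPDATE SET Balance = s.Balance
        WHEN NOT MATCHED THEN
            INSERT (AccountName, Balance) VALUES (s.AccountName, s.Balance);
        COMMIT;
    END;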

As a result, we have good performance and zero concurrency-related issues in production: no deadlocks, no PK/unique constraint violations, etc.

A-K
  • Yes, usually avoiding errors performs better than throwing and catching exceptions, but you need to benchmark it on your platform. – A-K Feb 09 '12 at 16:56