3

There is a plethora of information available detailing why an IDENTITY field should be use as both the primary key and clustered index of a table for most situations; yet I'm having difficulty deciding if my particular situation is an exception.

I have a table with around 250 columns—I don't want to start a normalization debate—and around 100M rows. The table is page compressed. On peak days, around 1M rows are entered into the table sequentially with 0 contention from other connections. The combination of a char(2) NOT NULL field and an int NOT NULL field is guaranteed to be unique and never changing. There are only about 10 unique values for the char(2) field.

My argument for an IDENTITY column being used as the clustered index is solely performance based. Since inserts will happen sequentially, I shouldn't haven't to worry about latch contention; thus the inserts should be faster since there will be no need to search the index. On the other hand, a composite key of 6 bytes is rather small; and I'll end up making it the primary key anyway (with the char(2) field first). Furthermore, there will be a few other tables that are bulk loaded based on the new rows in this table (likely via a filtered index). If I use an IDENTITY column, I'll likely use that column as the clustered index and primary key of these other tables (without the IDENTITY property) and not bring over the "natural" composite key.

Does making a nonclustered index on the composite key nullify the expected speed boost of using an IDENTITY column as the clustered index since this index will have to be searched when the inserts are done?

Edit

Based on the comments, I changed the title and am asking an amended question whose answer can be used to answer the original question above. How does maintenance of a unique clustered index compare to maintenance of a unique nonclustered index in regards to inserts? Does a non-sequential clustered index suffer similarly to a non-sequential nonclustered index?

philomathic_life
  • 451
  • 2
  • 13
  • 1
    Does having a nonclustered index on any table nullify the benefits of a properly chosen cluster key? – LowlyDBA - John M Jun 15 '18 at 18:45
  • @LowlyDBA, typically I'd say no. If I thought the answer to your question was always "no", then I wouldn't have asked my question. The only benefit of using an IDENTITY column was explained in my post. If that sole benefit is negated with the chosen nonclustered index, then I won't use the IDENTITY column. The "properly chosen cluster key" would then be the char(2) and int fields. – philomathic_life Jun 15 '18 at 18:52
  • I understand your post, but I want to understand why you think your situation isn't "typical" in that it might cause the non-clustered index to override any improvement. I don't see anything in your post that strikes me as an atypical scenario, but if there is something else it might make the question more useful overall. Any nonclustered index has a maintenance cost for any DML being performed on a table's rows, but that doesn't usually justify not having one. – LowlyDBA - John M Jun 15 '18 at 18:58
  • The "typical" situation is to make the IDENTITY column both the primary key and clustered index. In my situation I will make the composite key the primary key. I was unable to find resources that showed that making the "natural" key the primary key maintains (or nullifies) the performance gains of having a sequential clustered key. – philomathic_life Jun 15 '18 at 19:02
  • @LowlyDBA, are the maintenance costs equivalent to the cost of making the composite key the clustered index? If so, then I'll not use an IDENTITY column since the composite key is already unique and unchanging; and I'll avoid using an extra uninformative 4-byte column. – philomathic_life Jun 15 '18 at 19:08
  • I apologize I'm not being clear enough here it seems, I was trying to walk through why you framed the question the way you did. I mean to say that while primary key + clustered is often the default choice, it isn't uncommon to deviate from it when circumstances call for it. This question might answer your primary concern (basically, it depends). FWIW I'd probably leave your setup as-is unless you were having issues already. – LowlyDBA - John M Jun 15 '18 at 19:12
  • Thanks for the link. Unfortunately, Remus's answer suggests an IDENTITY column be used since it's always increasing but also suggests the composite key since the char(2) field can be leveraged in more queries. He mentions that a sequential key reduces page splits and fragmentation, but wont those same page splits and fragmentation happen on the nonclustered index? – philomathic_life Jun 15 '18 at 19:21
  • Stop worrying so much about fragmentation. https://groupby.org/conference-session-abstracts/why-defragmenting-your-indexes-isnt-helping/ – Erik Darling Jun 15 '18 at 19:46
  • @sp_BlitzErik, the reason I "worry so much" is other brilliant individuals (e.g., Remus Rusanu and Kimberly Tripp) suggest and explain the benefits of sequential keys (at least in regards to a clustered index). I'll listen to the video you posted while I work though. Like most things in life, it's challenging to navigate equally valid but conflicting information. – philomathic_life Jun 15 '18 at 19:52
  • 3
    "it's challenging to navigate equally valid but conflicting information" That's because this depends on the workload and what you care about. So you should test, rather than speculate. BTW This sounds like a good candidate for a Clustered Columnstore Index + a Nonclustered Primary Key. – David Browne - Microsoft Jun 15 '18 at 21:28
  • 1
    I understand that it's difficult to navigate. People generally give good general information about what you should consider. What it comes down to is that you should design indexes to fit your workload, not to avoid fragmentation. Fragmentation, if it becomes a problem, can be fixed. – Erik Darling Jun 15 '18 at 22:37
  • I've never done real analysis, but years ago I had a table with a few dozen columns and 100 - 300 million rows depending on the time of the month. We used to have an identity column clustered PK on it and I suggested changing it to another integer column that wasn't unique because of locking issues. It solved our blocking issues and performance was never an issue doing a million or more inserts at a time. But it didn't have a composite PK on it. We kept the same identity column as a PK and only changed it to a nonclustered PK from a clustered. – Alen Jun 18 '18 at 20:33
  • @Alen, it sounds like you were suffering from the latch contention I alluded to in my post. In my case, inserts happen one at a time so it likely would not have led to the problem you faced. – philomathic_life Jun 19 '18 at 16:00

1 Answers1

1

Embarrassingly, I never knew of columnstore indexes. This table is primarily used for analytics; thus I used a clustered columnstore index with a primary key defined on the char(2) and int fields as per the suggestion from David Browne - Microsoft in the comments.

Since the question was about clustered index maintenance compared to nonclustered index maintenance, I'll "answer" that part to the best of my ability. From what I've read, it takes more resources when a clustered index has a page split than a page split on a smaller nonclustered index (with the same key). This is because the page contains far more data in the clustered index. Consequently, had I not used a columnstore clustered index, I would have used (and tested) an IDENTITY column for the clustered index. Having said that, the video sp_BlitzErik provided in the comments is very informative—and easily digestible for plebs like me—and highlights how there is a lot of outdated recommendations "out there" based on older implementations of SQL Server; thus concerns about external fragmentation can be exaggerated.

philomathic_life
  • 451
  • 2
  • 13