19

I have Log and LogItem tables; I'm writing a query to grab some data from both. There are thousands of Logs and each Log can have up to 125 LogItems

The query in question is complicated so I'm skipping it (if someone thinks it's important I can probably post it), but when I ran SSMS Estimated Query plan, it told me a new Non-Clustered index would improve performance up to 100%.

Existing Index: Non-clustered
Key Colums (LogItem): ParentLogID, DateModified, Name, DatabaseModified

Query Plan Recommendation
CREATE NONCLUSTERED INDEX [LogReportIndex]
ON [dbo].[LogItem] ([ParentLogID],[DatabaseModified])

Just for fun, I created this new index and ran the query and much to my surprise, it now takes ~1 second for my query to run, when before it was 10+ seconds.

I assumed that my existing index would cover this new query, so my question is why did creating a new index on the only columns used in my new query improve performance? Should I have an index for each unique combination of columns used in my where clauses?

note: I don't think this is because the SQL Server is caching my results, I ran the query about 25-30 times before I created the index and it consistantly took 10-15 seconds, after the index it is now consistantly ~1 or less.

Nate
  • 1,687
  • 4
  • 22
  • 33
  • Before you created the additional nonclustered index, what did the actual execution plan show for index usage? – Thomas Stringer Dec 19 '11 at 18:14
  • What is improved performance by 100%? –  Dec 19 '11 at 18:16
  • @Shark Good question, I'm not sure. This is my first performance debugging situation. I'll be sure to grab that going forward. All it said was 'Missing index' and it said which fields. –  Dec 19 '11 at 18:19
  • @JeffO This is what SSMS said: "The Query Processor estimates that implementing the following index could improve the query cost by 100%." –  Dec 19 '11 at 18:19

2 Answers2

21

Order of columns in an index is important. If filtering requires column 1 and 4 from index, the index is not going to help. It's only useful when filtering by the first N consecutive columns.

This is because index is a tree. You can't efficiently select all nodes of the tree where column3 = something, because they are scattered all other the place, belonging to different values of column1 and column2. But if you know column1 and column2 as well, locating the right branch in the tree is a no brainer.

GSerg
  • 1,342
  • 1
  • 15
  • 29
  • Would it then be safe to assume (in general) I need one index per set of "where" clauses that are going to hit that table? –  Dec 19 '11 at 18:16
  • I once did a massive speedup of someone else's query just by making sure it used the index in the proper order. –  Dec 19 '11 at 18:17
  • 1
    @Nate Broadly, yes. Some wheres may overlap, so you might have an index that nicely covers several wheres; or you can ignore some part of a where clause because indexing on a certain column isn't going to help anyway (low selectivity); but broadly, yes. –  Dec 19 '11 at 18:19
  • @Nate You don't want to have any more indexes than needed. Each index that SQL has to maintain adds overhead of its own. If you can reorder your WHERE clauses to match the first N columns on an existing index, that should get you very close without adding additional indexes. – That Chuck Guy Dec 19 '11 at 18:23
  • 1
    @ChuckBlumreich Order of columns in where clauses is not important. The server will always arrange them to make best use of existing indices. It's only a question of having an index that includes all required where columns as its first columns. –  Dec 19 '11 at 18:27
  • @GSerg You're correct of course for SSMS, which Nate did specify. Complicated queries can sometimes also throw it into the weeds. – That Chuck Guy Dec 19 '11 at 18:34
  • Do you mean SSMS will re-order the where clause? What about LinqToSQL or Entity Framework? –  Dec 19 '11 at 18:38
  • @Nate Linq2Sql or EF are wrappers that ultimately generate plain SQL code. Same rules apply to them therefore. Not sure why SSMS was mentioned though, the plain vanilla SQL Server will always do its best regardless of how columns are listed in a where clause. Rule of thumb: order in a where: don't care (and order in an index is important). –  Dec 19 '11 at 18:42
  • @GSerg Thanks. That is what I thought. Just making sure. Thanks for your help. –  Dec 19 '11 at 18:44
12

The leading edge of an index is what matters.

As long as your query is "covered" by a leading edge of an index, it will be efficient. Database indexes are typically implemented as B-Trees and the structure of the B-Tree dictates that the search must be done in a certain order, which is why the order of fields in the composite index matters.

If you have "holes", e.g. if you search on ParentLogID and DatabaseModified, but only have index on {ParentLogID, DateModified, Name, DatabaseModified}, then only the {ParentLogID} portion of the index can be utilized efficiently.

(NOTE: Some DBMSes can utilize the {DatabaseModified} portion through "skip scan", but even if your DBMS does that it is much less efficient than the regular index access).

  • So if I have Columns (a, b, c, d, e, f) and most queries are ... WHERE A IN(...) AND B = 3 my index Index(a,b,c,d) that is a good one, but it does not help if I have ... WHERE A IN (...) AND D = 5 which is why my the new index I made, Index(a,d) improved performance so much, right? –  Dec 19 '11 at 18:42
  • 8
    @Nate - correct. Think of it like a phone book. If you know just someone's first name, it's impossible to find without looking through the whole book since it's organized on Lastname, Firstname – JNK Dec 19 '11 at 18:50