On a table with several million rows, with an index on the only column in the WHERE clause (varchar(100), full length indexed), would there be a performance gain from making sure that the differences between the values occur early in the string? That is, would it be better if the column contained "variable stuff-constant stuff" rather than "constant stuff-variable stuff"?
Depending on your use, you may find calculating an integer hash of the varchar and indexing that works even better. – Michael Green May 01 '15 at 06:54
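A minimal sketch of the hash idea from this comment, using Python's built-in `sqlite3` module so it can be run anywhere. The table `items`, the column `name_hash`, and the choice of CRC32 are all assumptions made for illustration; any stable integer hash would do, and a real database may offer its own hash function or computed-column feature.

```python
import sqlite3
import zlib


def int_hash(text: str) -> int:
    # CRC32 is an assumed choice; it is stable across runs, unlike Python's hash().
    return zlib.crc32(text.encode("utf-8"))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, name VARCHAR(100), name_hash INTEGER)"
)
# Index the small integer hash instead of the full varchar(100) value.
conn.execute("CREATE INDEX idx_items_name_hash ON items (name_hash)")

names = [f"constant-prefix-{i}" for i in range(1000)]
conn.executemany(
    "INSERT INTO items (name, name_hash) VALUES (?, ?)",
    [(n, int_hash(n)) for n in names],
)


def find_by_name(connection, name):
    # Filter on the indexed hash, then compare the string itself
    # so that hash collisions cannot return the wrong row.
    cursor = connection.execute(
        "SELECT name FROM items WHERE name_hash = ? AND name = ?",
        (int_hash(name), name),
    )
    return [row[0] for row in cursor.fetchall()]
```

The second condition (`name = ?`) matters: two different strings can share a hash, so the hash only narrows the search and the string comparison confirms the match.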
1 Answer
It sounds like the column is a composite of two or more items. Creating an index on two or more columns can increase the chance that a value in the key differs from every other value in the index key. This can have its benefits. But if you add more columns to the index than the WHERE clause will use, the performance of the index degrades because the key becomes more expensive to evaluate.
The varchar data in the index is ultimately stored in a binary format. Therefore the order of the parts of the varchar strings (variable part then constant part, versus constant part then variable part) has no effect on performance. Operations that use an index traverse a tree structure (typically a B-tree) for filtering and matching.
One way to optimize the query would be to add further conditions to the WHERE clause. But sometimes there is no way to do this for business/functional reasons.
Another way would be to find a key (e.g., a composite key) with combinations of values that will likely be different from other combinations of values on a row-by-row basis. The next step would be to create an index based on this composite key. But sometimes there is no way to do this for business/functional reasons.
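As a rough illustration of the composite-key suggestion, the sketch below splits the value into two columns and indexes them together, again using Python's built-in `sqlite3` module. The table `parts`, its columns, and the index name are hypothetical, and the query planner in another database may behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: the single varchar column is stored as its two parts.
conn.execute(
    "CREATE TABLE parts ("
    "id INTEGER PRIMARY KEY, variable_part VARCHAR(50), constant_part VARCHAR(50))"
)
# Composite index over both parts, with the more selective part first.
conn.execute(
    "CREATE INDEX idx_parts_variable_constant ON parts (variable_part, constant_part)"
)
conn.executemany(
    "INSERT INTO parts (variable_part, constant_part) VALUES (?, ?)",
    [(f"v{i}", "shared-suffix") for i in range(1000)],
)

query = "SELECT id FROM parts WHERE variable_part = ? AND constant_part = ?"
params = ("v42", "shared-suffix")

# Ask SQLite how it would execute the query; the last field of each row
# is a human-readable description of the chosen access path.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, params).fetchall()
plan_details = [row[-1] for row in plan]

matches = conn.execute(query, params).fetchall()
```

Inspecting `plan_details` shows whether the composite index was chosen for the lookup, which is a quick way to check that the index matches the WHERE clause before relying on it.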