2

I have a SQL Server query on a large table (>17 million rows) that sometimes does not populate all the predicates before performing an index seek, which results in not filtering out enough rows and slowing the query down significantly (hours).

Microsoft SQL Server 2005 database is used and this SQL query is generated by Hibernate.

Here is how the simplified query looks like:

SELECT MAIN_TABLE.FIELD_1,
       MAIN_TABLE.FIELD_2,
       MAIN_TABLE.FIELD_3,
       MAIN_TABLE.FIELD_4,
       TABLE_1.FIELD_1,
       TABLE_1.FIELD_2,
       TABLE_2.FIELD_1,
       TABLE_2.FIELD_2,
       TABLE_3.FIELD_1
FROM MAIN_TABLE
     INNER JOIN TABLE_1 ON MAIN_TABLE.TABLE_1_ID = TABLE_1.ID
     INNER JOIN TABLE_2 ON MAIN_TABLE.TABLE_2_ID = TABLE_2.ID
     LEFT OUTER JOIN TABLE_3 ON MAIN_TABLE.TABLE_3_ID = TABLE_3.ID
WHERE MAIN_TABLE.SOME_ID IN (SELECT SOME_ID FROM PARTITION_VIEW 
                             WHERE PARTITION_VIEW.SOME_FIELD_1 = [VALUE_1]
                             AND PARTITION_VIEW.SOME_FIELD_2 = [VALUE_2]
                             AND PARTITION_VIEW.SOME_FIELD_3 = [VALUE_3])
AND MAIN_TABLE.DATE BETWEEN [DATE_RANGE_1] AND [DATE_RANGE_2]
AND TABLE_1.STATE = [STATE]

The PARTITION_VIEW is a UNION of materialized views that all have index on columns SOME_FIELD_1, SOME_FIELD_2, and SOME_FIELD_3.

In the MAIN_TABLE, we have an index on columns DATE and SOME_ID and this is the index that gets utilized which is correct.

When I checked the query plan from the SQL Server Management Studio, I found that in some cases such as when the date range is large (one year), the predicate used is the DATE column only. Then only after all the rows are within the given date range are found, the result set is filtered out by the given SOME_IDs.

It seems like the query optimizer did not pick the best query plan available.

When the exact query is directly executed from SQL Server Management Studio, everything performs correctly. This suboptimal query plan is used only when the query is executed from the application through Hibernate.

One thing I tried was to create actual table PARTITION_TABLE which copied over all the data from the PARTITION_VIEW and retrieved SOME_IDs from the PARTITION_TABLE, then the index on MAIN_TABLE was using all the predicates (SOME_ID, DATE). I want to keep using the view if possible since it automatically reflects the changes.

Is it possible to keep using the union of materialized views and ensure that the index on the MAIN_TABLE will always retrieve the SOME_ID on the where clause first so that it can be used as part of the predicate in the index?

Hannah Vernon
  • 70,041
  • 22
  • 171
  • 315
  • hello, how many rows you get form SELECT SOME_ID FROM PARTITION_VIEW WHERE PARTITION_VIEW.SOME_FIELD_1 = [VALUE_1] AND PARTITION_VIEW.SOME_FIELD_2 = [VALUE_2] AND PARTITION_VIEW.SOME_FIELD_3 = [VALUE_3] ? – Florin Ghita Dec 07 '11 at 09:03
  • PARTITION_VIEW estimates about 5000 rows where actual rows returned is couple hundreds at most, usually under 50. The PARTITION_TABLE estimates about 400 rows. Thanks! –  Dec 08 '11 at 22:37
  • 1
    ok, I think you should try an index on tuple (some_id, date) in this order. – Florin Ghita Dec 09 '11 at 06:54
  • Could you please explain why the index on tuple (date, some_id) won't be used? Is it because the date predicate is a range, not a equality predicate? –  Dec 15 '11 at 22:35
  • hmmm... it is almost the same thing. Do you have an index on this tuple despite the order? – Florin Ghita Dec 16 '11 at 08:49
  • So I have two indexes, with tuple (date, some_id) and (some_id, date, another_id). Right now, (some_id, date, another_id) gets used. There is an order by clause which I did not include, which orders by date in ascending. The indexes that gets used is (some_id, date, another_id) which will need to sort after all data has been retrieved which becomes a costly operation when large date range is used. –  Dec 21 '11 at 17:59

0 Answers0