Is it possible to force the optimizer to eliminate irrelevant tables in this partitioned view?

Question

I'm testing different architectures for large tables and one suggestion that I've seen is to use a partitioned view, whereby a large table is broken into a series of smaller, "partitioned" tables.
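
For readers unfamiliar with the pattern, a local partitioned view can be sketched roughly as below. This is only an illustration, not the schema from the question: the `_Sketch` object names, the `StationKey` column, the data types, and the one-table-per-year boundaries are all assumptions. The key ingredients are a trusted CHECK constraint on the partitioning column of each member table and a view that combines the member tables with UNION ALL.

```sql
-- Hypothetical member tables, one per year (names, columns, and types are assumptions).
-- The CHECK constraint on the partitioning column is what lets the optimizer
-- decide whether a member table can contain qualifying rows.
CREATE TABLE dbo.Observation_2004_Sketch
(
    ObservationDateKey int NOT NULL,
    StationKey         int NOT NULL,
    ObservationValue   decimal(9, 2) NULL,
    CONSTRAINT PK_Observation_2004_Sketch
        PRIMARY KEY CLUSTERED (ObservationDateKey, StationKey),
    CONSTRAINT CK_Observation_2004_Sketch
        CHECK (ObservationDateKey BETWEEN 20040101 AND 20041231)
);

CREATE TABLE dbo.Observation_2005_Sketch
(
    ObservationDateKey int NOT NULL,
    StationKey         int NOT NULL,
    ObservationValue   decimal(9, 2) NULL,
    CONSTRAINT PK_Observation_2005_Sketch
        PRIMARY KEY CLUSTERED (ObservationDateKey, StationKey),
    CONSTRAINT CK_Observation_2005_Sketch
        CHECK (ObservationDateKey BETWEEN 20050101 AND 20051231)
);
GO

-- The partitioned view simply unions the member tables.
CREATE VIEW dbo.v_Observation_Sketch
AS
SELECT ObservationDateKey, StationKey, ObservationValue
FROM dbo.Observation_2004_Sketch
UNION ALL
SELECT ObservationDateKey, StationKey, ObservationValue
FROM dbo.Observation_2005_Sketch;
GO
```

With this layout, a literal predicate such as `ObservationDateKey >= 20050101` contradicts the 2004 table's CHECK constraint, so that table can be removed from the plan at compile time.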


In testing this approach, I've discovered something that doesn't make a whole lot of sense to me. When I filter on the "partitioning column" of the fact view, the optimizer only seeks on the relevant tables. Additionally, if I filter on that column in the dimension table, the optimizer eliminates unnecessary tables.

However, if I filter on some other aspect of the dimension, the optimizer seeks on the PK/CI of each base table.

Here are the queries in question:

select 
    od.[Year], 
    AvgValue = avg(ObservationValue)
from dbo.v_Observation o 
join dbo.ObservationDates od
    on o.ObservationDateKey = od.DateKey
where o.ObservationDateKey >= 20000101
    and o.ObservationDateKey <= 20051231
group by od.[Year];

select 
    od.[Year], 
    AvgValue = avg(ObservationValue)
from dbo.v_Observation o 
join dbo.ObservationDates od
    on o.ObservationDateKey = od.DateKey
where od.DateKey >= 20000101
    and od.DateKey <= 20051231
group by od.[Year];

select 
    od.[Year], 
    AvgValue = avg(ObservationValue)
from dbo.v_Observation o 
join dbo.ObservationDates od
    on o.ObservationDateKey = od.DateKey
where od.[Year] >= 2000 and od.[Year] < 2006
group by od.[Year];

Here's a link to the SQL Sentry Plan Explorer session.

~~I'm working on actually partitioning the larger table to see if I get partition elimination to respond in a similar fashion.~~

I do get partition elimination for the (simple) query that filters on an aspect of the dimension.

In the meantime, here's a stats-only copy of the database:

https://gist.github.com/swasheck/9a22bf8a580995d3b2aa

The "old" cardinality estimator gets a less expensive plan, but that's because of the lower cardinality estimates on each of the (unnecessary) index seeks.

I'd like to know if there's a way to get the optimizer to use the key column when filtering by another aspect of the dimension so that it can eliminate seeks on irrelevant tables.

SQL Server Version:

Microsoft SQL Server 2014 - 12.0.2000.8 (X64) 
    Feb 20 2014 20:04:26 
    Copyright (c) Microsoft Corporation
    Developer Edition (64-bit) on Windows NT 6.3 <X64> (Build 9600: ) (Hypervisor)

Just an FYI .. the last stat stream is corrupted CREATE STATISTICS [_WA_Sys_00000008_2FCF1A8A] ON [dbo].[Observation_2010]([StationStateCode]) WITH STATS_STREAM = 0x01000000010000000000000000000000D4531EDB00000000D5080000000000009508000000000000AF030000AF000000020000000000000008D000340000000007000000E65DE0007DA5000076F9780000000000867704000000000000000000ABAAAA3C0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 — Kin Shah, Jan 06 '16 at 00:16
It looks like the script for the stats-only database is truncated. I tried clicking "view the full file" and downloading the zip, but either way I have no statistics for the ObservationDates table. I'm not getting the same plan as Paul, even with 4199, and I think this is why. — Geoff Patterson, Jan 06 '16 at 14:30
@GeoffPatterson it works for me. did you click on the link to the raw file? https://gist.githubusercontent.com/swasheck/9a22bf8a580995d3b2aa/raw/fb91968f7541b8af286ee3fde47bae318fc3d262/weather_statsonly.sql
however, as Kin noted, the last stats stream is corrupted :/ — swasheck, Jan 06 '16 at 15:17
I did click the link for the raw file. The script does work (except the problem Kin noted), but doesn't contain any logic to create statistics on ObservationDates. I ended up running UPDATE STATISTICS ObservationDates WITH ROWCOUNT = 10000 manually in order to get the plan that Paul demonstrated though. — Geoff Patterson, Jan 06 '16 at 15:22
odd. creating a new database and running that script i have stats objects (well, they're indexes) on ObservationDates so i'm not sure what's going on with that. also, i'm not able to get the plan paul generated either. i'll try the update to see. — swasheck, Jan 06 '16 at 15:28

Paul White · Accepted Answer · 2016-01-06T16:34:04.237

Enable trace flag 4199.

I also had to issue:

UPDATE STATISTICS dbo.ObservationDates 
WITH ROWCOUNT = 73049;

to get the plans shown below. Statistics for this table were missing from the upload. The 73,049 figure came from the Table Cardinality information in the Plan Explorer attachment. I used SQL Server 2014 SP1 CU4 (build 12.0.4436) with two logical processors, maximum memory set to 2048 MB, and no trace flags aside from 4199.

You should then get an execution plan that features dynamic partition elimination:

select 
    od.[Year], 
    AvgValue = avg(ObservationValue)
from dbo.v_Observation o 
join dbo.ObservationDates od
    on o.ObservationDateKey = od.DateKey
where 
    od.[Year] >= 2000 and od.[Year] < 2006
group by 
    od.[Year]
option (querytraceon 4199);

Plan fragment:

This might look worse, but the Filters are all start-up filters. An example predicate is:

Per iteration of the loop, the start-up predicate is tested, and only if it returns true is the Clustered Index Seek below it executed. Hence, dynamic partition elimination.

This is perhaps not quite as efficient as static elimination, especially if the plan is parallel.

You may need to try hints like MAXDOP 1, FAST 1 or FORCESEEK on the view to get the same plan. Optimizer costing choices with partitioned views (like partitioned tables) can be tricky.
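
As a rough sketch of what trying those hints might look like, the query below attaches all three to the same statement. Combining them is an assumption made for brevity; in practice it is probably better to try them one at a time and compare the resulting plans. Note that `FORCESEEK` raises an error if the optimizer cannot produce a seek-based plan.

```sql
select 
    od.[Year], 
    AvgValue = avg(ObservationValue)
from dbo.v_Observation o with (forceseek)   -- table hint applied to the view
join dbo.ObservationDates od
    on o.ObservationDateKey = od.DateKey
where 
    od.[Year] >= 2000 and od.[Year] < 2006
group by 
    od.[Year]
option (querytraceon 4199, maxdop 1, fast 1);   -- query-level hints
```

Whichever combination is used, the thing to check in the resulting plan is whether each Clustered Index Seek on a member table sits beneath a Filter with a start-up predicate.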

The point is you need a plan that features start-up filters to get dynamic partition elimination with partitioned views.

Queries with embedded USE PLAN hints (via gist.github.com):

Great info, thanks Paul! I had been wondering after I wrote my answer why there isn't a way SQL Server can do this type of elimination. Turns out there is, I just hadn't seen it before! — Geoff Patterson, Jan 06 '16 at 14:33

score 6 · Answer 2 · edited Apr 13 '17 at 12:43

My observation has always been that you must specify the value (or range of values) for the partition column explicitly in the query in order to get "table elimination" in a partitioned view. This is based on experience using partitioned views in production from SQL Server 2000 through SQL Server 2014.

SQL Server doesn't have a loop join operator that can dynamically aim the seek directly at the proper table on the inner side of the loop based on the value of the row on the outer side. However, as Paul's answer explains, a plan with start-up filters can dynamically skip irrelevant tables on the inner side of the loop in constant time (as opposed to the logarithmic time that actually performing each seek would take).

Note that for partitioned tables, however, this type of seek (to a specific partition) is supported.
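
For comparison, a partitioned-table version of the fact table could be sketched as follows. The object names, the `StationKey` column, the data types, and the boundary values are all hypothetical; only the general shape (partition function, partition scheme, table created on the scheme) is the point.

```sql
-- RANGE RIGHT places each boundary value in the partition to its right:
-- partition 1 = before 2005, partition 2 = 2005, partition 3 = 2006 onward.
CREATE PARTITION FUNCTION pf_ObservationYear_Sketch (int)
AS RANGE RIGHT FOR VALUES (20050101, 20060101);

-- Map every partition to the PRIMARY filegroup (an assumption for simplicity).
CREATE PARTITION SCHEME ps_ObservationYear_Sketch
AS PARTITION pf_ObservationYear_Sketch ALL TO ([PRIMARY]);

-- The partitioning column must be part of the clustered primary key.
CREATE TABLE dbo.Observation_Partitioned_Sketch
(
    ObservationDateKey int NOT NULL,
    StationKey         int NOT NULL,
    ObservationValue   decimal(9, 2) NULL,
    CONSTRAINT PK_Observation_Partitioned_Sketch
        PRIMARY KEY CLUSTERED (ObservationDateKey, StationKey)
) ON ps_ObservationYear_Sketch (ObservationDateKey);

-- Which partition would a given key land in? (2 with the boundaries above.)
SELECT $PARTITION.pf_ObservationYear_Sketch(20050615) AS PartitionNumber;
```

Because the engine can compute the partition number from the key value at run time, a nested loops join can seek directly into the right partition for each outer row.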

If you are fixed on using partitioned views, another option is to split your query into multiple queries, such as:

-- Gather the min/max values for the partition column
DECLARE @minDateKey INT,
        @maxDateKey INT
SELECT @minDateKey = MIN(DateKey),
        @maxDateKey = MAX(DateKey)
FROM dbo.ObservationDates od
WHERE od.[Year] >= 2000 and od.[Year] < 2006

-- Since I have a stats-only copy of the database, simulate having run the query above
-- (You can comment this out since you have the actual data.)
SELECT @minDateKey = 20000101, @maxDateKey = 20051231

-- Adjust the query to use the min/max values of the partition column
-- rather than filtering on a different column in the dimension table
select 
    od.[Year], 
    AvgValue = avg(ObservationValue)
from dbo.v_Observation o 
join dbo.ObservationDates od
    on o.ObservationDateKey = od.DateKey
WHERE od.DateKey >= @minDateKey AND od.DateKey <= @maxDateKey
group by od.[Year]
-- Must use OPTION RECOMPILE; otherwise the plan will touch all tables because it
-- must do so in order to be valid for all values of the parameters!
OPTION (RECOMPILE)

This yields the following plan. There is now an extra query that hits the dimension table, but the query over the (presumably much larger) fact table is optimized.

Would the same effect be achieved if you incorporated the first query into the second without recourse to variables? — Andriy M, Jan 06 '16 at 00:45
@AndriyM If I'm understanding you correctly, the answer is no, the same effect will not be achieved and the query plan will touch all of the tables in the partitioned view if you try to combine the two queries. If you were to execute the first query, then paste the values 20000101 and 20051231 instead of the variables (or do something similar via two separate queries in your application), then yes, the same effect would be achieved without using the variables. — Geoff Patterson, Jan 06 '16 at 02:10
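
One way to sketch the "two separate queries" idea from the comment above entirely in T-SQL is to build the second query as dynamic SQL with the looked-up values embedded as literals. This is an illustration under assumptions, not something either answer proposes: the variable names are hypothetical, and it assumes `DateKey` is an integer column, so concatenating the values into the statement is safe.

```sql
DECLARE @minDateKey int,
        @maxDateKey int,
        @sql        nvarchar(max);

-- First query: look up the key range from the dimension table.
SELECT @minDateKey = MIN(DateKey),
       @maxDateKey = MAX(DateKey)
FROM dbo.ObservationDates
WHERE [Year] >= 2000 AND [Year] < 2006;

-- Second query: embed the range as literals so the optimizer sees constants.
SET @sql = N'
select 
    od.[Year], 
    AvgValue = avg(ObservationValue)
from dbo.v_Observation o 
join dbo.ObservationDates od
    on o.ObservationDateKey = od.DateKey
where od.DateKey >= ' + CONVERT(nvarchar(11), @minDateKey) + N'
    and od.DateKey <= ' + CONVERT(nvarchar(11), @maxDateKey) + N'
group by od.[Year];';

-- @sql is NULL when no dimension rows matched, so skip execution in that case.
IF @sql IS NOT NULL
    EXEC sys.sp_executesql @sql;
```

Each distinct pair of literals compiles its own plan, which is a similar trade-off to the one `OPTION (RECOMPILE)` makes in the answer.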
