16

Looking at "SQL Server 2008 Internals"1 by Kalen Delaney, on page 13, it states the following:

"The first step in producing such a plan is to normalize each query, which potentially breaks down a single query into multiple, fine-grained queries. After the Query Optimizer normalizes a query, it optimizes it, which means that it determines a plan for executing that query."

It's been suggested to me by another DBA that performance can be improved for certain queries by moving WHERE clause predicates into the FROM clause, such that:

SELECT *
FROM dbo.table1 t1
    INNER JOIN dbo.table3 t3 ON t1.ID = t3.ID
    LEFT OUTER JOIN dbo.table2 t2 ON t1.ID = t2.ID
WHERE t1.CreateDate >= '2015-07-31 00:00:00';

would become:

SELECT *
FROM dbo.table1 t1
    INNER JOIN dbo.table3 t3 ON t1.ID = t3.ID
        AND t1.CreateDate >= '2015-07-31 00:00:00'
    LEFT OUTER JOIN dbo.table2 t2 ON t1.ID = t2.ID;

Apparently the implication for the first example is the query optimizer will perform the JOIN first, then apply the WHERE clause, which for complicated queries with large tables will not perform as well as if we manually re-write the query into the form presented in the second example. Apparently this only applies to outer joins, i.e. if this were an inner join, the normalizer would in fact move the WHERE predicate into the FROM clause.

Can someone confirm this behavior? Is there some documentation somewhere I can look at? Can I see this in the execution plan?


1 - yes, I know this is ancient.

Hannah Vernon
  • 70,041
  • 22
  • 171
  • 315
  • 4
    I'd be surprised if you see any differences at all. For inner joins at least, whether a filter is in the where clause or the join clause should be irrelevant. – Aaron Bertrand Aug 05 '15 at 17:30
  • 8
    See PaulWhite's answer in the related question: Is a join optimized to a where clause at runtime?. Especially the conclusion: "The query optimizer almost always collapses the relational select into the join during optimization, meaning the two forms will very likely produce equivalent query plans, but there is no actual guarantee." – ypercubeᵀᴹ Aug 05 '15 at 17:30
  • 2
    For very complex queries, the optimizer has to consider several (hundred, thousand?) plans. A small difference in the the starting point of its search may matter. Imagine you (the optimizer) are thrown in a tall-tree forest-covered planet and your aim is to find a high peak, as fast as you can. You are well trained and have tools so you'll often find the highest. In a very big planet though, you'll often end in just a high peak but not the absolutely highest mountain. And starting your search from Canada or New Zealand will matter. – ypercubeᵀᴹ Aug 05 '15 at 17:37
  • The contrived example is indeed a very simple one. SQL Server does generate identical plans for them. I'm attempting to give guidance to some developers over this issue, and don't want them necessarily thinking they MUST use the second form ALL the time. – Hannah Vernon Aug 05 '15 at 17:45
  • 2
    @ypercube "All drains lead to the Ocean" – billinkc Aug 05 '15 at 17:46
  • Does anyone have any knowledge about the actual question in my question, i.e. the bit mentioned in the book about the "normalizer"? – Hannah Vernon Aug 05 '15 at 17:48
  • Perhaps OPTION (RECOMPILE, QUERYTRACEON 8605, QUERYTRACEON 3604); from Paul's answer mentioned above will help. Looking at that now. – Hannah Vernon Aug 05 '15 at 17:50
  • 5
    8606 gives you the trees before and after normalisation. A similar claim was made here http://dba.stackexchange.com/a/108863/3690 – Martin Smith Aug 05 '15 at 18:24
  • 3
    @ypercube - basically, the moral of that story is if the execution plans are different, I made a mistake. – Hannah Vernon Aug 05 '15 at 19:13
  • 7
    My argument is, for INNER JOIN, that filters always belong in the WHERE clause. This is a very simple and clear logical separation from JOIN criteria. If you find a case where having the filter in the ON clause improves performance, let's explore that as an edge case. I don't think you'll find one unless you're using exotic trace flags or other settings... and even then I find it unlikely that you'll get a different plan shape (and even further unlikely that it would materially affect runtime performance). – Aaron Bertrand Aug 05 '15 at 19:40
  • 1
    An older question but has some links to good SO questions http://dba.stackexchange.com/a/3481/507 –  Aug 05 '15 at 20:26

3 Answers3

3

I guess this is one of the "It depends"-moments. In a regular execution there is no or at least nearly no difference.

Thought it will become more interesting once the queries are more complex. More complex scenarios are e.g.:

  • Nested Joins
  • FORCE ORDER (Query Hint)

I would have added LEFT and CROSS Joins as well, thought they are already part of your example.

About the "experience in SQL Server normalization" - I'd say, besides looking in the code, there will not be much you can hold your self on. It is often just a "It depends!"

Dennis Winter
  • 161
  • 1
  • 6
3

While the verdict is out on this topic depending on the DBMS that is in use, I can speak for SQL Server in that I've never seen a different execution plan based on whether or not the WHERE statement was in a JOIN or in the WHERE clause itself. There are a few posts around StackOverflow that discuss this in a bit more detail, but you can confirm it for yourself by reviewing various execution plans with different use-cases. The optimizer is smart enough to know what you're doing (or trying to do) and will result in the same execution plan regardless of where you apply your record filtering.

For a good source of truth, I would recommend checking out the book "Microsoft SQL Server 2012 Internals (http://www.amazon.com/Microsoft-Server-Internals-Developer-Reference/dp/0735658560)".

Note: OUTER JOINs can produce different result-sets when using a WHERE clause in an OUTER JOIN vs. using the WHERE clause explicitly. See the example here: https://ask.sqlservercentral.com/questions/47544/sql-data-filter-condition-in-join-vs-where-clause.html

3

Another book you can check for this is "SQL Server Execution Plans, Second Edition" on https://www.red-gate.com/community/books/sql-server-execution-plans-ed-2.

Also, try the Estimated Execution Plan before actually run the query to see what it looks like and if there are some things you can review previous to the execution.