1

The query below seems simple and straightforward, yet it produces unexpected results.


CREATE TABLE #NUMBERS
(
    N BIGINT
);

INSERT INTO #NUMBERS VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9) ;

WITH A AS (
-- CHOOSE A ROW AT RANDOM SELECT TOP 1 * FROM #NUMBERS
ORDER BY NewID()
), B AS ( SELECT A.N AS QUANTITY, 'METERS' AS UNIT FROM A

UNION ALL

SELECT A.N*100 AS QUANTITY, 'CENTIMETERS' AS UNIT FROM A

UNION ALL

SELECT A.N*1000 AS QUANTITY, 'MILLIMETERS' AS UNIT FROM A

UNION ALL

SELECT A.N*1000000 AS QUANTITY, 'MICRONS' AS UNIT FROM A

UNION ALL

SELECT A.N*1000000000 AS QUANTITY, 'NANOMETERS' AS UNIT FROM A

) SELECT * FROM B ORDER BY B.QUANTITY ;


I would expect it to execute CTE A once, and then carry those results into CTE B to produce results something like this:

QUANTITY UNIT
4 METERS
400 CENTIMETERS
4000 MILLIMETERS
4000000 MICRONS
4000000000 NANOMETERS

However, it produces results like this:

QUANTITY UNIT
8 METERS
700 CENTIMETERS
1000 MILLIMETERS
6000000 MICRONS
3000000000 NANOMETERS

It means it is going back and executing CTE A five times, once for every mention of A in CTE B. Not only is this unwanted and unintuitive, but it also seems unnecessarily inefficient.

What is going on, and how would a CTE genius rewrite it to produce the desired results?


BTW, the Microsoft documentation pages on CTEs contain this cryptic statement which might or might not be related:

If more than one CTE_query_definition is defined, the query definitions must be joined by one of these set operators: UNION ALL, UNION, EXCEPT, or INTERSECT.


Finally, rewriting the query to eliminate CTE B didn't help:

WITH
A AS
(   
    -- CHOOSE A ROW AT RANDOM
    SELECT   TOP 1 *
    FROM     #NUMBERS            
    ORDER BY NewID()           
)
SELECT   *
FROM     (
          SELECT A.N AS QUANTITY, 'METERS' AS UNIT FROM A
      UNION ALL

      SELECT A.N*100 AS QUANTITY, 'CENTIMETERS' AS UNIT FROM A

      UNION ALL

      SELECT A.N*1000 AS QUANTITY, 'MILLIMETERS' AS UNIT FROM A

      UNION ALL

      SELECT A.N*1000000 AS QUANTITY, 'MICRONS' AS UNIT FROM A

      UNION ALL

      SELECT A.N*1000000000 AS QUANTITY, 'NANOMETERS' AS UNIT FROM A

     ) AS B

ORDER BY B.QUANTITY ;

Sean Gallardy
  • 32,288
  • 3
  • 40
  • 77
SlowMagic
  • 151
  • 2
  • 9
  • Related: https://dba.stackexchange.com/questions/221555/what-rules-determine-when-sql-server-use-a-cte-as-an-optimization-fence – Paul White Feb 27 '23 at 11:37

3 Answers3

7

It's helpful to think of common table expressions more like expressions and less like (permanent) tables. Each time you reference a common table expression, it must re-express itself.

Here's a simple example:

DECLARE
    @t table(id int);

INSERT @t ( id ) SELECT id = 1

SET STATISTICS XML ON;

WITH t AS ( SELECT t.id FROM @t AS t ) SELECT t.* FROM t JOIN t AS t1 ON t1.id = t.id JOIN t AS t2 ON t2.id = t.id;

The query plan will look something like this, with a join to the base table variable for each join between to the common table expression:

NUTS

Likewise, UNION (ALL) will produce a reference each time as well:

WITH
    t AS
(
    SELECT
        t.id
    FROM @t AS t
)
SELECT
    t.*
FROM t 
UNION ALL
SELECT
    t.*
FROM @t AS t
UNION ALL
SELECT
    t.*
FROM @t AS t;

NUTS

If you need to stabilize a result, you need to use a:

  • #temp table
  • @table variable
  • Permanent table
Erik Darling
  • 40,781
  • 14
  • 130
  • 456
7

Other answers have explained the reason for the issue happening: basically, a CTE is just an expression that evaluates as many times as it is referenced, thus causing A to return a different value at each evaluation.

What I would like to address in my answer is this part of the question:

how would a CTE genius rewrite it to produce the desired results?

Hanging around some of the places where CTE geniuses gather to discuss their CTE-related business may have taught me a few tricks that I would like to share.

What I think would be very useful here in resolving the issue at hand are two things:

  • the CROSS APPLY operator;
  • the VALUES row constructor.

Using those two, I would rewrite specifically the B CTE like this:

B AS
(
    SELECT   X.*
    FROM     A
    CROSS APPLY
    (
        VALUES
        (A.N, 'METERS'),
        (A.N*100, 'CENTIMETERS'),
        (A.N*1000, 'MILLIMETERS'),
        (A.N*1000000, 'MICRONS'),
        (A.N*1000000000, 'NANOMETERS')
    ) AS X (QUANTITY, UNIT)
)

leaving the rest of the query intact.

The way B is defined above, A is referenced (and evaluated) only once. It still produces a rowset rather than a single row because it replaces (with the help of CROSS APPLY) the row returned by A with a rowset, and the rowset (constructed by VALUES) essentially takes A.N as an argument, producing the desired set of values.

You can test the full query at dbfiddle.uk.

Andriy M
  • 22,983
  • 6
  • 59
  • 103
1

CTEs are not always materialized as many people believe.

ypercubeᵀᴹ
  • 97,895
  • 13
  • 214
  • 305