4

Consider this as example:

SELECT TOP 1 FROM customers
WHERE firstname LIKE 'John%'
   OR id IN (SELECT id_customer 
             FROM customer_emails 
             WHERE email like 'john.doe@%' )

Does query optimizer manage to prioritize the first clause, avoiding to do the subquery if the simpler clause is true for at least 1 record (from using Top 1)?

I am mainly asking this because the above query can be rewritten as Join, which technically would be faster according to Join vs. sub-query. However if the subquery is avoided, the above example is clearly superior to a Join one in situations where the first clause is likely to be true.

  • 2
    In this specific case, you'll usually find that the SQL query optimiser converts the subquery to a join anyway. – Gareth Lyons Mar 03 '17 at 16:14
  • @GarethLyons but thats not very optim right? Since I only asked for 1 record, and there is a chance that record can be found by querying a single table, why would the query optimizer join anyway? – Alexandru Severin Mar 03 '17 at 17:58

2 Answers2

7

Does query optimizer manage to prioritize the first clause, avoiding to do the subquery if the simpler clause is true for at least 1 record.

It can do

create table customers(id int identity primary key, firstname varchar(10)) ;
insert into customers values ('John'),  ('Johnathan'), ('George');
create table customer_emails(id_customer int, email varchar(100));
insert into customer_emails values (1, 'John@example.com'),  (2, 'Johnathan@example.com');

In the plan I got the first row matched the LIKE 'John%' predicate and the scan against customer_emails was not executed at all.

enter image description here

However your question is phrased as

if the simpler clause is true for at least 1 record

That would imply that the simpler WHERE clause is evaluated in its entirety and only if that fails is the second one evaluated.

For

SELECT TOP 1 *
FROM customers
WHERE firstname LIKE 'George%'
   OR id IN (SELECT id_customer 
             FROM customer_emails 
             WHERE email like 'George.doe@%' );

Three rows were processed before one was found matching LIKE 'George%'and there were two ensuing scans on customer_emails

Rewriting as follows...

create index ix on customers(firstname) include(id)

SELECT TOP 1 *
FROM customers with(forceseek)
WHERE firstname LIKE 'George%'
   OR  firstname NOT LIKE 'George%' AND  id IN (SELECT id_customer 
             FROM customer_emails 
             WHERE email like 'George.doe@%' );

... happens to give a plan where the operators in the shaded area for the IN part don't get executed if the simpler predicate returns a row but this plan isn't guaranteed.

enter image description here

Martin Smith
  • 84,644
  • 15
  • 245
  • 333
1

You said join and I don't think this can be done with an inner join

Top 1 is non deterministic without an order by

Typically you want to let the query optimizer do its job
It will use statistics and other logic to build an efficient plan

I think this is how I would write it
The way to test is to look at the query plans

SELECT distinct c.* 
FROM customers c 
LEFT JOIN customer_emails ce 
  on ce.id_customer = c.ID
 and ce.email like 'john.doe@%'
WHERE c.firstname LIKE 'John%'
   OR ce.ID is not null 

Or

SELECT c.* 
FROM customers c 
WHERE c.firstname LIKE 'John%'
   OR exist ( select 1 
              from customer_emails ce 
                on ce.id_customer = c.ID
               and ce.email like 'john.doe@%' )
paparazzo
  • 5,043
  • 1
  • 18
  • 32
  • The top 1 part is important to the scenario, without it it would make sense for the query optimizer to query the second table as well because it would want to fetch records matched by both clauses. And I didn't add an order by because I considered its not relevant to he question. Consider that I need a fixed number of results and I don't care what the order is or that order by id asc is used. – Alexandru Severin Mar 03 '17 at 18:10