7

Why

SELECT Barraportfolioname
FROM   portfolio
WHERE  id IN (SELECT DISTINCT i1.portfolioid
              FROM   Import i1
                     LEFT OUTER JOIN Import i2
                       ON i1.PortfolioID = i2.PortfolioID
                          AND i2.ImportSetID = 82
              WHERE  i1.ImportSetID = 83
                     AND i2.ID IS NULL)  

is taking 0 seconds whereas following query is taking 5 seconds.

SELECT DISTINCT p.BarraPortfolioName AS name
FROM   Import i1
       INNER JOIN Portfolio p
         ON p.ID = i1.PortfolioID
       LEFT OUTER JOIN Import i2
         ON i1.PortfolioID = i2.PortfolioID
            AND i2.ImportSetID = 82
WHERE  i1.ImportSetID = 83
       AND i2.ID IS NULL;  

I'm using SQL Server. Both the tables have indexes on all the columns which are being used in query i.e. portfolioid, id,importsetid.

Edit by gbn, based on OP comment

They said:

This works much better than previous two queries.

select
   BarraPortfolioName
from 
   (
   select distinct p.BarraPortfolioName,portfolioid
   from
        import i1
        inner join 
        portfolio p on p.id=i1.portfolioid
   where 
        importsetid = ?
   ) as p1
   left outer join
   (
   select distinct portfolioid 
   from import 
   where importsetid = ?
   ) as p2 on p1.portfolioid = p2.portfolioid
where 
   p2.portfolioid is null
gbn
  • 69,809
  • 8
  • 163
  • 243
techExplorer
  • 261
  • 1
  • 4
  • 6
    Post the query plans – Gaius Dec 08 '11 at 08:52
  • In it's current form this question is unanswerable (and will certainly be closed if you leave it like this) - but if there are "bad stats" that should be visible in the plans? – Jack Douglas Dec 08 '11 at 10:38
  • 1
    But surely good query tuners can see the difference in the queries a mile off... – Rob Farley Dec 08 '11 at 11:49
  • I looked at data in my table and found that there exist many to many mapping between portfolioid i.e. one import set same portfolioid multiple time. So I re-wrote my query to : "select BarraPortfolioName from (select distinct p.BarraPortfolioName,portfolioid from import i1 inner join portfolio p on p.id=i1.portfolioid where importsetid = ?) as p1 left outer join (select distinct portfolioid from import where importsetid = ?) as p2 on p1.portfolioid = p2.portfolioid where p2.portfolioid is null".This works much better than previous two queries. – techExplorer Dec 09 '11 at 05:05
  • 1
    How can you compare zero seconds vs zero seconds with the re-written query? Try with SET STATISTICS IO ON please to compare queries 1 and 3 – gbn Dec 09 '11 at 13:20
  • @gbn Actually first two queries where running forever until I added one separate index on ImportSetId and Id and portfolioid as Included column. Third query is running fine without this index. – techExplorer Dec 12 '11 at 05:47

3 Answers3

9

This assumes that both give the same results

  • The 1st is a "semi-join" because of the IN (subquery) (the DISTINCT isn't needed)
    This means the subquery can "short circuit"

  • The 2nd is an outer join then restriction, followed by a DISTINCT aggregate.
    This is 3 main discrete operations

This "short circuit" is the main reason for the difference even with the outer join in the sub query.

For simpler queries, the 2nd query would be optimised to the same plan as the 1st because it is semantically the same. More likely the later the version etc

See this for more (same logic, just reversed): The use of NOT logic in relation to indexes

And this about "IN vs. JOIN vs. EXISTS" from SO user Quassnoi on his site

And a similar SO example: https://stackoverflow.com/a/7221395/27535

gbn
  • 69,809
  • 8
  • 163
  • 243
4

Have a try with:

SELECT
   (select p.BarraPortfolioName from Portfolio p where p.ID = i1.PortfolioID) AS name  
FROM Import i1  
WHERE i1.ImportSetID = 83 
group by i1.PortfolioID  
HAVING NOT EXISTS     
    (SELECT * FROM Import i2 
    WHERE i1.PortfolioID = i2.PortfolioID 
    AND i2.ImportSetID = 82);

Because what you're wanting to do is find the distinct i1.PortfolioIDs, filter that list for ones that weren't in importset 82, and show the name for those ids.

Rob Farley
  • 16,199
  • 2
  • 38
  • 61
  • Indeed you query is even better than 3rd query. Statistic for 3rd query : Table 'Import'. Scan count 375, logical reads 1222, physical reads 0, Table 'Portfolio'. Scan count 0, logical reads 52140, physical reads 0 Statistic for your query: Table 'Portfolio'. Scan count 0, logical reads 3, physical reads 0, Table 'Import'. Scan count 375, logical reads 1172, physical reads 0 – techExplorer Dec 12 '11 at 05:48
  • You might want to create index ixBlah on Import (ImportSetID, PortfolioID) -- if you don't have one already. – Rob Farley Dec 12 '11 at 05:52
3

If you have no unique index on BarraPortfolioName it needs to check through them all for duplicates, which is unnecessary in the first query because your IN clause handles that for you.

Try creating a unique index (or constraint) and seeing if this does the trick. Or else change the second script by dropping the distinct and putting GROUP BY p.id, p.BarraPortfolioName at the end. This should let it short-circuit the distinctification process.

Rob Farley
  • 16,199
  • 2
  • 38
  • 61
  • You're missing the point. It's about putting p.id into the grouping. – Rob Farley Dec 08 '11 at 12:32
  • Removed previous comments as I see what you mean with the GROUP BY now. I thought that would change the semantics but if BarraPortfolioName is not unique then the 2 queries in the OP are semantically different anyway as the first one would also need DISTINCT added. I still find in testing that query 1 uses a semi join and query 2 doesn't though so the OP should just stick to 1 IMO. – Martin Smith Dec 08 '11 at 13:04
  • Yup. The from clause should be: – Rob Farley Dec 08 '11 at 13:15
  • FROM Import i1 LEFT OUTER JOIN Import i2 ON i1.PortfolioID = i2.PortfolioID AND i2.ImportSetID = 82 INNER JOIN Portfolio p ON p.ID = i1.PortfolioID – Rob Farley Dec 08 '11 at 13:16
  • ...and this is a problem with the QO that is barely acknowledged by the product group. – Rob Farley Dec 08 '11 at 13:19
  • Should that change give me a semi join? Not seeing it. Maybe it needs R2 as I'm testing on 2008 R1. – Martin Smith Dec 08 '11 at 13:26
  • After midnight here - I'll get this sorted in the morning. – Rob Farley Dec 08 '11 at 13:50
  • Isn't this just a "semi join" va "outer join + restriction + distinct" problem? – gbn Dec 08 '11 at 19:22
  • Yes, but the reason for the difference is down to the distinctness of the name field and the outer join order. The second one seems to intend a semi join, and the OP is curious about why the second one isn't being treated as such. – Rob Farley Dec 08 '11 at 21:29
  • @gbn - This question here shows that sometimes the QO can transform it to a semi join. Would be interesting to know the exact criteria that need to be met for this. – Martin Smith Dec 09 '11 at 13:09
  • @Martin Smith: I suspect complexity of query and how many possible plans are examined before cost limit kicks in. The other one is simpler (I answered there too) then this. I'd prefer the 1st query too because I wouldn't want to rely on optimisations and short circuits – gbn Dec 09 '11 at 13:12
  • @RobFarley: I read differently as "why is query 1 more efficient"? Would be interesting to see if your suggestion fixes it though – gbn Dec 09 '11 at 13:15
  • Sorry - the weekend stole me. Monday again now. So anyway - the query is about getting the name for the thing which is being searched for. There's a SemiJoin between two copies of Import, and then the name is being found. So essentially, the query is: SELECT (select p.BarraPortfolioName from Portfolio p where p.ID = i1.PortfolioID) AS name FROM
    Import i1 LEFT OUTER JOIN Import i2 ON i1.PortfolioID = i2.PortfolioID AND i2.ImportSetID = 82
    WHERE i1.ImportSetID = 83 AND i2.ID IS NULL group by i1.PortfolioID;
    – Rob Farley Dec 12 '11 at 03:59
  • Or better still: SELECT (select p.BarraPortfolioName from Portfolio p where p.ID = i1.PortfolioID) AS name FROM
    Import i1 WHERE i1.ImportSetID = 83 group by i1.PortfolioID HAVING NOT EXISTS (SElECT * FROM Import i2 WHERE i1.PortfolioID = i2.PortfolioID AND i2.ImportSetID = 82);
    – Rob Farley Dec 12 '11 at 04:04