Using exists in a query instead of a regular where

Question

I saw this query that finds all the employees with more than one degree:

select id, name
from employee E
where exists (
 select *
 from academics A1, academics A2
 where  A1.emp_id = E.id and
        A2.emp_id = E.id and
        A1.discipline != A2.discipline
 )

But why is the exists necessary? Why not just do:

select id, name
from employee E, academics A1, academics A2
where A1.emp_id = E.id and
      A2.emp_id = E.id and
      A1.discipline != A2.discipline

Are the two above equivalent?

No, they are not equivalent. The first is a semijoin. The second is a join. It can give (the 2nd) more rows in the result. — ypercubeᵀᴹ, Mar 09 '17 at 16:23
Yes, the rows would be duplicated (actually multiplied, depending on the values in academics). — ypercubeᵀᴹ, Mar 09 '17 at 16:31
Also, of course, the second query may immediately fail. If academics has a column named id or name, the query will not know which table those columns in the select list come from. — RDFozz, Mar 09 '17 at 16:34
At a minimum: every employee who shows up would show up twice, Example. Sam Jones has a degree in Physics and a degree in Chemistry. His ID will show up twice, because it will match when A1 is Physics and A2 is Chemistry, and when A1 is Chemistry and A2 is Physics. — RDFozz, Mar 09 '17 at 16:37
Note: my comment was explicitly supposed to be an expansion of @ypercube's note on duplicate or more, explaining the whys of it. — RDFozz, Mar 09 '17 at 16:55

score 3 · Answer 1 · edited Mar 09 '17 at 23:04

3

I'd suggest you another approach.

COUNT DISTINCT disciplines by employee.

Then use EXISTS to check if one Employee has more than one degree.

select      e.id, e.name
from        Employees e
where exists (select    1
              from      Academics a
              where     a.emp_id = e.id
              having count(distinct a.discipline) > 1);

Or JOIN it with table Employees and show these were NumDegrees > 1

select      e.id, e.name, nd.NumDegrees
from        Employees e
inner join (select    emp_id, count(distinct a.discipline) NumDegrees
            from      Academics a
            group by  a.emp_id) nd
on          e.id = nd.emp_id
where       nd.NumDegrees > 1;

edited Mar 09 '17 at 23:04

ypercubeᵀᴹ

97,895
13
214
305

answered Mar 09 '17 at 16:54

McNets

23,749
10
48
88

What's select 1? – shinzou Mar 09 '17 at 16:56
Exists simply evaluate if sentence returns rows. https://msdn.microsoft.com/es-es/library/ms188336.aspx – McNets Mar 09 '17 at 16:58
never mind I totally missed that. (you gotta add the group by back.) =( – Evan Carroll Mar 09 '17 at 22:21
@EvanCarroll you're right, http://dbfiddle.uk/?rdbms=sqlserver_2016&fiddle=844eb3a04070eb337eaa006d12e231dd – McNets Mar 09 '17 at 22:32
That's only because SQL server accepts unstandardized syntax. You still need the group by it's just implicit there. – Evan Carroll Mar 09 '17 at 22:46
@EvanCarroll the syntax is fine with and without the GROUP BY there. Standard SQL. – ypercubeᵀᴹ Mar 09 '17 at 22:58

score 2 · Answer 2 · edited Mar 09 '17 at 23:05

2

For completeness, I'd add one more variant, which might clearer for somebody, and which any decent optimizer would consider equivalent to the second one proposed by McNets:

SELECT e.id, e.name
  FROM Employees e
 WHERE e.id IN (SELECT a.emp_id
                          FROM Academics a
                          GROUP BY a.emp_id
                          HAVING COUNT(DISTINCT a.discipline) > 1);

edited Mar 09 '17 at 23:05

ypercubeᵀᴹ

97,895
13
214
305

answered Mar 09 '17 at 17:52

Dario

748
4
13

1

I disagree on the "it's really a waste". It might perform better or worse than your versions, depending on many things. – ypercubeᵀᴹ Mar 09 '17 at 18:06
@ypercubeᵀᴹ Well, more often than not, it is better to avoid self joins if you can. However, you are right in the general case and I deleted the sentence with which you disagreed. – Dario Mar 09 '17 at 20:53

Using exists in a query instead of a regular where

2 Answers2