1

I saw this query that finds all the employees with more than one degree:

select id, name
from employee E
where exists (
 select *
 from academics A1, academics A2
 where  A1.emp_id = E.id and
        A2.emp_id = E.id and
        A1.discipline != A2.discipline
 )

But why is the exists necessary? Why not just do:

select id, name
from employee E, academics A1, academics A2
where A1.emp_id = E.id and
      A2.emp_id = E.id and
      A1.discipline != A2.discipline

Are the two above equivalent?

shinzou
  • 253
  • 1
  • 7
  • 1
    Please tag your dbms. – McNets Mar 09 '17 at 16:11
  • @McNets not using any live dbms so, pen and paper. – shinzou Mar 09 '17 at 16:13
  • 1
    No, they are not equivalent. The first is a semijoin. The second is a join. It can give (the 2nd) more rows in the result. – ypercubeᵀᴹ Mar 09 '17 at 16:23
  • More duplicate rows? @ypercubeᵀᴹ – shinzou Mar 09 '17 at 16:29
  • 1
    Yes, the rows would be duplicated (actually multiplied, depending on the values in academics). – ypercubeᵀᴹ Mar 09 '17 at 16:31
  • Also, of course, the second query may immediately fail. If academics has a column named id or name, the query will not know which table those columns in the select list come from. – RDFozz Mar 09 '17 at 16:34
  • 1
    At a minimum: every employee who shows up would show up twice, Example. Sam Jones has a degree in Physics and a degree in Chemistry. His ID will show up twice, because it will match when A1 is Physics and A2 is Chemistry, and when A1 is Chemistry and A2 is Physics. – RDFozz Mar 09 '17 at 16:37
  • Note: my comment was explicitly supposed to be an expansion of @ypercube's note on duplicate or more, explaining the whys of it. – RDFozz Mar 09 '17 at 16:55

2 Answers2

3

I'd suggest you another approach.

COUNT DISTINCT disciplines by employee.

Then use EXISTS to check if one Employee has more than one degree.

select      e.id, e.name
from        Employees e
where exists (select    1
              from      Academics a
              where     a.emp_id = e.id
              having count(distinct a.discipline) > 1);

Or JOIN it with table Employees and show these were NumDegrees > 1

select      e.id, e.name, nd.NumDegrees
from        Employees e
inner join (select    emp_id, count(distinct a.discipline) NumDegrees
            from      Academics a
            group by  a.emp_id) nd
on          e.id = nd.emp_id
where       nd.NumDegrees > 1;
ypercubeᵀᴹ
  • 97,895
  • 13
  • 214
  • 305
McNets
  • 23,749
  • 10
  • 48
  • 88
2

For completeness, I'd add one more variant, which might clearer for somebody, and which any decent optimizer would consider equivalent to the second one proposed by McNets:

SELECT e.id, e.name
  FROM Employees e
 WHERE e.id IN (SELECT a.emp_id
                          FROM Academics a
                          GROUP BY a.emp_id
                          HAVING COUNT(DISTINCT a.discipline) > 1);
ypercubeᵀᴹ
  • 97,895
  • 13
  • 214
  • 305
Dario
  • 748
  • 4
  • 13
  • 1
    I disagree on the "it's really a waste". It might perform better or worse than your versions, depending on many things. – ypercubeᵀᴹ Mar 09 '17 at 18:06
  • @ypercubeᵀᴹ Well, more often than not, it is better to avoid self joins if you can. However, you are right in the general case and I deleted the sentence with which you disagreed. – Dario Mar 09 '17 at 20:53