As an aside, this is a shorthand. You never have to write it this way, and in my opinion it's always a bad idea: the subquery is correlated with the parent query (needlessly complex for the SELECT list), it bloats the SELECT clause and obscures the basic structure of the query, and it's often likely to be slower.
SELECT company_code,
       founder,
       (SELECT count(DISTINCT lead_manager_code)
        FROM Lead_Manager
        WHERE company_code = c.company_code)
FROM Company c
ORDER BY company_code;
This is better written with a LEFT OUTER JOIN:
SELECT company_code,
       founder,
       t.count
FROM Company c
LEFT OUTER JOIN (
  SELECT company_code, count(DISTINCT lead_manager_code) AS count
  FROM Lead_Manager
  GROUP BY company_code
) AS t
USING (company_code)
ORDER BY company_code;
One caveat: the two forms aren't quite identical. The correlated subquery returns 0 for a company with no rows in Lead_Manager, while the join version returns NULL, because there is nothing to join to.
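If you need 0 in that case, COALESCE the joined count. A minimal sketch against the same schema (the lead_manager_count output alias is just illustrative):
SELECT company_code,
       founder,
       COALESCE(t.count, 0) AS lead_manager_count  -- 0 instead of NULL when a company has no lead managers
FROM Company c
LEFT OUTER JOIN (
  SELECT company_code, count(DISTINCT lead_manager_code) AS count
  FROM Lead_Manager
  GROUP BY company_code
) AS t
USING (company_code)
ORDER BY company_code;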
Speed
This analysis was done with PostgreSQL, but it should be applicable to most databases.
Multi-row
The VIRTUAL COLUMN method is likely to result in extra seq scans, because the planner doesn't know to bundle the per-row subquery executions together. It's worth comparing speeds: 145 ms for the VIRTUAL COLUMN method:
EXPLAIN ANALYZE
SELECT *, (
SELECT count(distinct x)
FROM foobar AS f2
WHERE f1.x = f2.x
)
FROM foobar AS f1;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Seq Scan on foobar f1 (cost=0.00..17527.50 rows=1000 width=4) (actual time=0.252..145.119 rows=1000 loops=1)
SubPlan 1
-> Aggregate (cost=17.50..17.51 rows=1 width=4) (actual time=0.144..0.144 rows=1 loops=1000)
-> Seq Scan on foobar f2 (cost=0.00..17.50 rows=1 width=4) (actual time=0.065..0.140 rows=1 loops=1000)
Filter: (f1.x = x)
Rows Removed by Filter: 999
Planning time: 0.108 ms
Execution time: 145.268 ms
Note the loops=1000 on the subplan: the inner seq scan ran once per outer row. Compare that with 2 ms for the LEFT OUTER JOIN method (count(*) is used here, which is equivalent to count(distinct x) on this test table because every x value is distinct):
EXPLAIN ANALYZE
SELECT x, f2.count
FROM foobar
LEFT OUTER JOIN (
SELECT x, count(*)
FROM foobar
GROUP BY x
) AS f2 USING (x);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Hash Right Join (cost=47.50..81.25 rows=1000 width=12) (actual time=1.246..2.154 rows=1000 loops=1)
Hash Cond: (foobar_1.x = foobar.x)
-> HashAggregate (cost=20.00..30.00 rows=1000 width=4) (actual time=0.750..1.100 rows=1000 loops=1)
Group Key: foobar_1.x
-> Seq Scan on foobar foobar_1 (cost=0.00..15.00 rows=1000 width=4) (actual time=0.009..0.155 rows=1000 loops=1)
-> Hash (cost=15.00..15.00 rows=1000 width=4) (actual time=0.484..0.484 rows=1000 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 44kB
-> Seq Scan on foobar (cost=0.00..15.00 rows=1000 width=4) (actual time=0.015..0.185 rows=1000 loops=1)
Planning time: 0.262 ms
Execution time: 2.294 ms
Here foobar is scanned just twice in total, regardless of row count. The foobar table was created with:
CREATE TABLE foobar AS SELECT x FROM generate_series(1,1000) AS x;
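As an aside, those 1000 inner seq scans are partly an artifact of foobar having no index; with one, the planner could use an index(-only) scan inside the subplan instead. A hypothetical sketch (the index name is illustrative, and on a table this small the planner may still prefer a seq scan):
-- give the subplan an access path other than a full table scan
CREATE INDEX foobar_x_idx ON foobar (x);
ANALYZE foobar;  -- refresh planner statistics
Even then, the join form only needs to aggregate once, so it would typically stay ahead.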
Single-row
Even in the single-row case, the LEFT OUTER JOIN is faster on PostgreSQL 9.5. I only mean to say that it doesn't have to be that way on other databases or versions.
EXPLAIN ANALYZE
SELECT *, (
SELECT count(distinct x)
FROM foobar AS f2
WHERE f1.x = f2.x
)
FROM foobar AS f1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Seq Scan on foobar f1 (cost=0.00..2.04 rows=1 width=4) (actual time=0.043..0.044 rows=1 loops=1)
SubPlan 1
-> Aggregate (cost=1.01..1.02 rows=1 width=4) (actual time=0.025..0.026 rows=1 loops=1)
-> Seq Scan on foobar f2 (cost=0.00..1.01 rows=1 width=4) (actual time=0.005..0.006 rows=1 loops=1)
Filter: (f1.x = x)
Planning time: 0.114 ms
Execution time: 0.097 ms
(7 rows)
EXPLAIN ANALYZE
SELECT x, f2.count
FROM foobar
LEFT OUTER JOIN (
SELECT x, count(*)
FROM foobar
GROUP BY x
) AS f2 USING (x);
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Nested Loop Left Join (cost=1.01..2.06 rows=1 width=12) (actual time=0.026..0.027 rows=1 loops=1)
Join Filter: (foobar.x = foobar_1.x)
-> Seq Scan on foobar (cost=0.00..1.01 rows=1 width=4) (actual time=0.010..0.010 rows=1 loops=1)
-> HashAggregate (cost=1.01..1.02 rows=1 width=4) (actual time=0.010..0.010 rows=1 loops=1)
Group Key: foobar_1.x
-> Seq Scan on foobar foobar_1 (cost=0.00..1.01 rows=1 width=4) (actual time=0.001..0.002 rows=1 loops=1)
Planning time: 0.168 ms
Execution time: 0.088 ms
(8 rows)
This time the foobar table was created with a single row:
CREATE TABLE foobar AS SELECT x FROM generate_series(1,1) AS x;
Conclusion
It's just too easy to mess this up. In my experience with PostgreSQL the correlated-subquery form is always slower, but that doesn't have to be the case with other databases.