4

I have a table (t0), in my Postgres DB, with data that looks something like this:

t1_id  t2_id
1      1      
2      1
2      1
4      null
4      null
5      null

And I have a query to return my desired results of:

t1_id  t2_id
1      1
4      null
5      null

My query looks something like this:

(
  SELECT DISTINCT ON (t2_id) t1_id, t2_id
  FROM t0
  WHERE t2_id IS NOT NULL
)
UNION ALL
(
  SELECT DISTINCT ON (t1_id) t1_id, t2_id
  FROM t0
  WHERE t2_id IS NULL
)

Is there a faster way to do an operation like this? It's not too bad, but I'm doing it in several places (with joins) and all these repeated queries seems to slow stuff down a bit. Seems like there must be a better way.

Here's the query in fiddle form: http://sqlfiddle.com/#!15/d41d8/3603

Aaron Bertrand
  • 180,303
  • 28
  • 400
  • 614
Sam
  • 637
  • 2
  • 6
  • 12
  • 1
    I don't see a reason for the union all: select distinct on (t2_id) t1_id, t2_id from t0 will do just fine: http://sqlfiddle.com/#!15/d41d8/3591 Btw. the parentheses around the individual selects are totally useless for the union –  Oct 12 '14 at 04:47
  • @a_horse_with_no_name Thanks for the response. Nice catch. My example data, and output, we're missing a row.

    I updated the question by adding another row 5 null to the data and query result.

    – Sam Oct 12 '14 at 06:25
  • 1
    @Sam: That doesn't change anything. You example is trivial and does not need UNION ALL. It needs ORDER BY for the DISTINCT ON to have determinate results but that's irreleveant. Try adding a query you have and not strip it too much. – ypercubeᵀᴹ Oct 12 '14 at 11:09
  • @ypercube, good point. The problem is I want the query here http://sqlfiddle.com/#!15/d41d8/3592, to include 5, null. It combines 4 null and 5 null which I want to keep separate. – Sam Oct 12 '14 at 16:45
  • Cross post http://stackoverflow.com/questions/26344612/is-there-a-faster-way-to-get-union-all-behavior-in-postgres/26344836#26344836 – Sam Oct 20 '14 at 04:25

2 Answers2

5

For the simple case, I can only think of minor improvements to the query:

(
  SELECT DISTINCT ON (t2_id)
         t1_id, t2_id
  FROM   t0
  WHERE  t2_id IS NOT NULL
  ORDER  BY t2_id, t1_id  -- to get consistent results
)
UNION ALL
(
  SELECT DISTINCT ON (t1_id)
         t1_id, NULL      -- cheaper
  FROM   t0
  WHERE  t2_id IS NULL
  -- if you retrieve more columns, add ORDER BY, too
)
  • As ypercube mentioned You need to add ORDER BY with an unambiguous list of expressions to get deterministic results.

  • You can use the constant NULL instead of t2_id in the second leg of the query. Also relevant for below index support.

  • The key to performance is indexing. Try two partial indexes matching the two parts of the query:

    CREATE INDEX t0_part1_idx ON t0 (t2_id, t1_id) WHERE t2_id IS NOT NULL;
    CREATE INDEX t0_part2_idx ON t0 (t1_id)        WHERE t2_id IS NULL;
    

    You may or may not want to include additional columns to allow index-only scans.

Depending on table size and data distribution, there may be faster alternatives:

Single SELECT

If you want to condense it into a single SELECT:

SELECT DISTINCT ON (coalesce(t2_id, t1_id), t2_id)
       t1_id, t2_id
FROM   t0
ORDER  BY coalesce(t2_id, t1_id), t2_id, t1_id;

Equivalent, except for sort order. If you want this to be fast, try a functional index:

CREATE INDEX t0_func_idx ON t0 (coalesce(t2_id, t1_id), t2_id, t1_id);
Erwin Brandstetter
  • 175,982
  • 27
  • 439
  • 600
4

Give your example, this will do it:

select distinct on (coalesce(t2_id, t1_id)) t1_id, t2_id
from t0
order by coalesce(t2_id, t1_id);

It essentially says "do a distinct on t2_id, but if that is null use t1_id instead".

  • Thanks! That's perfect! Can't believe I couldn't find that. – Sam Oct 13 '14 at 17:09
  • @Sam: That's actually not correct. Consider these corner cases: (3,3), (3,NULL) or (null,7), (7,null): http://sqlfiddle.com/#!15/4db1b8/1. The test case in the question does not reveal it. – Erwin Brandstetter Oct 20 '14 at 11:35
  • @ErwinBrandstetter: good catch. But based on the sample data I assumed t1_id would never be null –  Oct 20 '14 at 11:37
  • @a_horse_with_no_name: Might be the case, but would still leave the first example: (3,3), (3,NULL). The 2nd case is just to show why a simple constraint CHECK (t1_id <> t2_id) would still not make it right - only in combination with a NOT NULL constraint on t1_id ... – Erwin Brandstetter Oct 20 '14 at 11:45