Let's say I have some sample data, 100 million rows.
CREATE TEMP TABLE foo
AS
SELECT id, md5(id::text), trunc(random()*1e6)
FROM generate_series(1,1e6) AS t(id);
This will generate a table like this..
id | md5 | trunc
----+----------------------------------+--------
1 | c4ca4238a0b923820dcc509a6f75849b | 159632
2 | c81e728d9d4c2f636f067f89cc14862c | 182952
3 | eccbc87e4b5ce2fe28308fd9f2a7baf3 | 438287
4 | a87ff679a2f3e71d9181a67b7542122c | 78240
5 | e4da3b7fbbce2345d7772b0674a318d5 | 20293
6 | 1679091c5a880faf6fb5e6087eb1b2dc | 909742
7 | 8f14e45fceea167a5a36dedd4bea2543 | 926496
8 | c9f0f895fb98ab9159f51fd0297e236d | 463718
9 | 45c48cce2e2d7fbdea1afc51c7c6ad26 | 65842
10 | d3d9446802a44259755d38e6d163e820 | 81791
How can I then generate a table with one scan that resembles this..
SELECT id, md5::text AS x
FROM foo
UNION ALL
SELECT id, trunc::text
FROM foo;
id | x
----+----------------------------------
1 | c4ca4238a0b923820dcc509a6f75849b
1 | 961453
2 | c81e728d9d4c2f636f067f89cc14862c
2 | 842364
3 | eccbc87e4b5ce2fe28308fd9f2a7baf3
3 | 784693
4 | a87ff679a2f3e71d9181a67b7542122c
4 | 602039
5 | e4da3b7fbbce2345d7772b0674a318d5
5 | 176938
...
But that generates a query plan like this,
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Append (cost=0.00..33832.52 rows=1514052 width=64) (actual time=0.025..1034.740 rows=2000000 loops=1)
-> Seq Scan on foo (cost=0.00..16916.26 rows=757026 width=64) (actual time=0.025..173.272 rows=1000000 loops=1)
-> Seq Scan on foo foo_1 (cost=0.00..16916.26 rows=757026 width=64) (actual time=0.016..715.279 rows=1000000 loops=1)
Planning time: 0.128 ms
Execution time: 1103.499 ms
(5 rows)
What would it look like to have one sec scan, and would it be faster if the table was only read once?
CROSS JOIN LATERAL. 1 M rows fits in memory of contemporary computers... but there is always a quantity that doesn't fit. – joanolo Feb 17 '17 at 00:16SELECT id, unnest(ARRAY[md5, trunc::text]) from foo;. I did some testing and found that @joanolo is correct. If the entire table fits in shared buffers, theUNION ALLis faster. If it doesn't, the other solutions are faster as they only have to read from disk once. – malisper Feb 17 '17 at 06:35