I have the following table in PostgreSQL 9.4:
CREATE TABLE dpa(
id serial NOT NULL,
currency_id integer,
amount numeric(14,3),
date timestamp without time zone,
plat_id integer,
pl_id integer,
p_id integer,
CONSTRAINT dpa_pkey PRIMARY KEY (id),
)
and settings:
work_mem = 128MB
table_size = 16 MB
And index:
CREATE INDEX idx1
ON dpa
USING btree
(plat_id, p_id, pl_id, currency_id, date DESC NULLS LAST, amount)
The table consists of approximately 242K rows. I don't have NOT NULL constraints on the column, but they are actually NOT NULL.
Now, I'm measuring performance of the queries:
I
SELECT plat_id, p_id, pl_id, player_account player_account
FROM(
SELECT plat_id, p_id, pl_id,
COALESCE(amount, 0) player_account,
ROW_NUMBER() OVER (PARTITION BY plat_id, p_id, pl_id, currency_id
ORDER BY date DESC NULLS LAST) rn
FROM dpa
) sub WHERE rn = 1;
Analyzed plan:
Subquery Scan on sub (cost=0.42..25484.16 rows=1214 width=44) (actual time=0.044..296.810 rows=215274 loops=1)
Filter: (sub.rn = 1)
Rows Removed by Filter: 27556
-> WindowAgg (cost=0.42..22448.79 rows=242830 width=28) (actual time=0.043..255.690 rows=242830 loops=1)
-> Index Only Scan using idx1 on dpa (cost=0.42..16378.04 rows=242830 width=28) (actual time=0.037..91.576 rows=242830 loops=1)"
Heap Fetches: 242830
II
SELECT DISTINCT ON(plat_id, p_id, pl_id, currency_id)
plat_id, p_id, pl_id, currency_id, amount
FROM dpa
ORDER BY plat_id, p_id, pl_id, currency_id, date DESC NULLS LAST
Analyzed plan:
Unique (cost=0.42..18794.73 rows=82273 width=28) (actual time=0.017..128.277 rows=215274 loops=1)
-> Index Only Scan using idx1 on dpa (cost=0.42..16366.43 rows=242830 width=28) (actual time=0.016..72.110 rows=242830 loops=1)
Heap Fetches: 242830
As can be seen, the second query is faster than the first one. But when I execute this queries in PGAdmin I got the following average statistics:
The query with ROW_NUMBER()(the first): 4999 ms
The query with DISTINCT ON(the second): 5654 ms
I understand that bandwith/latency overhead on such large result set is significant. All queries produce 215274 rows.
QUESTION: Why does it take more time to receive all rows in the second case than in the first one, although the planner shows that the second plan is more optimal?
COALESCE(amount, 0)in the first query - so pgAdmin does not have to render NULL values. BTW, if you are looking for best performance and there are many rows per group, a similar query like the one I gave you for your related question yesterday might be fastest - in combination with a matching index. For few rows per group,DISTINCT ONshould be fastest: http://stackoverflow.com/a/7630564/939860 – Erwin Brandstetter Oct 22 '15 at 16:19LIMIT 1in the inner query make more sense? Perhaps with aGROUP BY(rather than using aDISTINCTclause) – mpag Oct 22 '15 at 16:44LIMIT 1in aLATERALjoin, like I outlined in the answer I linked to in my previous comment. More details here: http://stackoverflow.com/a/25536748/939860 – Erwin Brandstetter Oct 23 '15 at 01:30