6

In my application server, I would like to paginate a dataset using LIMIT and OFFSET, and additionally return the total count of the dataset to the user.

Instead of making two remote calls to the database:

select count(1) as total_count from foo;
select c1 from foo;

I thought it would be smarter to do it in a single database call:

select c1, count(1) over (partition by null) from foo;

However, adding this window function results in an execution time that is an order of magnitude longer compared to not using the window function.

I find this surprising because the analogous select count(1) from foo takes only about twice as long as select c1 from foo, yet converting it to a window function causes a tenfold degradation.

Moreover, using the following alternative using a subquery is very fast:

select c1, (select count(1) from foo) as total_count from foo;

I would have expected that PostgreSQL would be able to optimize away the PARTITION BY NULL.

I tried this in Oracle and found a similar performance penalty.

What explains why there is a performance penalty here? Would it be relatively easy, or even worthwhile, for the core postgresql devs to make a change to optimize this, e.g. by converting window functions that are PARTITION BY NULL into a subquery?
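For reference, the single-call pattern I'm after can be sketched against an in-memory SQLite database from Python (SQLite supports window functions since 3.25; table and column names follow the test setup below):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (c1 INTEGER)")
conn.executemany("INSERT INTO foo VALUES (?)", [(i,) for i in range(100)])

# One round trip: every row of the page carries the total count of the
# unlimited result set, because window functions run before LIMIT/OFFSET.
rows = conn.execute(
    "SELECT c1, COUNT(1) OVER () AS total_count "
    "FROM foo ORDER BY c1 LIMIT 10 OFFSET 20"
).fetchall()

page = [c1 for c1, _ in rows]  # [20, 21, ..., 29]
total = rows[0][1]             # 100
```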


Setup:

drop table foo;
create table foo (c1 int);

insert into foo select i from generate_series(0, 100000) i;

analyze foo;

Regular SELECT:

=> explain analyze select c1 from foo;
                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Seq Scan on foo  (cost=0.00..1443.01 rows=100001 width=4) (actual time=0.021..6.848 rows=100001 loops=1)
 Planning Time: 0.045 ms
 Execution Time: 10.021 ms

Not using the window function results in execution times around 10ms.

With a COUNT(1) OVER (PARTITION BY NULL) window function:

=> explain analyze select c1, count(1) over (partition by null) as total_count from foo;
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 WindowAgg  (cost=0.00..2943.03 rows=100001 width=44) (actual time=63.828..100.321 rows=100001 loops=1)
   ->  Seq Scan on foo  (cost=0.00..1443.01 rows=100001 width=36) (actual time=0.025..17.727 rows=100001 loops=1)
 Planning Time: 0.071 ms
 Execution Time: 106.386 ms

Using the window function results in an execution time of 100 ms. This is an order of magnitude more expensive than the same query without this window function.

With a regular SELECT COUNT(1):

=> explain analyze select count(1) from foo;
                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1693.01..1693.02 rows=1 width=8) (actual time=19.876..19.876 rows=1 loops=1)
   ->  Seq Scan on foo  (cost=0.00..1443.01 rows=100001 width=0) (actual time=0.026..9.238 rows=100001 loops=1)
 Planning Time: 0.066 ms
 Execution Time: 19.908 ms

The regular SELECT COUNT(1) takes about 20ms.

And using the more optimal subquery form:

=> explain analyze select c1, (select count(1) from foo) as total_count from foo;
                                                          QUERY PLAN                                                    
------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on foo  (cost=1693.02..3136.03 rows=100001 width=12) (actual time=18.554..30.492 rows=100001 loops=1)
   InitPlan 1 (returns $0)
     ->  Aggregate  (cost=1693.01..1693.02 rows=1 width=8) (actual time=18.533..18.534 rows=1 loops=1)
           ->  Seq Scan on foo foo_1  (cost=0.00..1443.01 rows=100001 width=0) (actual time=0.010..8.438 rows=100001 loops=1)
 Planning Time: 0.074 ms
 Execution Time: 33.696 ms

This one takes ~30 ms, which is what I would expect (select count(1) takes 20 ms, select c1 takes 10 ms).

Erwin Brandstetter
Matthew Moisen

  • Btw: it's a completely wrong myth that count(1) is faster than count(*) – in fact in Postgres count(1) is actually slightly slower than count(*) –  Mar 06 '22 at 20:07
  • @a_horse_with_no_name Most people do count(1) out of habit, not because they think it is faster. – Matthew Moisen Mar 06 '22 at 20:10
  • Why the partition by null? Did you try count(*) over ()? –  Mar 06 '22 at 21:27
  • @a_horse_with_no_name count(1) over () has identical execution times to count(1) over (partition by null) – Matthew Moisen Mar 06 '22 at 22:51

1 Answer

5

Pagination?

First off, OFFSET / LIMIT are primitive tools for pagination, don't scale well, and don't work well under concurrent write load. Depending on your use case, there are much smarter solutions. See:
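One such alternative is keyset pagination: instead of counting off OFFSET rows on every request, remember the last key of the previous page and seek past it. A minimal sketch (SQLite from Python; the generic id column is illustrative, not tied to the question's table):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO foo VALUES (?)", [(i,) for i in range(1, 101)])

def next_page(last_id, page_size=10):
    # Seek past the last seen key; with the PK index, the cost does not
    # grow with page depth, unlike OFFSET.
    return [r[0] for r in conn.execute(
        "SELECT id FROM foo WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, page_size))]

first = next_page(0)           # ids 1..10
second = next_page(first[-1])  # ids 11..20
```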

Performance of total count

OVER (PARTITION BY NULL) is a pointless waste. Use the equivalent, faster OVER () instead.

count(1) is another (minor) pointless waste. Use count(*) instead.

Still, the observation is solid; I can reproduce it as described.

However, it's still misleading. The test table is unrealistic, with just a single integer column. And typically you would add WHERE clauses and/or involve indexes. So the validity of your original test is limited.

A more realistic test case with (at least) some rudimentary payload columns and an index shows different insights:

CREATE TABLE foo (
  id int PRIMARY KEY
, payload text                    -- minimal payload of 32-byte text ...
, dt timestamptz DEFAULT now());  -- ... and a timestamp column

INSERT INTO foo (id, payload) SELECT i, md5(i::text)::text FROM generate_series(1, 100000) i;

Consider the various tests in the fiddle (using Postgres 14; Postgres 13 is very similar, you can just switch the engine and re-run):
db<>fiddle here

count(*) OVER () is ~ 10 % faster than count(1) OVER (PARTITION BY NULL). Pay special attention to the more reliable test with warm cache ("repeat main test x with warm cache").
EXPLAIN even expects it to be faster, estimating a cost of 3185 instead of 3435.

The subquery is still ~ 2-3x faster for the simple case without WHERE.

But adding a selective WHERE clause is a game changer. And that's the much more common use case.
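One practical upside of the window form under a WHERE clause: the count is taken over the already-filtered rows in the same scan, whereas the subquery has to repeat the predicate in a second scan. A SQLite sketch of the equivalence (to show the semantics, not a Postgres timing test):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (c1 INTEGER)")
conn.executemany("INSERT INTO foo VALUES (?)", [(i,) for i in range(100)])

# Window form: one pass; the count reflects the WHERE-filtered set (25 rows).
win = conn.execute(
    "SELECT c1, COUNT(*) OVER () FROM foo "
    "WHERE c1 < 25 ORDER BY c1 LIMIT 5"
).fetchall()

# Subquery form: the predicate must be duplicated, i.e. a second scan.
sub = conn.execute(
    "SELECT c1, (SELECT COUNT(*) FROM foo WHERE c1 < 25) FROM foo "
    "WHERE c1 < 25 ORDER BY c1 LIMIT 5"
).fetchall()
```

Both return the same rows, with total_count 25 rather than 100.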

And if the query does not have (perfect) index support, the picture changes again. Now, the subquery is substantially slower. Two sequential scans outweigh the overhead added by the window function.

Real-life examples are typically messier, with table / index bloat, more columns, joins, computations, etc. There, a separate scan for the count in a subquery typically gets even more expensive.

Related:

Erwin Brandstetter
  • I'm not seeing a difference with your 3 column table vs my 1 column table. The execution times are similar. Using the window function is still 10x slower. Also I don't see any difference with OVER(PARTITION BY NULL) vs OVER (). But even if OVER () is 10% faster, it is still about an order of magnitude slower than the subquery (on my local PC at least; you seem to have a different result of only 2-3x slower). Also, the index doesn't help. I wouldn't expect it to either, a sequential scan is the only sensible result when no WHERE clause is used. – Matthew Moisen Mar 07 '22 at 04:22
  • If I add a column that is relatively selective, for example having 10-1000 of the same value, the performance degradation of the window function isn't that high. I agree that in the general case users would be filtering but it is possible for the user to ask for all the data without a filter. – Matthew Moisen Mar 07 '22 at 04:25
  • I'm surprised to see that even a select count(1) from foo (or select count(*)) results in a sequential scan after adding the primary key. I would have expected it to do an index scan. On Oracle, a select count(1) on a table with a NOT NULL indexed column results in a "fast full scan" of the index, rather than a full table scan/sequential scan of the table. This is a bit faster. – Matthew Moisen Mar 07 '22 at 04:33
  • @MatthewMoisen: While counting the whole table (which isn't much wider than the index), a sequential scan is (expected to be) faster than using the index. Test with a WHERE clause (like demonstrated in my fiddle) and you'll see the index used. There are many factors at play, my fiddle highlights a couple of them. – Erwin Brandstetter Mar 07 '22 at 04:43
  • I also noticed that the window function seems to use work_mem while the sub-query doesn't. Increasing work_mem to 8MB (instead of the default 4MB) got rid of some temp buffers in the execution plan (only visible with explain (analyze, buffers)), improving the performance slightly. –  Mar 07 '22 at 06:13