2
explain select count(x.*) from customer x;
...
->  Partial Aggregate  (cost=27005.45..27005.46 rows=1 width=8)
  ->  Parallel Seq Scan on customer x  (cost=0.00..26412.56 rows=237156 width=994)

explain select count(*) from customer x;
...

-> Partial Aggregate (cost=27005.45..27005.46 rows=1 width=8) -> Parallel Seq Scan on customer x (cost=0.00..26412.56 rows=237156 width=0)


The COUNT(x.*) here makes the width in the explain result read unnecessary row data.

I thought they should be identical, but it seems not, why?

Erwin Brandstetter
  • 175,982
  • 27
  • 439
  • 600
jilen
  • 123
  • 4
  • 1
    Because Postgres doesn't optimize the expression x.* away and checks for each and every row whether it's null or not: See e.g. here: https://blog.jooq.org/2019/09/19/whats-faster-count-or-count1/ –  Mar 18 '22 at 10:11
  • @a_horse_with_no_name Is it too hard to optimize such query or they just have not do it ? – jilen Mar 18 '22 at 11:39
  • 1
    It seems to be a matter of "nobody cared": https://www.postgresql.org/message-id/flat/df354b76-a645-73fc-9645-4508e5291d71%40gmx.net –  Mar 18 '22 at 11:46
  • @a_horse_with_no_name you should post your comment as an answer, because that's what it is. – mustaccio Mar 18 '22 at 12:44

1 Answers1

2

Logically, both are identical - because x.* always counts, even when all columns are NULL.
But Postgres has a separate implementation for count(*).

It does not bother with any expression at all and only considers the existence of live rows. That's slightly faster, which sums up to a relevant difference over many rows.
The performance penalty for count(x.*) grows with the number of columns / width of rows, and will be rather substantial for wide rows like yours (width=994).

It's even documented explicitly:

count ( * ) → bigint

Computes the number of input rows.


count ( "any" ) → bigint

Computes the number of input rows in which the input value is not null.

The gist of it: whenever you don't care whether an expression is NULL, use count(*) instead.

Related:

Some other RDBMS do not have the same fast path for count(*). OTOH, counting all rows in a table is comparatively slow in Postgres due to its MVCC model that forces checking row visibility. See:

Erwin Brandstetter
  • 175,982
  • 27
  • 439
  • 600