14

I saw in the documentation the difference between count(*) and count(pk). I had been using count(pk) (where pk is a SERIAL PRIMARY KEY) not knowing about the existence of count(*).

My question is about Postgres' internal optimizations. Is it smart enough to pick up that a SERIAL PRIMARY KEY is going to exist in every row and never be false and just count rows or will it do redundant predicate checks for each row? I agree that this is probably too much of a pointless optimization but I'm just curious.

I took a look at the output of EXPLAIN and EXPLAIN VERBOSE for count(*), count(id) and count(id > 50) to see if EXPLAIN mentioned checking the predicates in its output. It doesn't.

Erwin Brandstetter
  • 175,982
  • 27
  • 439
  • 600
ldrg
  • 679
  • 2
  • 8
  • 15

1 Answers1

15

I got consistent results in my repeated tests with various versions over the last years:
count(*) is faster than count(pk). It is also shorter and most of the time it better fits what is tested: the existence of a row.

Concerning:

Is Postgres smart enough to pick up that a SERIAL PRIMARY KEY is going to exist in every row and never be false

The only relevant thing is the NOT NULL constraint. A PRIMARY KEY column is NOT NULL automatically. "serial" or "never false" are orthogonal to the question.

With count(col), if PostgreSQL was trying to be smart and check the system catalog whether a column was NOT NULL and fall back to an equivalent count(*), you'd still have one more look-up on a system table than with count(*).

What's more, any column can contain NULL values after an OUTER JOIN. So the optimization not applicable to certain queries. Probably not worth the complication ...

As for EXPLAIN output, there is a hint:

EXPLAIN SELECT count(*) FROM ...

Aggregate (cost=4963.38..4963.43 rows=1 width=0) ...

EXPLAIN SELECT count(pk) FROM ...

Aggregate (cost=4963.38..4963.43 rows=1 width=4) ...

Meaning, count(col) is not converted to count(*), even if it's defined NOT NULL.

Erwin Brandstetter
  • 175,982
  • 27
  • 439
  • 600
  • Is this still the case with the new versions? I think it wouldn't really need a lookup for every query - it could be cached. – Ondra Žižka Apr 26 '18 at 13:01
  • 2
    Btw, with a NOT NULL column, the difference is big if you have a lot of rows. In our case with millions of rows, COUNT(*) is 3 times faster. (Postgres 9.4) – Ondra Žižka Apr 26 '18 at 13:03