What considerations should be made when deciding on indexes for PostgreSQL databases?

Question

My understanding is that you want to decide on what to include in your index based what will trigger the index being used.

The following considerations are what I am aware of, in order of relevance:

(Most common) Any columns used with the following query clauses: WHERE, ORDER BY, JOIN
Index-only scans: getting data from a small selection of columns included in a covering index. In this case, data from an index could be used from a column just included in the SELECT clause, but the query will likely still contain a column in a WHERE clause that should also be indexed. See PostgreSQL's implementation for more details.
Getting a full count: something like SELECT count(*) FROM tbl; (It's not clear to me why an index would be used for this one)
For the checking of Foreign Key constraints: for verifying a record exists before allowing it to be added as a Foreign Key, or for deleting/setting to null/etc. records in tables WHERE colx_withFK = _id if a record with id _id is deleted in the source table, etc. PostgreSQL auto-creates indices for FKs, but other RDBMSs (like SQL Server) might not, so this one might not be something a developer has to actually worry about.

And for all of these considerations, of course, anything you execute often is more important than a less used query.

Question: Did I get something wrong? Is there anything else?

As details can be very different between SQL implementations, answers can focus on PostgreSQL.

Sources: helpful article, SE DBA answer and comments

"All RDBMSs using SQL should auto-create indices for FKs so this one shouldn't be something a developer has to actually worry about." - Thank goodness SQL Server doesn't do that behind your back. As with nearly all things DBMS related, "it depends". GROUP BY can also benefit from ordered indexes. — Dan Guzman, Aug 16 '23 at 18:10
"(It's not clear to me why an index would be used for this one)" One line above you mentioned Index-only scans — SergeyA, Aug 16 '23 at 19:08
@Akaisteph7 Because the index usually is smaller than a table — SergeyA, Aug 16 '23 at 22:02
@SergeyA Idk what you mean by that. It seems in the case of a full count the number of rows to traverse for a table scan should be even less than the number of nodes to traverse for an index scan. I guess maybe you're trying to say its faster to traverse the index even in this case because of how its structured and stored maybe. But how would you know this? — Akaisteph7, Aug 17 '23 at 13:23
SergeyA means - a table can be for ex 10GB in size on disk. One-column Index to this table could be 1GB. It is faster to read 1GB than 10GB — Eduard Okhvat, Aug 18 '23 at 02:21
The above makes sense. But I still would like some reference links or something other than just a comment to accept this is why. I've seen nothing about this. — Akaisteph7, Aug 18 '23 at 14:49
Meanwhile I see its the usual StackExchange experience with people downvoting without giving any reasons. I've restricted my scope to just Postgres, so "it depends" is not, and was not anyways, a valid answer. — Akaisteph7, Aug 18 '23 at 14:51
It seems there's a very helpful guide for SQL indexing considerations (Use the Index, Luke) that would be worth looking into further. — Akaisteph7, Aug 18 '23 at 14:59

score 0 · Answer 1 · answered Aug 28 '23 at 07:28

There are a few things you forgot:

creating an index on an expression makes PostgreSQL collect statistics for the expression (from v15 on, you can alternatively use extended statistics for that purpose)
unique indexes can implement constraints

There are also partial indexes, where part of the WHERE condition can be implemented as a WHERE clause for the index.

But I think that you won't be able to come up with a useful list that "semi-automatically" enables you to index a database well without further considerations, like

Which queries are executed how often, and how important is their performance?
Is the benefit in query speed worth the performance loss in data modifications?
Shall I add two specific indexes that ideally support two queries, or is it better to add a single index that is not perfect for either query, but supports both somewhat?

What considerations should be made when deciding on indexes for PostgreSQL databases?

1 Answers1