My understanding is that you want to decide on what to include in your index based what will trigger the index being used.
The following considerations are what I am aware of, in order of relevance:
- (Most common) Any columns used with the following query clauses:
WHERE,ORDER BY,JOIN - Index-only scans: getting data from a small selection of columns included in a covering index. In this case, data from an index could be used from a column just included in the
SELECTclause, but the query will likely still contain a column in aWHEREclause that should also be indexed. See PostgreSQL's implementation for more details. - Getting a full count: something like
SELECT count(*) FROM tbl;(It's not clear to me why an index would be used for this one) - For the checking of Foreign Key constraints: for verifying a record exists before allowing it to be added as a Foreign Key, or for deleting/setting to null/etc. records in tables
WHERE colx_withFK = _idif a record with id_idis deleted in the source table, etc. PostgreSQL auto-creates indices for FKs, but other RDBMSs (like SQL Server) might not, so this one might not be something a developer has to actually worry about.
And for all of these considerations, of course, anything you execute often is more important than a less used query.
Question: Did I get something wrong? Is there anything else?
As details can be very different between SQL implementations, answers can focus on PostgreSQL.
GROUP BYcan also benefit from ordered indexes. – Dan Guzman Aug 16 '23 at 18:10