-2

My understanding is that you want to decide on what to include in your index based what will trigger the index being used.

The following considerations are what I am aware of, in order of relevance:

  • (Most common) Any columns used with the following query clauses: WHERE, ORDER BY, JOIN
  • Index-only scans: getting data from a small selection of columns included in a covering index. In this case, data from an index could be used from a column just included in the SELECT clause, but the query will likely still contain a column in a WHERE clause that should also be indexed. See PostgreSQL's implementation for more details.
  • Getting a full count: something like SELECT count(*) FROM tbl; (It's not clear to me why an index would be used for this one)
  • For the checking of Foreign Key constraints: for verifying a record exists before allowing it to be added as a Foreign Key, or for deleting/setting to null/etc. records in tables WHERE colx_withFK = _id if a record with id _id is deleted in the source table, etc. PostgreSQL auto-creates indices for FKs, but other RDBMSs (like SQL Server) might not, so this one might not be something a developer has to actually worry about.

And for all of these considerations, of course, anything you execute often is more important than a less used query.

Question: Did I get something wrong? Is there anything else?


As details can be very different between SQL implementations, answers can focus on PostgreSQL.

Sources: helpful article, SE DBA answer and comments

Akaisteph7
  • 97
  • 3
  • 2
    "All RDBMSs using SQL should auto-create indices for FKs so this one shouldn't be something a developer has to actually worry about." - Thank goodness SQL Server doesn't do that behind your back. As with nearly all things DBMS related, "it depends". GROUP BY can also benefit from ordered indexes. – Dan Guzman Aug 16 '23 at 18:10
  • "(It's not clear to me why an index would be used for this one)" One line above you mentioned Index-only scans – SergeyA Aug 16 '23 at 19:08
  • @SergeyA Why would that be faster than a normal table scan? – Akaisteph7 Aug 16 '23 at 20:42
  • @Akaisteph7 Because the index usually is smaller than a table – SergeyA Aug 16 '23 at 22:02
  • @SergeyA Idk what you mean by that. It seems in the case of a full count the number of rows to traverse for a table scan should be even less than the number of nodes to traverse for an index scan. I guess maybe you're trying to say its faster to traverse the index even in this case because of how its structured and stored maybe. But how would you know this? – Akaisteph7 Aug 17 '23 at 13:23
  • SergeyA means - a table can be for ex 10GB in size on disk. One-column Index to this table could be 1GB. It is faster to read 1GB than 10GB – Eduard Okhvat Aug 18 '23 at 02:21
  • The above makes sense. But I still would like some reference links or something other than just a comment to accept this is why. I've seen nothing about this. – Akaisteph7 Aug 18 '23 at 14:49
  • Meanwhile I see its the usual StackExchange experience with people downvoting without giving any reasons. I've restricted my scope to just Postgres, so "it depends" is not, and was not anyways, a valid answer. – Akaisteph7 Aug 18 '23 at 14:51
  • It seems there's a very helpful guide for SQL indexing considerations (Use the Index, Luke) that would be worth looking into further. – Akaisteph7 Aug 18 '23 at 14:59

1 Answers1

0

There are a few things you forgot:

  • creating an index on an expression makes PostgreSQL collect statistics for the expression (from v15 on, you can alternatively use extended statistics for that purpose)

  • unique indexes can implement constraints

There are also partial indexes, where part of the WHERE condition can be implemented as a WHERE clause for the index.

But I think that you won't be able to come up with a useful list that "semi-automatically" enables you to index a database well without further considerations, like

  • Which queries are executed how often, and how important is their performance?

  • Is the benefit in query speed worth the performance loss in data modifications?

  • Shall I add two specific indexes that ideally support two queries, or is it better to add a single index that is not perfect for either query, but supports both somewhat?

Laurenz Albe
  • 51,298
  • 4
  • 39
  • 69