6

I have some tables in my Postgres database, and I have set the tables to cluster about an index. But I forget when was the last time I ran cluster maintenance on these tables, and I do not want to waste my time to run cluster on them. How do I know if the tables are clustered?

David S.
  • 926

3 Answers3

4

You can query all the clustered tables from the data dictionary:

SELECT relname AS table_name
FROM   pg_class c
JOIN   pg_index i ON i.indrelid = c.oid
WHERE  relkind = 'r' AND relhasindex AND i.indisclustered
Mureinik
  • 4,024
  • 1
    This doesn't seem to include primary keys that are clustered? – jontejj Feb 24 '15 at 09:24
  • 1
    And it doesn't say which index was used for the clustering, is there a way to add that? – chrismarx Mar 29 '17 at 15:01
  • 1
    As best I can tell this answer is very wrong in the context of the question. It only queries what indexes are used for clustering - it gives no information on when or even if such clustering was performed. The better answer is to look at stats as given in: https://stackoverflow.com/questions/53299733/how-to-tell-when-a-postgres-table-was-clustered-and-what-indexes-were-used – Alex Stoddard May 07 '20 at 15:16
  • Based on the comments, which I have not verified, it seems this answer does not apply to all situations. So I removed the answer mark. – David S. May 07 '20 at 23:25
2

The accepted answer appears to be incorrect.

It seems that when (or even if) clustering was performed is not available from the query but stats can be queried to see how good clustering is: https://stackoverflow.com/questions/53299733/how-to-tell-when-a-postgres-table-was-clustered-and-what-indexes-were-used

1

Building off @alex-stoddard's answer and the StackOverflow question he linked, I was able to construct this query, which seems to show all index-table pairs used for clustering:

SELECT
  indisclustered,
  index_name,
  cls.relname AS table_name
FROM 
  pg_class cls
  INNER JOIN (
    SELECT
      indexrelid,
      indrelid,
      indisclustered,
      relname AS index_name
    FROM pg_index ndx
    INNER JOIN pg_class cls
    ON cls.oid=indexrelid
    WHERE indisclustered='t'
  ) tmp
  ON cls.oid=tmp.indrelid
;

Note that clustering is per-index, not per-column (although indices are often based on columns), but that performance stats are collected by column. If you also want to get some per-column stats, you can try:

SELECT
tablename,
attname,
correlation,
index_name,
indexdef
FROM (
SELECT 
    indisclustered,
    index_name,
    table_name,
    indexdef
FROM (
SELECT
indisclustered,
index_name,
cls.relname AS table_name
FROM 
pg_class cls
    INNER JOIN (
        SELECT
            indexrelid,
            indrelid,
            indisclustered,
            relname AS index_name
        FROM pg_index ndx
        INNER JOIN pg_class cls
        ON cls.oid=indexrelid
        WHERE indisclustered='t'
    ) tmp
    ON cls.oid=tmp.indrelid
) clustering
INNER JOIN pg_indexes indxs
ON indxs.tablename=clustering.table_name AND 
indxs.indexname=clustering.index_name
) indexes
INNER JOIN
pg_stats
ON indexes.table_name=pg_stats.tablename AND indexes.indexdef LIKE  CONCAT('%', pg_stats.attname, '%')
;