Is a clustered index on column A the same as creating a table ordered by A?

Question

Reading the official PostgreSQL documentation for version 9.0 I read an interesting escamotage that performs better than CLUSTER for big tables:

The CLUSTER command reorders the original table by scanning it using the index you specify. This can be slow on large tables because the rows are fetched from the table in index order, and if the table is disordered, the entries are on random pages, so there is one disk page retrieved for every row moved. (PostgreSQL has a cache, but the majority of a big table will not fit in the cache.) The other way to cluster a table is to use:
  CREATE TABLE newtable AS
    SELECT * FROM table ORDER BY columnlist;
which uses the PostgreSQL sorting code to produce the desired order; this is usually much faster than an index scan for disordered data. Then you drop the old table, use ALTER TABLE ... RENAME to rename newtable to the old name, and recreate the table's indexes. The big disadvantage of this approach is that it does not preserve OIDs, constraints, foreign key relationships, granted privileges, and other ancillary properties of the table — all such items must be manually recreated. Another disadvantage is that this way requires a sort temporary file about the same size as the table itself, so peak disk usage is about three times the table size instead of twice the table size.

The problem is that this suggestion doesn't appear in > 9.0 versions of the official documentation.

My question is if this escamotage is still valid for 9.1, 9.2, 9.3 and 9.4 because I'm stuck with a CLUSTER operation over two big tables (one has ~750M rows and the other one has ~1650M rows) and average disk write/read speed is 3MB/s due to the CLUSTER algorithm explained in the official doc. It's a slow process over big tables, so I'd like to avoid it doing the "create ordered table over index-associated-column" trick. This will save me days of DB processing.

score 4 · Accepted Answer · edited Apr 13 '17 at 12:43

4

Like @dezso commented, creating a new table and dropping the old used to be faster in old versions, but not any more with the new implementation in pg 9.1.

The most common problem with CLUSTER is that it requires an exclusive lock on the table, which does not go well with concurrent access to it.

The solution to this problem is pg_repack, which does not lock the table exclusively.

Generally, make sure that your server configuration is fit for the task. High settings (a lot of RAM) for maintenance_work_mem would help both CLUSTER and CREATE INDEX on big tables. Standard setting is way too small for you. Follow the links for details.

You might set it very high temporarily for a transaction with SET LOCAL and leave it at a reasonable setting otherwise:

BEGIN;
SET LOCAL maintenance_work_mem = ????MB; -- find the sweet spot
CLUSTER tbl;
COMMIT;

If possible, set it high enough to fit the whole operation in RAM.

More:

edited Apr 13 '17 at 12:43

Community

1

answered Feb 27 '15 at 22:58

Erwin Brandstetter

175,982
27
439
600

Thanks for your response. At link there's the following notice: Target table must have a PRIMARY KEY, or at least a UNIQUE total index on a NOT NULL column.. My tables dont' satisfy this statement. Do you think that partitioning the tables will improve CREATE INDEX and CLUSTER performance? – pietrop Feb 28 '15 at 13:23
@pietrop: Since your tables are so big, partitioning would probably help quite a bit for CLUSTER and CREATE INDEX, smaller partitions can more easily be processed in RAM. But first check if optimizing your settings might do the job. I added some more to the answer. – Erwin Brandstetter Feb 28 '15 at 17:30
I had yet set maintenance_work_mem = 4GB on a 8GB RAM machine and maintenance_work_mem = 64GB on a 128GB RAM machine, but I didn't notice any improvement. The bottleneck was the disk I/O because PostgreSQL was reading a lot of 8KB blocks instead of bigger ones, resulting in a low read speed (~3MB/S). Now I'm repopulating the DB using the partitioning method. As a reference point, it tooks a total time of 40seconds on a 3milion rows table to: 9xCREATE...INHERITS, 9xINSERT, 10xVACUUM ANALYZE, 1xINDEX, 1xCLUSTER, 3xINDEX, 10xVACUUM ANALYZE. I'm expecting 6 hours for 1650mln rows – pietrop Mar 02 '15 at 11:09

Is a clustered index on column A the same as creating a table ordered by A?

1 Answers1

Linked