
I need to add a new BIGSERIAL column to a huge table (~3 billion records). This question is similar to what I need to do, and the accepted answer has helped me somewhat, but I'm still wondering about something. In my case the table already has a BIGSERIAL column as its primary key, but many rows have been deleted, so there are now gaps. (The table has since been fully vacuumed.) I need to regenerate the values so that they are sequential again. Here are 5 example rows of what I want to achieve, for rows where new_value > 1000:

+---------+---------+
|old_value|new_value|
+---------+---------+
|1026     |1001     |
|1027     |1002     |
|1030     |1003     |
|1032     |1004     |
|1039     |1005     |
+---------+---------+

I have successfully implemented the alternative approach as mentioned in the referenced answer above (CREATE TABLE ... and then INSERT INTO new_table SELECT * FROM ... ), but I would also like to attempt, and benchmark against, the initial suggestion. The problem, however, is that I don't know whether the new_value will be generated in the same order as the old_value as this is a requirement.

How can I ensure the order of the new_value column follows/tracks the order of the old_value column when the new_value column is added using a statement like this:

ALTER TABLE existing_table ADD COLUMN new_value BIGSERIAL;

A different approach

I also attempted the following (that works quite well on a small table), but it's much slower than the alternative suggestion of the referenced answer on very large tables:

ALTER TABLE existing_table ADD COLUMN new_value BIGINT NOT NULL DEFAULT 0;

UPDATE existing_table AS existing
SET    new_value = generated.new_id
FROM  (
   SELECT original.old_value
        , row_number() OVER (ORDER BY original.old_value) AS new_id
   FROM   existing_table AS original
   ) generated
WHERE  existing.old_value = generated.old_value;
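After the backfill, a sanity check like the following (a sketch; it assumes the same table and column names as above) can confirm that new_value is gapless and tracks the order of old_value:

```sql
-- Any rows returned indicate a gap or an ordering violation in new_value.
SELECT old_value, new_value, prev_new
FROM  (
   SELECT old_value
        , new_value
        , lag(new_value) OVER (ORDER BY old_value) AS prev_new
   FROM   existing_table
   ) sub
WHERE  prev_new IS NOT NULL
AND    new_value <> prev_new + 1;
```

An empty result means every row's new_value is exactly one more than that of the row preceding it in old_value order.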

Erwin Brandstetter
HeatZync
    What is your reason for this undertaking? It seems pointless to me. – Laurenz Albe Apr 13 '22 at 07:10
  • As a side note (won't make a difference for your underlying "problem"): the use of the "serial" types is discouraged in favor of the standard compliant identity columns. But I do agree with Laurenz: this seems rather futile. Gaps are not a problem. The only job of a generated primary key is to be unique. It is completely irrelevant if there are gaps or what the actual value is. –  Apr 13 '22 at 07:13
  • More background: The table is the domainevententry table used in Axon and it has grown too large. We found that many events/records that are stored are not required (if we make some changes to our code). We are therefore busy with a clean-up project. Axon makes use of tracking tokens which can handle gaps, but it adds additional/unnecessary load on the DB to keep track of the gaps, so it's best if the values follow nice and sequential on each other. – HeatZync Apr 13 '22 at 08:30
  • How do you plan to update all tables referencing the primary key of this table? Those values need to be changed in all other places as well. –  Apr 13 '22 at 10:03
  • Other tables don't reference the primary key column – HeatZync Apr 13 '22 at 12:33
  • so it's not needed as a primary key column, it's more of a sort-order column. – Jasen Apr 14 '22 at 01:44
  • Yes @Jasen. Preserving the order is important for the Axon framework. There is also another unique index over two separate columns: (aggregateidentifier, sequencenumber), but this index is only created after the clean-up/rebuild of the table is done. – HeatZync Apr 14 '22 at 07:12
  • Hmm, if that value is used for sorting, then why do the gaps matter? They won't change the sort order. –  Apr 14 '22 at 09:44
  • @a_horse_with_no_name: We make use of Axon's GapAwareTrackingToken. When there are gaps then Axon could fire queries containing WHERE globalindex IN ( ... list of gaps ... ). When there are no gaps then the query doesn't use WHERE globalindex IN. This might not be a major issue for the live system, because it only cares about new rows that are inserted, but there is also the notion of a replay that's used in other circumstances which starts at the first row and runs through the entire table. – HeatZync Apr 14 '22 at 10:37

1 Answer


First, consider the advice in the comments: do you really need to remove gaps? Typically, you don't. And gaps will creep back in. See:

Next, consider the updates to my old answer that you have been working from:

Among other things, how to possibly avoid writing WAL for the table data in the new table - that will be the most significant bit to further improve performance.
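For context, one way to skip that WAL (a sketch, not necessarily the linked answer's exact method): with the server setting wal_level = minimal (which also requires max_wal_senders = 0), PostgreSQL can avoid WAL-logging the data written by CREATE TABLE AS, since the whole table is rebuilt from scratch if the transaction does not commit:

```sql
-- Assumes wal_level = minimal and max_wal_senders = 0 on the server.
-- Under those settings the copied data need not be WAL-logged at all.
CREATE TABLE new_table AS
SELECT *
FROM   existing_table;
```

Under the default wal_level = replica this optimization does not apply, so check the server configuration before benchmarking.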

How can I ensure the order of the new_value column follows/tracks the order of the old_value column when the new_value column is added using a statement like this:

ALTER TABLE existing_table ADD COLUMN new_value BIGSERIAL;

You cannot. Use one of the other routes.

I also attempted the following (that works quite well on a small table), but it's much slower than the alternative suggestion of the referenced answer on very large tables:

The UPDATE route is inherently more expensive than writing a pristine new table from scratch (adding indexes and constraints afterwards).
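That route might look like this (a sketch with hypothetical names: payload stands in for the real data columns of existing_table; an identity column is used instead of the discouraged serial type, per the comments):

```sql
BEGIN;
CREATE TABLE new_table (
    new_value bigint GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY
  , payload   text                     -- placeholder for the real columns
);

-- row_number() makes the old_value -> new_value mapping explicit,
-- so the new numbering is guaranteed to track the old order.
INSERT INTO new_table (new_value, payload)
SELECT row_number() OVER (ORDER BY old_value), payload
FROM   existing_table;

-- Resync the identity sequence so future inserts continue from the top.
SELECT setval(pg_get_serial_sequence('new_table', 'new_value')
            , (SELECT max(new_value) FROM new_table));

DROP TABLE existing_table;
ALTER TABLE new_table RENAME TO existing_table;
COMMIT;
```

Create any remaining indexes (like the unique index on (aggregateidentifier, sequencenumber) mentioned in the comments) after the load, not before.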

Erwin Brandstetter
  • Thanks @Erwin. You are correct, gaps do occur sometimes, but they are usually small and Axon is better at handling small gaps. The clean-up however will get rid of ~90% of the rows which will leave many, large gaps. – HeatZync Apr 14 '22 at 07:26