
I have a unique constraint on a character varying column that will mostly have lengths of approximately 600 but could get to 10 million.

I have read about B-trees, but I can't determine how much disk space the index will consume, or whether such large values will keep the index from performing well.

What is the impact on disk space and query time of a B-tree index on character varying columns with such potentially large values?

1 Answer


In Postgres, a unique constraint is implemented with a unique B-tree index. The manual:

Adding a unique constraint will automatically create a unique B-tree index on the column or group of columns used in the constraint.
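For example (hypothetical table, column, and constraint names), the explicit constraint and a plain unique index end up with essentially the same B-tree underneath:

```sql
CREATE TABLE doc (body varchar);

-- Adding the constraint implicitly creates a unique B-tree index:
ALTER TABLE doc ADD CONSTRAINT doc_body_uni UNIQUE (body);

-- Largely equivalent at the index level, just without the table constraint:
-- CREATE UNIQUE INDEX doc_body_idx ON doc (body);
```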

Indexes use the same basic storage mechanism as tables: arrays of data pages, with some minor additional features. The manual:

In a table, an item is a row; in an index, an item is an index entry.

Every table and index is stored as an array of pages of a fixed size (usually 8 kB, although a different page size can be selected when compiling the server).

The maximum size of an index entry is roughly a third of a data page. See:

The maximum size was reduced to 2704 bytes in Postgres 12, and is still the same in Postgres 15. Meaning your UNIQUE constraint is not possible for values anywhere near 10 million characters.
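A minimal sketch of where the limit bites, with a hypothetical table. Note that the constraint itself is created without complaint; the size check happens per value, at insert time, and index entries are compressed first, so a highly repetitive long value may still fit while an incompressible one of the same length does not:

```sql
CREATE TABLE doc (body varchar UNIQUE);

-- ~12800 characters of effectively incompressible text; a value like this
-- raises an error along the lines of:
--   ERROR: index row size ... exceeds btree version 4 maximum 2704
--          for index "doc_body_key"
INSERT INTO doc
SELECT string_agg(md5(random()::text), '')
FROM   generate_series(1, 400);
```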

Consider a unique index on the hash value instead, either after adding a redundant column holding the hash, or just an index on the hash expression. You could use the built-in function md5() or something more efficient for huge values. Related:
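A sketch of both variants, assuming the hypothetical table doc(body varchar) from above; index, constraint, and column names are made up. md5() returns 32 hex characters, which conveniently fit into a uuid (16 bytes):

```sql
-- Variant 1: unique index on the hash expression, no extra column:
CREATE UNIQUE INDEX doc_body_md5_idx ON doc (md5(body));

-- Variant 2: redundant hash column, stored as a compact 16-byte uuid
-- (generated columns require Postgres 12+):
ALTER TABLE doc ADD COLUMN body_md5 uuid
    GENERATED ALWAYS AS (md5(body)::uuid) STORED;
ALTER TABLE doc ADD CONSTRAINT doc_body_md5_uni UNIQUE (body_md5);
```

Either way, equality of the hash stands in for equality of the value; md5 collisions are astronomically unlikely, but not strictly impossible.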

Erwin Brandstetter
  • Thank you Erwin Brandstetter, incredible advice as usual! For performance, should I use bytea? If so, should I let Postgres do the conversions with md5 and decode or sha512() with this library http://www.zedwood.com/article/cpp-sha512-function? I'm fairly certain I'll never need to extract the hash, but I will need to compare with external data. Thank you so very much for sharing your great knowledge! –  Jun 27 '14 at 01:04
  • @Gracchus: I suggest you put that in new questions (one topic per piece). Comments are not the place. You can always link to this one for reference. – Erwin Brandstetter Jun 27 '14 at 01:16
  • Thank you Erwin Brandstetter! I have posted per your recommendation: http://dba.stackexchange.com/questions/69170/optimal-hash-technique-to-index-large-text –  Jun 27 '14 at 01:45
  • I wonder if something has changed in newer Postgres versions, as I can insert text values with 10000 (random) characters into a column using a unique index: http://rextester.com/ZDYS66763 –  Sep 05 '18 at 12:41
  • @a_horse_with_no_name: Interesting, indeed. Worthy of a new question: https://dba.stackexchange.com/q/217087/3684 (I think I found the answer.) – Erwin Brandstetter Sep 08 '18 at 16:22
  • What about using hash indexes (on Postgresql 14)? Is that a better solution than using b-trees to index the md5 of the column? – Gili Oct 06 '22 at 07:12