
I have a lot of boolean attributes for entities in a table and need to store even more of them. Currently I have 17 such attributes (the other fields are four bigint fields), and I need to add about a dozen more.

Should I just go ahead and create boolean columns for each of the required attributes or are there better ways? How would that impact storage?

These fields will be searched on in many different combinations (so multi-column indexes are not going to help).

Thought (premature) about saving storage: how about I use a 64-bit integer and use each bit as a boolean value? I can use the bit operators for matching, but will such an operation use indexes, given that the scan is not based on ordering or equality of integers?
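
To make the idea concrete, here is a rough sketch of what I have in mind (table and attribute names are made up; the four existing bigint fields are omitted):

    -- Sketch of the bitmask idea: each bit of "flags" stands in for
    -- one boolean attribute (hypothetical names and bit positions).
    CREATE TABLE entity (
        id    bigint PRIMARY KEY,
        flags bigint NOT NULL DEFAULT 0   -- bit 0 = attr_1, bit 1 = attr_2, ...
    );

    -- "attr_3 is true AND attr_7 is false" becomes a mask test:
    SELECT *
    FROM   entity
    WHERE  (flags & (1::bigint << 2)) <> 0   -- bit 2: attr_3 is set
    AND    (flags & (1::bigint << 6)) = 0;   -- bit 6: attr_7 is not set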

Vaibhav

1 Answer


I wouldn't bother.

  1. Postgres has an overhead of at least 23 bytes per record -- and that's not counting the bitmap used to keep track of NULLs, or other block overheads, which combined probably average another couple of bytes per record. See https://stackoverflow.com/a/26206031/391445
  2. Your proposed bigint takes 8 bytes of storage, versus 17 bytes for 17 boolean columns (1 byte each).
  3. Another option is to use bit strings, but these have an overhead of 5 or 8 bytes, so to store your 17 booleans you need 8 bytes anyway.
  4. Your 4 other bigint fields take another 32 bytes of storage.

So expect to save only around 10% -- less if you have more non-boolean columns, but only slightly more as you add more boolean columns -- at the expense of making your code and data much more difficult to understand.
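
If you want to sanity-check these numbers yourself, pg_column_size reports the number of bytes used to store a single value (the example values here are made up):

    -- 1 byte for a boolean, 8 bytes for a bigint,
    -- header plus 3 data bytes for a 17-bit bit string.
    SELECT pg_column_size(true)                         AS one_boolean,
           pg_column_size(1::bigint)                    AS one_bigint,
           pg_column_size(B'10110011010110010'::varbit) AS seventeen_bits;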

But if you have a lot of data (hundreds of millions of records), I would do some performance tests to compare.
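
A rough sketch of such a test (table names, row count and value distribution are made up; adjust them to resemble your real data):

    -- Same data two ways: individual boolean columns vs. one bigint bitmask.
    CREATE TABLE test_booleans AS
    SELECT g AS id,
           random() < 0.5 AS attr_1,
           random() < 0.5 AS attr_2       -- ...repeat for attr_3 .. attr_17
    FROM   generate_series(1, 1000000) AS g;

    CREATE TABLE test_bitmask AS
    SELECT g AS id,
           (floor(random() * 131072))::bigint AS flags   -- 17 random bits
    FROM   generate_series(1, 1000000) AS g;

    -- Compare table sizes and query plans:
    SELECT pg_size_pretty(pg_total_relation_size('test_booleans')) AS booleans,
           pg_size_pretty(pg_total_relation_size('test_bitmask'))  AS bitmask;

    EXPLAIN ANALYZE SELECT count(*) FROM test_booleans WHERE attr_1 AND NOT attr_2;
    EXPLAIN ANALYZE SELECT count(*) FROM test_bitmask  WHERE flags & 1 <> 0 AND flags & 2 = 0;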

Colin 't Hart
    People on this site (DBA Stack Exchange) should refrain from referring to Erwin Brandstetter's answers. That gentleman has created a kind of wiki of his own, interweaving his answers all over! That one link sent me down a rabbit hole and I came out confused, slightly enlightened and mind-blown. It took me about 4 hours, no less, to go through everything he had to say (and the references to the PostgreSQL site and what not!).

    Thanks for the answer though (and many thanks for the link) - the best 4 hours I've spent re-learning the internals of PostgreSQL!

    – Vaibhav Jun 07 '20 at 07:11