1

In a PostgreSQL DB, I have a sortable bigint primary key, which is a bit verbose in terms of readability.

I want to encode it as base36 or base64, like using char(n). In theory the number of bytes should be the same as before, so it does not take more space. Is that correct?

Are there other concerns? For example sorting, collation?

Andriy M
  • 22,983
  • 6
  • 59
  • 103
jack2684
  • 183
  • 1
  • 8
  • 1
    What do you mean by bigint being verbose? Why would a bigint PK need to be readable or non-verbose to begin with? And what do you mean by base36 or base64 having the same number of bytes? Could you give us an example of a bigint value and its base36 or base64 equivalent that you believe would have the same number of bytes? It doesn't seem very clear what you are looking to achieve, to be honest. Welcome to the site, though! – Andriy M Jan 03 '21 at 21:46
  • 1
    First: don't use char. Secondly: 9223372036854775807 stored as a bigint requires 8 byte, encoded into base64 it's OTIyMzM3MjAzNjg1NDc3NTgwNw== which requires 29 bytes of storage. So you will increase your storage requirements substantially. And all that converting back and forth for no apparent reason won't make things faster as well. What exactly is the problem you are trying to solve with that seemingly useless approach? –  Jan 03 '21 at 21:52
  • Thank you @AndriyM and @a_horse_with_no_name! I was thinking about changing the radix/base. By trying on this website, 9223372036854775807 -> 1Y2P0IJ32E8E7(36 base). The second one is shorter and takes less screen space. – jack2684 Jan 04 '21 at 05:52
  • 2
    The second one might be shorter on screen but uses more space when stored in the database. But in my experience, showing a generated (i.e. "artificial") key to an end user is a bad idea to begin with –  Jan 04 '21 at 12:26
  • Is there no natural key that the user can use to identify the row? – Lennart - Slava Ukraini Jan 04 '21 at 16:45
  • @Lennart in my use case, each row is a geography datapoint with some data. It seems like I can have a translation at application layer, use a few columns to generate a representation of the row. The downside is more hustle in the application layer. – jack2684 Jan 04 '21 at 17:47
  • If the geography point is unique that seems like a natural key. Can you not use that to identify the row? – Lennart - Slava Ukraini Jan 04 '21 at 17:51
  • @Lennart I see. One follow up, since geog is not sortable, I should probably create a hash index or gist index for it, instead of Btree. Is that right? – jack2684 Jan 04 '21 at 21:57
  • Im not sure I understand the problem, why cant you order by your bigint. Perhaps you should create a db<>fiddle and explain what it is you are trying to achieve? – Lennart - Slava Ukraini Jan 04 '21 at 22:34

2 Answers2

2

There are potential performance differences and sorting differences (collation dependent) per this DBA.StackExchange answer and it's linked answers such as this one. I highly recommend reading through all the information referenced in that answer.

These sections of the above linked articles are specifically relevant:

There is no performance difference among these three types [char, varchar, text]...

Short answer: integer is faster than varchar or text in every aspect.

Therefore integer (and BIGINT) is more performant of a data type, even of equal data length.

J.D.
  • 37,483
  • 8
  • 54
  • 121
  • If yo don’t have much index Scans and/of buffer cache is not an issue you might however not see a bug difference, with C collation it should also be tame on CPU, it’s best to measure for your specific workload (and only use it if you get a benefit like special global unique key formats). Visualization would not be my prio, portability might (uuid types might be tricky compared to char36). – eckes Jan 13 '21 at 09:07
0

You could keep the bigint PK, but for user interaction-related purposes (if that is why you want to reduce the length of the values) you could introduce a generated column that would return a string of characters representing your PK value in the desired form.

ALTER TABLE
  YourTable
ADD
  MyBigintIDInBaseN text
  GENERATED ALWAYS AS MyDecimalToBaseNConversionFunction(MyBigintID) STORED
;

The conversion function would need to be a custom function (because I do not believe PostgreSQL offers a built-in one for this task), marked as IMMUTABLE so that it can be used in a generated column expression.

That way you would continue using the bigint PK internally, including as a sorting criterion where necessary and as a reference target for other tables' FKs. For display purposes, however, you would use the computed column.

If your custom representations of PKs need to be used as query arguments as well, it would be a good idea to create an index on the computed column to help the performance.

Andriy M
  • 22,983
  • 6
  • 59
  • 103
  • Thank you @AndriyM! People have been saying that exposing PK to user end is not a good idea. However if this generated column is as good as the PK (sortable, uniquely identify a row), does exposing this still better than PK? – jack2684 Jan 04 '21 at 17:39
  • @jack2684: I guess I'm one of those saying that exposing PKs to the user is not a good idea, so it doesn't matter to me which way it's done as it would be bad either way. That said, if you still insist on doing this, then I guess first of all you need to define "better". In your question you are explaining that readability is a factor. I guess that means that presenting them in a more concise form is probably better than the traditional decimal representation. – Andriy M Jan 04 '21 at 18:22
  • that's very true! The term "better" is really determined case by case. – jack2684 Jan 04 '21 at 21:54