
The requirements go like this:

  • Each user has a wallet that can contain tens to a hundred or so different currencies, and for each currency an amount

  • Number of currencies can grow beyond 1,600 (which makes me anxious about the column limit)

  • User base in the millions

  • Lots of reads of the whole wallet, occasional writes to a single currency's amount

  • Nightly sum of total amount from all users for each currency

So far I am thinking of 2 options:

  • Each user has a {"currency":amount} JSONB field, or

  • A table with 500M+ rows of (user_id, currency_id, amount), clustered on user_id with a BRIN index on it

Which way should I go? Thank you for your advice.

Dennis Lau

1 Answer


Assuming a mostly immutable set of ~100 currencies overall (you haven't been clear on that), and given your requirements, consider the simple approach: one table with one row per user and one column per currency. Like:

CREATE TABLE wallet (
   user_id   integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY 
 , currency1 integer -- or numeric, depends on missing info
 , currency2 integer
 , ...
 , currency100 integer
);

This has a massively smaller disk footprint than either of your two options so far.

That's 4 bytes per currency in use (with integer), plus 16 bytes per row for the NULL bitmap. NULL storage is very cheap. See:

Data type integer or numeric?
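
To verify the footprint on your own data, pg_column_size() can report the size of a whole row. A quick sanity check (just a sketch, assuming the wallet table above is already populated):

-- Rough per-row size as a composite datum, sampled and averaged:
SELECT pg_column_size(w.*) AS row_bytes FROM wallet w LIMIT 10;
SELECT avg(pg_column_size(w.*))::int AS avg_row_bytes FROM wallet w;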

Your option 1 (jsonb) at least doubles the size per currency in use by storing a key name for every amount. It only wins, storage-wise, when users hold very few currencies each. Sums, calculations and indexing are slower and more complicated. Data integrity is hard to enforce.
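
To illustrate: with the jsonb design, the nightly per-currency totals have to parse every wallet. A sketch, assuming a hypothetical table user_wallet (user_id, wallet jsonb) - names not taken from the question:

-- Total for a single currency: parse and cast the text value in every row.
SELECT sum((wallet ->> 'USD')::numeric) AS total_usd
FROM   user_wallet;

-- Totals for all currencies at once: unnest every key/value pair.
SELECT key AS currency, sum(value::numeric) AS total
FROM   user_wallet, jsonb_each_text(wallet)
GROUP  BY key;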

Your option 2 occupies ~ 44 bytes per currency (separate row). Very clean data model, flexible for adding / removing currencies on the fly, but wastes a lot of space, which makes everything slow.
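
For comparison, option 2 would look roughly like this (my own names, not from the question); each currency becomes its own row with its own tuple header and item pointer, which is where the ~44 bytes per currency come from:

CREATE TABLE wallet_entry (
   user_id     integer NOT NULL
 , currency_id integer NOT NULL
 , amount      integer NOT NULL   -- or numeric, same caveat as above
 , PRIMARY KEY (user_id, currency_id)
);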


  • A lot of reads for the whole wallet are as simple as:

    SELECT * FROM wallet WHERE user_id = 123;
    

    You only need an index on user_id, which is provided by the PK.

  • Getting the nightly sum of total amount from all users for each currency is as simple and as fast as can be:

    SELECT sum(currency1), sum(currency2), ... FROM wallet;
    

    No index needed for that - it's one sequential scan over the table either way.


If you have a couple of dozen currencies covering the lion's share of all entries, you could try a combined strategy: fixed columns for the regulars and a jsonb column for the rest. This combines minimum storage size with full flexibility - at the cost of more complicated queries and computations, since you now have to combine both parts. And much weaker means to enforce integrity.

CREATE TABLE wallet (
   user_id   integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY 
 , currency1 integer -- or numeric, depends on missing info
 , currency2 integer
 , ...
 , currency70 integer
 , chickenfeed jsonb
);

I chose 70 currency columns so that the total stays at the local optimum of 72 columns (user_id + 70 currencies + chickenfeed), before another 8 bytes would be allocated for the NULL bitmap. A minor consideration. Choose a number that fits your data distribution.

Maintain a table of all allowed currencies - you do not want to search millions of rows to get the complete list. And use minimum-length key names in the jsonb column, like '{"A1":123}' (2 bytes for the key), so as not to waste gigabytes of storage on repeating lengthy names over and over.
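
Putting those two pieces together, the currency list and the nightly totals for the combined layout could look roughly like this (illustrative names and keys, not a fixed recommendation):

-- All allowed currencies, mapping short jsonb keys to full codes;
-- jsonb_key stays NULL for currencies that have a fixed column.
CREATE TABLE currency (
   currency_id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY
 , code        text NOT NULL UNIQUE   -- e.g. 'USD'
 , jsonb_key   text UNIQUE            -- e.g. 'A1'
);

-- Nightly totals: fixed columns and the jsonb leftovers are summed separately.
SELECT sum(currency1)                       AS currency1_total
     , sum(currency2)                       AS currency2_total
     -- ... one aggregate per fixed column ...
     , sum((chickenfeed ->> 'A1')::numeric) AS a1_total
FROM   wallet;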

Erwin Brandstetter
  • Thanks for the response. Sorry I wasn't clear on that, but yes, the number of currencies can grow beyond 1600 and hit the column limit of Postgres as per requirement, which is why we can't go the obvious route – Dennis Lau Feb 11 '19 at 03:53
  • Much more than 100 columns would already be stretching it. I wouldn't go anywhere near the theoretical maximum of columns. I added a possible compromise above. – Erwin Brandstetter Feb 11 '19 at 13:05
  • @ErwinBrandstetter what about this idea: the common junction table (user_id, currency_code, amount) with PK on (user_id, currency_code) and another index on (currency_code) INCLUDE (amount) for the daily sums? – ypercubeᵀᴹ Feb 11 '19 at 13:13
  • @ypercubeᵀᴹ: clean and simple - basically the OP's option 2 with an optimized index. (While computing all sums the index won't be of help, though.) The only (considerable) downside: it occupies multiple times the storage which is a drag on performance as well - like discussed above. – Erwin Brandstetter Feb 11 '19 at 13:18
  • @ErwinBrandstetter thanks for the help! Can I get you a beer via PayPal or something – Dennis Lau Feb 12 '19 at 03:32
  • @DennisLau: No PayPal. Good fortune with your project! – Erwin Brandstetter Feb 17 '19 at 22:02