0

Does PostgreSQL support any kind of sparse index that would be useful for indexing only the first and last value in the table, based on a certain key?

Consider the following data:

CREATE TABLE samples (
   device_id int not null;
   ts timestamp not null;
   value real not null;
);

Now, let's assume I have millions of devices and each one can have billions of samples. I want to access all of this data - if access is slow I don't mind. However I am especially interested to access the first and the last value by ts timestamp: (device_id, value) very, fast. This to know what's the range of samples for a specific device.

I can build a trigger/application logic that manages a separate table where the information is stored:

CREATE TABLE first_last_samples (
   device_id int not null;
   first_ts timestamp not null;
   first_value real not null;
   last_ts timestamp not null;
   last_value real not null;
);

However, managing the logic for caching first/last item yourself feels a bit clunky. I am asking would PostgreSQL manage any special index types, maybe by extensions, which would let me solve this problem on the table/view/index definition level?

Because there are billions of data points, putting all of them in the index does not seem to make sense if you are only interested in first and last.

Erwin Brandstetter
  • 175,982
  • 27
  • 439
  • 600
Mikko Ohtamaa
  • 299
  • 1
  • 13
  • 1
    Have you tried a regular B-Tree index on (device_id, ts) to support a quick lookup for the latest and earliest row when you have a device_id. I would always start with a B-Tree, there’s extra complications to consider when using sparse ones – Andrew Sayer May 29 '22 at 18:43
  • I am already using B-tree, but because of the amount of data I feel I am wasting quite a lot of disk space. – Mikko Ohtamaa May 29 '22 at 20:28
  • 1
    Would you need the index to have real-time updates (immediately when a change is made) or can there be refresh latency? If the latter then a materialized view might be an option – Charlieface May 29 '22 at 21:35
  • Real-time is not needed, as long as there is some way to ensure that the data is correct with the future queries. – Mikko Ohtamaa May 29 '22 at 22:08
  • 1
    Describe your typical / possible access pattern. And your Postgres version. And be honest about your cardinalities: "millions of devices with billions or rows" would be quadrillions, which is simply too much. There are no "sparse" indexes in Postgres. There are BRIN indexes: much smaller, but not as fast as B-tree. See: https://dba.stackexchange.com/a/312327/3684 – Erwin Brandstetter May 29 '22 at 23:42
  • Thank you Erwin - using BRIN is a good idea. Let me take a look on that. – Mikko Ohtamaa May 30 '22 at 08:54

1 Answers1

1

If your undisclosed access patterns and/or other restrictions don't allow for a MATERIALIZED VIEW or a trigger solution keeping a table with min/max per device_id up to date, then the closest standard tool might be a BRIN index, which is much smaller than a corresponding B-tree index, typically by several orders of magnitude. But its efficiency also depends on undisclosed data distribution in your table. See:

Erwin Brandstetter
  • 175,982
  • 27
  • 439
  • 600