SELECT rows from table one by one, comparing one column values with values in an array

Question

I have an array of timestamps in a PLPGSQL code block: timestamp_array.

For each element of timestamp_array I want to get the element itself (timestamp_element) and some columns from a table (my_table) that also has a timestamp column, but I don't want to get the table timestamp column itself, but the timestamp array element. The condition is that I only want to get one table row for each array element, the nearest match (into the past) between the timestamp elements in the array and the table timestamp column.

What I want to achieve is something like:

FOR EACH timestamp_element IN ARRAY timestamp_array
LOOP
  EXECUTE $EXE$
    SELECT my_table.some_other_columns...
    FROM my_table
    WHERE my_table.timestamp <= $1
    ORDER BY my_table.timestamp DESC
    LIMIT 1
  $EXE$
  USING timestamp_element
  INTO tmp_array;
EXECUTE
    INSERT INTO temporary_table
    VALUES ($1, array_to_string($2, ', '))
  USING timestamp_element, tmp_array;
END LOOP;

I don't know if the previous code is 100% correct, but it's only to try to explain better what I want to achieve. And obviously, the point would be to do the job with only one query, instead that one query for array element.

To put it as a practical case if I have a my_table:

timestamp	col_a	...
2020-02-13 23:12:07	12	...
2020-03-27 10:37:01	15	...
2020-06-14 16:32:44	7	...
2020-06-14 17:01:57	33	...

With an timestamp_array:

[
  2020-02-15 12:00:00,
  2020-03-22 00:00:00,
  2020-06-14 17:00:00
]

I would want the following temporary_table without using one independent query for each array element:

timestamp	col_a	...
2020-02-15 12:00:00	12	...
2020-03-22 00:00:00	12	...
2020-06-14 17:00:00	7	...

What if there are no timestamps in the table that fall between two in your array? — Gerard H. Pille, Sep 24 '21 at 16:21
Maybe I have not expressed well my self, but the query should search for the biggest avaliable table timestamp which is smaller or equal than the timestamp array element. The biggest table timestamp that meets table.timestamp <= array.element_timestamp criteria. So it hasn't to fall between. A timestamp element in the array like '2034-01-01' should return 7. A timestamp of '1980-01-01' should return null. — Héctor, Sep 24 '21 at 16:29

Erwin Brandstetter · Accepted Answer · 2021-09-25T01:03:23.063

1

unnest() the array and run a LATERAL subquery on your table:

SELECT a.timestamp_element, t.col1, t.col2, ...
FROM   unnest(timestamp_array) AS a(timestamp_element)
LEFT   JOIN LATERAL (
   SELECT *
   FROM   my_table t
   WHERE  t.timestamp <= a.timestamp_element
   ORDER  BY t.timestamp DESC
   LIMIT  1
   ) t ON true;

Pure SQL. You can nest it in a PL/pgSQL block, of course.
But you certainly don't need dynamic SQL with EXECUTE for this.

Maybe add another ORDER BY expression to break ties and get deterministic results if my_table.timestamp isn't UNIQUE.

If my_table is big, be sure to have an index on (timestamp) to make it fast.

LEFT JOIN .. ON true keeps all timestamp_element in the result, even if no earlier row is found in my_table.

See:

edited Sep 25 '21 at 01:03

answered Sep 25 '21 at 00:58

Erwin Brandstetter

175,982
27
439
600

Will an index help when searching for a value lower than or equal? – Gerard H. Pille Sep 25 '21 at 16:48
@GerardH.Pille: Yes, in this constellation a plain btree index is perfect. – Erwin Brandstetter Sep 25 '21 at 18:13
Thanks, it looks pretty much what I was really looking for. Tomorrow I will test it. But I'm a little bit confused about the dynSQL/execute part. Isn't that the tool to use when the full SQL statement is unknown until runtime, like in this case that the timestamp array is unkown? When dynSQL/execute is really needed then? – Héctor Sep 26 '21 at 08:49
@Héctor: Dynamic SQL is really needed when syntax elements or identifiers are dynamic. Your array is a value. You could still use dynamic SQL to force a custom execution plan for every array element. But that only pays for extremely uneven data distributions. – Erwin Brandstetter Sep 26 '21 at 12:38
Ah, ok. So, no need to use EXECUTE with 'dynamic' literals. And what if the literal comes directly from user input and it's unsafe? Would be preferable to use user_inp=quote_literal(user_inp) and then use it directly in plain SQL, or to use EXECUTE ... USING user_inp? I have read that EXECUTE...USING was newer and more peformant than quote_literal(), that was a little bit obsolette. – Héctor Sep 26 '21 at 19:01
@Héctor: I think this related answer covers everything: https://dba.stackexchange.com/a/49718/3684 For anything still unclear, please start another question. Comments are not the place. – Erwin Brandstetter Sep 26 '21 at 23:51
OK, about the code I have finished the function and it works like charm, thanks you so much. About the question, I have read it before, but seems that I had some missunderstoods the first time. About the comments we can delete them if they are out of place. – Héctor Sep 28 '21 at 09:05

SELECT rows from table one by one, comparing one column values with values in an array

1 Answers1