UPDATE with LIMIT / OFFSET in PL/pgSQL function

Question

In my Postgres 9.2 database I've got the following SELECT, which will be used by a PL/pgSQL function:

SELECT
      id
FROM
      tablea
WHERE
      migrated = false;

I want to get 2000 records each time and perform something like the following:

CREATE or REPLACE FUNCTION migrate_data()
RETURNS integer AS
$$
declare
        row record;

BEGIN

FOR row IN EXECUTE '
        SELECT
              id
        FROM
              tablea
        WHERE
              migrated = false
'
LOOP

INSERT INTO tableb (id)
VALUES (row.id);

UPDATE tablea a SET migrated = yes WHERE a.id = row.id;

END LOOP;

RETURN num_rows; -- I want it to return the number of processed rows

END

$$ language 'plpgsql';

How can I tell the function to process 2000 records with every call? And it must start from where it left when on the next call.

Example:

select migrate_data(); -- id 0 to id 2000

select migrate_data(); -- id 2001 to id 4000

select migrate_data(); -- id 4001 to id 6000

select migrate_data(); -- id 6001 to id 8000

Etc.

BTW found this: Five ways to paginate in Postgres, from the basic to the exotic — Abelisto, Jun 08 '17 at 23:30
Is there concurrent write access to the source table? If yes, things get more sophisticated. — Erwin Brandstetter, Jul 09 '17 at 14:34

score 3 · Answer 1 · answered Jun 08 '17 at 23:11

3

It will be much simpler (and faster) to use cursors for that:

begin;
declare foo cursor for
    SELECT
          id
    FROM
          tablea
    WHERE
          mig = true;
fetch forward 2000 from foo;
fetch forward 2000 from foo;
-- etc
rollback;

About declare
About fetch

answered Jun 08 '17 at 23:11

Abelisto

1,549
1
9
13

So... on your example, because there is 2 fetch forward 2000 from foo; calls the function will execute 4000 rows.. Did I understand right? – Patrick B. Jun 08 '17 at 23:19
@PatrickB. There is no function. But yes, in my example there are 4000 rows will be fetched (or less if there are no so many rows). – Abelisto Jun 08 '17 at 23:27
Thanks but I'd rather prefer doing this using pl/pgsql as I'm more familiar... – Patrick B. Jun 08 '17 at 23:45

Erwin Brandstetter · Accepted Answer · 2021-07-30T01:59:27.190

You can run a single SQL command with data-modifying CTEs. Then get the row count with GET DIAGNOSTICS in a plpgsql function:

CREATE OR REPLACE FUNCTION migrate_data(OUT row_ct int)
  LANGUAGE plpgsql AS
$func$
BEGIN
   WITH sel AS (
      SELECT id
      FROM   tablea
      WHERE  migrated = false
      ORDER  BY id   -- to proceed in deterministic order
      LIMIT  2000
      FOR    UPDATE  -- only for concurrent write access
      )
    , upd AS (
      UPDATE tablea a 
      SET    migrated = TRUE
      FROM   cte
      WHERE  a.id = sel.id
      )
   INSERT INTO tableb(id)
   TABLE  sel;  -- shorthand for: SELECT * FROM sel
GET DIAGNOSTICS row_ct = ROW_COUNT;  -- writes to OUT param directly
END
$func$;

UPDATE with LIMIT / OFFSET in PL/pgSQL function

2 Answers2

Linked