2

In my Postgres 9.2 database I've got the following SELECT, which will be used by a PL/pgSQL function:

SELECT
      id
FROM
      tablea
WHERE
      migrated = false;

I want to get 2000 records each time and perform something like the following:

CREATE or REPLACE FUNCTION migrate_data()
RETURNS integer AS
$$
declare
        row record;

BEGIN

FOR row IN EXECUTE '
        SELECT
              id
        FROM
              tablea
        WHERE
              migrated = false
'
LOOP

INSERT INTO tableb (id)
VALUES (row.id);

UPDATE tablea a SET migrated = yes WHERE a.id = row.id;

END LOOP;

RETURN num_rows; -- I want it to return the number of processed rows

END

$$ language 'plpgsql';

How can I tell the function to process 2000 records with every call? And it must start from where it left when on the next call.

Example:

select migrate_data(); -- id 0 to id 2000

select migrate_data(); -- id 2001 to id 4000

select migrate_data(); -- id 4001 to id 6000

select migrate_data(); -- id 6001 to id 8000

Etc.

Erwin Brandstetter
  • 175,982
  • 27
  • 439
  • 600
Patrick B.
  • 191
  • 4
  • 12

2 Answers2

3

It will be much simpler (and faster) to use cursors for that:

begin;
declare foo cursor for
    SELECT
          id
    FROM
          tablea
    WHERE
          mig = true;
fetch forward 2000 from foo;
fetch forward 2000 from foo;
-- etc
rollback;

About declare
About fetch

Abelisto
  • 1,549
  • 1
  • 9
  • 13
  • So... on your example, because there is 2 fetch forward 2000 from foo; calls the function will execute 4000 rows.. Did I understand right? – Patrick B. Jun 08 '17 at 23:19
  • @PatrickB. There is no function. But yes, in my example there are 4000 rows will be fetched (or less if there are no so many rows). – Abelisto Jun 08 '17 at 23:27
  • Thanks but I'd rather prefer doing this using pl/pgsql as I'm more familiar... – Patrick B. Jun 08 '17 at 23:45
3

You can run a single SQL command with data-modifying CTEs. Then get the row count with GET DIAGNOSTICS in a plpgsql function:

CREATE OR REPLACE FUNCTION migrate_data(OUT row_ct int)
  LANGUAGE plpgsql AS
$func$
BEGIN
   WITH sel AS (
      SELECT id
      FROM   tablea
      WHERE  migrated = false
      ORDER  BY id   -- to proceed in deterministic order
      LIMIT  2000
      FOR    UPDATE  -- only for concurrent write access
      )
    , upd AS (
      UPDATE tablea a 
      SET    migrated = TRUE
      FROM   cte
      WHERE  a.id = sel.id
      )
   INSERT INTO tableb(id)
   TABLE  sel;  -- shorthand for: SELECT * FROM sel

GET DIAGNOSTICS row_ct = ROW_COUNT; -- writes to OUT param directly END $func$;

Related:

You might just use the SQL statement without the function wrapper instead. Possibly as prepared statement.

Erwin Brandstetter
  • 175,982
  • 27
  • 439
  • 600