24

Each row in a table has a system column ctid of type tid that represents the physical location of the row:

create table t(id serial);
insert into t default values;
insert into t default values;
select ctid
     , id
from t;
ctid  | id
:---- | -:
(0,1) |  1
(0,2) |  2

dbfiddle here

What's the best way of getting just the page number as from the ctid in the most appropriate type (eg integer, bigint or numeric(1000,0))?

The only way I can think of is very ugly.

Jack Douglas
  • 39,869
  • 15
  • 101
  • 176
  • 1
    IIRC it is a vector type and we don't have accessor methods on these. I am not sure if you can do it from a C function. Craig will tell for sure :) – András Váczi May 27 '14 at 13:56
  • 2
    Can you cast as POINT? Eg. select ct[0], ct[1] from (select ctid::text::point as ct from pg_class where ...) y; – bma May 27 '14 at 16:38
  • 1
    The title suggests you are after both the page number and the tuple index, later you narrow down to page number. I went with the version in the body, tuple index is a trivial extension. – Erwin Brandstetter May 27 '14 at 19:22

1 Answers1

33
SELECT (ctid::text::point)[0]::bigint AS block_number FROM t;

db<>fiddle here

@bma suggested something similar in his comment. Here is a ...

Rationale for the type

ctid is of type tid (tuple identifier), called ItemPointer in the C-language source code. The manual:

This is the data type of the system column ctid. A tuple ID is a pair (block number, tuple index within block) that identifies the physical location of the row within its table.

Bold emphasis mine. And:

(ItemPointer, also known as CTID)

A block is 8 KB in standard installations. Maximum Table Size is 32 TB. It follows logically that block numbers must accommodate at least a maximum of (fixed per @Daniel's comment):

SELECT (2^45 / 2^13)::int      -- = 2^32 = 4294967294

Which would fit into an unsigned integer. On further investigation I found in the source code that ...

blocks are numbered sequentially, 0 to 0xFFFFFFFE.

Bold emphasis mine. Which confirms the first calculation:

SELECT 'xFFFFFFFE'::bit(32)::int8 -- max page number: 4294967294

Postgres uses signed integer and is therefore one bit short. I couldn't pin down, yet, whether the text representation is shifted to accommodate signed integer. Until somebody can clear this up, I would fall back to bigint, which works in any case.

Cast

There is no registered cast for the tid type in Postgres 9.3 (still true in Postgres 13):

SELECT *
FROM   pg_cast
WHERE  castsource = 'tid'::regtype
OR     casttarget = 'tid'::regtype;

castsource | casttarget | castfunc | castcontext | castmethod ------------+------------+----------+-------------+------------ (0 rows)

You can still cast to text. There is a text representation for every type in Postgres:

Another important exception is that "automatic I/O conversion casts", those performed using a data type's own I/O functions to convert to or from text or other string types, are not explicitly represented in pg_cast.

The text representation matches that of a point, which consists of two float8 numbers, that cast is lossless.

You can access the first number of a point with index 0. Cast to bigint. Voilá.

Performance

I ran a quick test in Postgres 9.4 on a table with 30k rows (best of 5) with a couple of expressions that came to mind, including your original:

SELECT (ctid::text::point)[0]::int                              --  25 ms
     , right(split_part(ctid::text, ',', 1), -1)::int           --  28 ms
     , ltrim(split_part(ctid::text, ',', 1), '(')::int          --  29 ms
     , (ctid::text::t_tid).page_number                          --  31 ms
     , (translate(ctid::text,'()', '{}')::int[])[1]             --  45 ms
     , (replace(replace(ctid::text,'(','{'),')','}')::int[])[1] --  51 ms
     , substring(right(ctid::text, -1), '^\d+')::int            --  52 ms
     , substring(ctid::text, '^\((\d+),')::int                  -- 143 ms
FROM   tbl;

int instead of bigint, mostly irrelevant for the purpose of the test. I eventually repeated the test in Postgres 13 with bigint on a table with 50k rows. Results are largely the same!

The cast to t_tid builds on a user-defined composite type, like @Jake commented.
The gist of it: Casting tends to be faster than string manipulation. Regular expressions are expensive. The above solution is shortest and fastest.

Erwin Brandstetter
  • 175,982
  • 27
  • 439
  • 600
  • 1
    Thanks Erwin, useful stuff. From here it looks like ctid is 6 bytes with 4 for the page and 2 for the row. I was worried about casting to float but I guess I needn't have from what you say here. It looks like a user defined composite type is much slower then using point, do you find that too? – Jack Douglas May 27 '14 at 20:55
  • @JackDouglas: Upon further investigation I have fallen back to bigint. Consider the update. – Erwin Brandstetter May 27 '14 at 21:10
  • 1
    @JackDouglas: I like your idea of a cast to a composite type. It's clean and performs very well - even if the cast to point and back to int8 is still faster). Cast to predefined types will always be a bit faster. I added it to my test to compare. I'd make that (page_number bigint, row_number integer) to be sure. – Erwin Brandstetter May 27 '14 at 21:25
  • 1
    2^40 is only 1TB, not 32TB which is 2^45, which divided by 2^13 gives 2^32, hence the full 32 bits being necessary for the page number. – Daniel Vérité May 28 '14 at 10:54
  • @DanielVérité: Good catch, thanx. I amended accordingly. That removes the phantom mismatch. – Erwin Brandstetter May 28 '14 at 14:53
  • 1
    Also perhaps worthy of note is that pg_freespacemap uses bigint for blkno – Jack Douglas May 29 '14 at 20:21
  • up. This helped me delete a duplicate row: delete from contact where ((ctid::text::point)[1]::bigint = 1); – radarbob Feb 08 '18 at 19:45
  • @radarbob: For the record: to identify duplicates, you can operate with the ctid directly. Like: https://dba.stackexchange.com/a/36614/3684 – Erwin Brandstetter Jul 07 '21 at 13:08