How do I decompose ctid into page and row numbers?

Question

Each row in a table has a system column ctid of type tid that represents the physical location of the row:

create table t(id serial);
insert into t default values;
insert into t default values;

select ctid
     , id
from t;

ctid  | id
:---- | -:
(0,1) |  1
(0,2) |  2

dbfiddle here

What's the best way of getting just the page number as from the ctid in the most appropriate type (eg integer, bigint or numeric(1000,0))?

The only way I can think of is very ugly.

IIRC it is a vector type and we don't have accessor methods on these. I am not sure if you can do it from a C function. Craig will tell for sure :) — András Váczi, May 27 '14 at 13:56
Can you cast as POINT? Eg. select ct[0], ct[1] from (select ctid::text::point as ct from pg_class where ...) y; — bma, May 27 '14 at 16:38
The title suggests you are after both the page number and the tuple index, later you narrow down to page number. I went with the version in the body, tuple index is a trivial extension. — Erwin Brandstetter, May 27 '14 at 19:22

Erwin Brandstetter · Accepted Answer · 2021-06-08T01:49:47.003

SELECT (ctid::text::point)[0]::bigint AS block_number FROM t;

db<>fiddle here

@bma suggested something similar in his comment. Here is a ...

Rationale for the type

ctid is of type tid (tuple identifier), called ItemPointer in the C-language source code. The manual:

This is the data type of the system column ctid. A tuple ID is a pair (block number, tuple index within block) that identifies the physical location of the row within its table.

Bold emphasis mine. And:

(ItemPointer, also known as CTID)

A block is 8 KB in standard installations. Maximum Table Size is 32 TB. It follows logically that block numbers must accommodate at least a maximum of (fixed per @Daniel's comment):

SELECT (2^45 / 2^13)::int      -- = 2^32 = 4294967294

Which would fit into an unsigned integer. On further investigation I found in the source code that ...

blocks are numbered sequentially, 0 to 0xFFFFFFFE.

Bold emphasis mine. Which confirms the first calculation:

SELECT 'xFFFFFFFE'::bit(32)::int8 -- max page number: 4294967294

Postgres uses signed integer and is therefore one bit short. I couldn't pin down, yet, whether the text representation is shifted to accommodate signed integer. Until somebody can clear this up, I would fall back to bigint, which works in any case.

Cast

There is no registered cast for the tid type in Postgres 9.3 (still true in Postgres 13):

SELECT *
FROM   pg_cast
WHERE  castsource = 'tid'::regtype
OR     casttarget = 'tid'::regtype;
castsource | casttarget | castfunc | castcontext | castmethod
------------+------------+----------+-------------+------------
(0 rows)

You can still cast to text. There is a text representation for every type in Postgres:

Another important exception is that "automatic I/O conversion casts", those performed using a data type's own I/O functions to convert to or from text or other string types, are not explicitly represented in pg_cast.

The text representation matches that of a point, which consists of two float8 numbers, that cast is lossless.

You can access the first number of a point with index 0. Cast to bigint. Voilá.

Performance

I ran a quick test in Postgres 9.4 on a table with 30k rows (best of 5) with a couple of expressions that came to mind, including your original:

SELECT (ctid::text::point)[0]::int                              --  25 ms
     , right(split_part(ctid::text, ',', 1), -1)::int           --  28 ms
     , ltrim(split_part(ctid::text, ',', 1), '(')::int          --  29 ms
     , (ctid::text::t_tid).page_number                          --  31 ms
     , (translate(ctid::text,'()', '{}')::int[])[1]             --  45 ms
     , (replace(replace(ctid::text,'(','{'),')','}')::int[])[1] --  51 ms
     , substring(right(ctid::text, -1), '^\d+')::int            --  52 ms
     , substring(ctid::text, '^\((\d+),')::int                  -- 143 ms
FROM   tbl;

int instead of bigint, mostly irrelevant for the purpose of the test. I eventually repeated the test in Postgres 13 with bigint on a table with 50k rows. Results are largely the same!

The cast to t_tid builds on a user-defined composite type, like @Jake commented.
The gist of it: Casting tends to be faster than string manipulation. Regular expressions are expensive. The above solution is shortest and fastest.

Thanks Erwin, useful stuff. From here it looks like ctid is 6 bytes with 4 for the page and 2 for the row. I was worried about casting to float but I guess I needn't have from what you say here. It looks like a user defined composite type is much slower then using point, do you find that too? — Jack Douglas, May 27 '14 at 20:55
@JackDouglas: Upon further investigation I have fallen back to bigint. Consider the update. — Erwin Brandstetter, May 27 '14 at 21:10
@JackDouglas: I like your idea of a cast to a composite type. It's clean and performs very well - even if the cast to point and back to int8 is still faster). Cast to predefined types will always be a bit faster. I added it to my test to compare. I'd make that (page_number bigint, row_number integer) to be sure. — Erwin Brandstetter, May 27 '14 at 21:25
2^40 is only 1TB, not 32TB which is 2^45, which divided by 2^13 gives 2^32, hence the full 32 bits being necessary for the page number. — Daniel Vérité, May 28 '14 at 10:54
@DanielVérité: Good catch, thanx. I amended accordingly. That removes the phantom mismatch. — Erwin Brandstetter, May 28 '14 at 14:53
Also perhaps worthy of note is that pg_freespacemap uses bigint for blkno — Jack Douglas, May 29 '14 at 20:21
up. This helped me delete a duplicate row: delete from contact where ((ctid::text::point)[1]::bigint = 1); — radarbob, Feb 08 '18 at 19:45
@radarbob: For the record: to identify duplicates, you can operate with the ctid directly. Like: https://dba.stackexchange.com/a/36614/3684 — Erwin Brandstetter, Jul 07 '21 at 13:08

How do I decompose ctid into page and row numbers?

1 Answers1

Rationale for the type

Cast

Performance

Linked