Index on column with data type citext not used

Question

In PostgreSQL 9.4, with following schema:

CREATE TABLE people (
    id INTEGER PRIMARY KEY,
    name TEXT,
    junk CHAR(1000)
);

INSERT INTO people(id, name)
 SELECT generate_series(1,100000), md5(random()::text);

CREATE INDEX ON people (name text_pattern_ops);

if I search by name, the index is used:

test=# explain analyze select id, name from people where name like 'a%';
                                                           QUERY PLAN                                                           
--------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on people  (cost=248.59..1160.92 rows=6061 width=37) (actual time=2.412..8.340 rows=6271 loops=1)
   Filter: (name ~~ 'a%'::text)
   Heap Blocks: exact=834
   ->  Bitmap Index Scan on people_name_idx  (cost=0.00..247.08 rows=6266 width=0) (actual time=2.123..2.123 rows=6271 loops=1)
         Index Cond: ((name ~>=~ 'a'::text) AND (name ~<~ 'b'::text))
 Planning time: 0.600 ms
 Execution time: 8.991 ms

But if I replace TEXT with CITEXT:

CREATE EXTENSION CIText;

CREATE TABLE people (
    id INTEGER PRIMARY KEY,
    name CITEXT,
    junk CHAR(1000)
);

The index is not used anymore:

test=# explain analyze select id, name from people where name like 'a%';
                                                QUERY PLAN                                                 
-----------------------------------------------------------------------------------------------------------
 Seq Scan on people  (cost=0.00..2084.00 rows=500 width=36) (actual time=5.700..152.572 rows=6305 loops=1)
   Filter: (name ~~ 'a%'::citext)
   Rows Removed by Filter: 93695
 Planning time: 0.764 ms
 Execution time: 153.046 ms

According to CITEXT PostgreSQL documentation, the behavior should be as with TEXT:

Otherwise, it behaves almost exactly like text.

How can I tell PostgreSQL to use the index?

Erwin Brandstetter · Accepted Answer · 2023-08-01T18:04:35.300

7

Index usage with text_pattern_ops (as well as with the default operator class when using the C locale) depends on the binary representation of character data. citext stores original values with the case preserved, so there must be a problem with that ...

Like you commented, the actual reason is burried in collation support.

Either way, citext or text, you can make it work with an expression index:

CREATE INDEX people_name_idx ON people (lower(name) text_pattern_ops);

And a corresponding query:

SELECT id, name FROM people WHERE lower(name) LIKE 'abc%';

Note that lower(name) returns data type text, even when feeding it citext.

Alternatively, you could use a trigram index, which is more expensive to maintain, but also offers more capabilities:

Aside: your test case is suboptimal because your dummy values are all lower case to begin with and the pattern 'a%' is often not selective enough to use an index at all. And char(1000) doesn't make sense (even though irrelevant to the test).

edited Aug 01 '23 at 18:04

answered Jun 26 '15 at 15:09

Erwin Brandstetter

175,982
27
439
600

1

It seems that the LIKE index optimization doesn't work with CITEXT: http://www.postgresql.org/message-id/27270.1364700924@sss.pgh.pa.us
There is an entry on PostgreSQL's TODO: https://wiki.postgresql.org/wiki/Todo (just search for the previous link)

In my case, search for case insensitive left anchored patterns, the use of CITEXT doesn't seem to provide any benefit. I'm reverting the column type to TEXT and using a functional index. About the triagam index, since some of my search patterns have only one or two characters ('a%' or 'ab%'), probably the use a B-tree index performs better.
– José Luis Jul 02 '15 at 19:39
1

@JoséLuis: For only left anchored searches a btree text_pattern_ops index performs best in any case. A trigram index would be relevant to do more than that (full-text searches, similarity, ...). It does work for 'a%' or 'ab%', too. The simple trick in the implementation is to prepend two space characters before chipping off trigrams (but that's transparent for the user). – Erwin Brandstetter Jul 02 '15 at 19:49
@JoséLuis I went through similar case (same to the one which you posted). CITEXT not taking indexes even in 10.4. – Dinesh Kumar Sep 09 '18 at 10:50

Index on column with data type citext not used

1 Answers1

Linked