13

I have one Postgres table with more than 20M tuples:

first_name | last_name | email
-------------------------------------------
bat        | man       | batman@wayne.com
arya       | vidal     | foo@email.com
max        | joe       | bar@email.com

To filter the records I'm using:

SELECT *
  FROM people
WHERE (first_name || '' || last_name) ILIKE '%bat%man%' OR 
    first_name ILIKE '%bat%man%'  OR  
    last_name ILIKE '%bat%man%'   OR
    email ILIKE '%bat%man%'
    LIMIT 25 OFFSET 0

Even with indexes the search is taking almost one minute to return results.
There are indexes for (first_name || '' || last_name), first_name, last_name and email.

What can I do to improve the performance of this query?

Victor
  • 233
  • 1
  • 2
  • 7

1 Answers1

18

For your kind of pattern matching you best use a trigram index. Read this first:

I assume there's a typo in your expression (first_name || '' || last_name), which makes no sense with an empty string, and you really want (first_name || ' ' || last_name) - with a space character.

Assuming that either column can be NULL, you would need NULL-safe concatenation, the simple solution is concat_ws():

But that function is not IMMUTABLE (explanation in the linked answer), so you cannot use it directly in an index expression. You could use an IMMUTABLE function wrapper:

CREATE OR REPLACE FUNCTION f_immutable_concat_ws(s text, t1 text, t2 text)
  RETURNS text
  LANGUAGE sql IMMUTABLE PARALLEL SAFE AS
'SELECT concat_ws(s, t1, t2)';

The wrapper can be IMMUTABLE because it only takes text parameters. (But don't be tempted to coerce data types in the call for which text representation is not immutable!)

Either way, this is more verbose but has less internal overhead and is considerably faster:

CREATE OR REPLACE FUNCTION f_immutable_concat_ws(s text, t1 text, t2 text)
  RETURNS text
  LANGUAGE sql IMMUTABLE PARALLEL SAFE AS
$func$
SELECT CASE
         WHEN t1 IS NULL THEN t2
         WHEN t2 IS NULL THEN t1
         ELSE t1 || s || t2
       END
$func$;

Or, with hard-coded space character:

CREATE OR REPLACE FUNCTION f_concat_space(t1 text, t2 text)
  RETURNS text AS
  LANGUAGE sql IMMUTABLE PARALLEL SAFE
$func$
SELECT CASE
         WHEN t1 IS NULL THEN t2
         WHEN t2 IS NULL THEN t1
         ELSE t1 || ' ' || t2
       END
$func$;

Make it PARALLEL SAFE (in Postgres 9.6 or later) to not stand in the way of parallelism. (And because it qualifies!)

Basing the index on the last function, I suggest:

CREATE INDEX people_gin_trgm_idx ON people
USING gin (f_concat_space(first_name, last_name) gin_trgm_ops, email gin_trgm_ops);

I added email as second index column because you seem to be checking both at the same time.

Creating the index will take a while for 20M rows, best not during top load, or maybe use CREATE INDEX CONCURRENTLY .... A GIN index can be considerably bigger than a plain B-tree index and is more expensive to maintain. Be sure to run the latest version of Postgres, there have been major improvements for GIN indexes in recent versions.

Then your slightly adapted and simplified query should be fast and correct:

SELECT *
FROM   people
WHERE  f_concat_space(first_name, last_name) ILIKE '%bat%man%'
    OR email ILIKE '%bat%man%'
LIMIT  25;

You only need the one index for this query.

Related:

Erwin Brandstetter
  • 175,982
  • 27
  • 439
  • 600
  • 1
    Heh. Fast(ish) and correct. Infix searches are always going to be a bit slow. As for concat_ws, do you know why it's not immutable? I'd suggest checking that before just wrapping it, otherwise you might land up with index contents that don't match the current expression results for the same input, which behaves like index corruption. – Craig Ringer Oct 09 '15 at 04:56
  • 2
    @CraigRinger: I researched concat_ws() not being IMMUTABLE and added an explanation to the answer referenced above. The explanation is rather simple and my wrapper is safe because it only takes text parameters. – Erwin Brandstetter Oct 09 '15 at 05:00
  • Thanks, much appreciated. I only raised it because I really don't want to encourage people to just bypass !immutable flags with wrappers. – Craig Ringer Oct 09 '15 at 05:01
  • @CraigRinger: You are totally right about that. I have seen crimes happening in that dark alley before: http://stackoverflow.com/a/11007216/939860. I added a note above for why the wrapper is safe. – Erwin Brandstetter Oct 09 '15 at 05:06
  • I was wondering, how much of a speedup would the original poster have with this method? – xtrinch Jun 10 '21 at 11:35
  • 1
    @xtrinch: For big tables, a fitting trigram index can mean the difference between "instantaneous" and "takes forever". – Erwin Brandstetter Jun 10 '21 at 15:17