1

Let's consider a table like that:

id |            name
 1 | This is a test sentence
 2 | is a sentence TeST
 3 | This a IS test sentence
 4 | iS a tEst sentence tHis
 5 | This a test sentence is

How could I achieve to get all the rows, searching for "is a sentence test"?

I am using:

SELECT *
FROM table
WHERE name ILIKE '%is%' AND name ILIKE '%a%' AND name ILIKE '%sentence%' AND name ILIKE '%test%'

But that doesn't seem to be the best approach.

Evan Carroll
  • 63,051
  • 46
  • 242
  • 479
Roots
  • 15
  • 1
  • 4
  • 1
    Try posix regex: https://www.postgresql.org/docs/current/static/functions-matching.html#FUNCTIONS-POSIX-REGEXP – Scott Marlowe Sep 08 '17 at 19:28
  • 1
    "searching for" does not define the search conditions you desire. Exact string match? Complete, leading, trailing? Case sensitive? Fuzzy? An exact question is 50 % of the solution. And always disclose your version of Postgres and the exact table definition showing data types and constraints (CREATE TABLE statement). Overview of available tools: https://dba.stackexchange.com/questions/10694/pattern-matching-with-like-similar-to-or-regular-expressions-in-postgresql/10696 – Erwin Brandstetter Sep 08 '17 at 22:35
  • Wow, I just read @ErwinBrandstetter's comment and looked at the code at the code you had scrolling off the screen. Is this supposed to have all the words? And you're including "is" (a stop word)? – Evan Carroll Sep 08 '17 at 23:14

2 Answers2

1

You have LIKE and ILIKE and ~ and ~*. This covers the SQL-LIKE conditionals and the more beefy and more powerful regex-like features.

SELECT
 s,
 s LIKE '%is a sentence test%' as "like",
 s ILIKE '%is a sentence test%' AS "ilike",
 s ~ '.*is a sentence test.*' AS "~",
 s ~* '.*is a sentence test.*' AS "~*"
FROM ( VALUES
 ( 'This is a test sentence' ),
 ( 'is a sentence TeST' ),
 ( 'This a IS test sentence' ),
 ( 'iS a tEst sentence tHis' ),
 ( 'This a test sentence is' )
) AS t(s);

            s            | like | ilike | ~ | ~* 
-------------------------+------+-------+---+----
 This is a test sentence | f    | f     | f | f
 is a sentence TeST      | f    | t     | f | t
 This a IS test sentence | f    | f     | f | f
 iS a tEst sentence tHis | f    | f     | f | f
 This a test sentence is | f    | f     | f | f
(5 rows)

You're not using any advanced regex powers, so I'd go with LIKE, or ILIKE.

Evan Carroll
  • 63,051
  • 46
  • 242
  • 479
1

First you use a split_to_array function to put your strings in to arrays (https://www.postgresql.org/docs/9.6/static/functions-string.html) and then you can compare them. Postgres has a cool function to check if one array contains another. More info on Postgres arrays here:https://www.postgresql.org/docs/9.5/static/functions-array.html

Table setup:

CREATE TABLE stringchecks
(
    id serial,
    name varchar(100)
);

INSERT INTO stringchecks (name) VALUES ('This is a test sentence');
INSERT INTO stringchecks (name) VALUES  ('is a sentence TeST');
INSERT INTO stringchecks (name) VALUES ('This a IS test sentence');
INSERT INTO stringchecks (name) VALUES ('iS a tEst sentence tHis');
INSERT INTO stringchecks (name) VALUES ('This a test sentence is');

Checking for matches you have two options. You can use regexp_split_to_array:

SELECT
    id,
    name,
    regexp_split_to_array(lower(name), E'\\s+') @> 
       regexp_split_to_array(lower('is a sentence test'), E'\\s+') 
FROM stringchecks;

Or you can use string_to_array:

SELECT
    id,
    name,
    string_to_array(lower(name), ' ') @> 
       string_to_array(lower('is a sentence test'), ' ')
FROM stringchecks;

In this case because it's a simple delimiter it will be faster to use string_to_array. http://www.postgresonline.com/journal/archives/370-regexp_split_to_table-and-string_to_array-unnest-performance.html

indiri
  • 2,211
  • 8
  • 12
  • indiri, that is taking very long with 70000 itens table. any tips to improve it? I run that query for each row of that table. – Roots Oct 11 '17 at 13:10
  • @Roots Since it's a simple delimiter you can use string_to_array instead of regexp_split_to_array to speed it up. I've updated my answer. – indiri Oct 11 '17 at 15:51