I am facing a situation where users SHOULD submit items that are written in plain English. However, it happens that names have accents - coming from various languages. I have a tsvector column in my table which is indexed for fulltext search.
Here is the info about this column, and relevant info about the table:
Table "public.companies"
Column | Type | Modifiers | Storage | Stats target | Description
----------------+------------------------+--------------------------------------------------------+----------+--------------+-------------
id | integer | not null default nextval('companies_id_seq'::regclass) | plain | |
name | character varying(128) | not null | extended | |
description | text | | extended | |
tsmeta | tsvector | | extended | |
Indexes:
"companies_pkey" PRIMARY KEY, btree (id) CLUSTER
"companies_search_idx" gin (tsmeta)
Triggers:
companies_tsmeta_update BEFORE INSERT OR UPDATE ON companies FOR EACH ROW EXECUTE PROCEDURE companies_tsmeta_trigger()
Now the problem comes when the user creates a name like français. Not all the users have the ç character at hand and those who don't will have troubles finding this entry - meaning that it will be stored with the cedilla-c in the tsvector column. So my thought was to use the unaccent extension in order to strip these accents. However now users that just type français will have problems and will not find the entry. So I came up with a function that just concatenates both versions (accented and accent-stripped) of the name and descriptions. However, descriptions can get pretty large and I worry the index will take too much storage.
create or replace function companies_tsmeta_trigger() returns trigger as $$
begin
new.tsmeta :=
setweight(to_tsvector('english', coalesce(new.name, '')), 'A') ||
setweight(to_tsvector('english', unaccent(coalesce(new.name, ''))), 'B') ||
setweight(to_tsvector('english', coalesce(new.description, '')), 'C') ||
setweight(to_tsvector('english', unaccent(coalesce(new.description, ''))), 'D');
return new;
end;
$$ language plpgsql;
What are your suggestions?