0

I have a table with a column "name", the table is describing a media asset and the name is a character varying field containing the file name. The table is generated and used by a CMS (Strapi) and I can't really tweak how the columns are used nor the SQL being executed. What I'm hoping to do is slap on an index (or two) and get a bit better performance.

The file names of our files are very similar, pretty much XYZ12345-Q2.png, where XYZ is the same for about 80% of the files. So what I'm wondering is what kind of index (if any) would help speeding up a query such as:

select count(*) as "count" 
from "upload_file" 
where ("upload_file"."name"::text ILIKE '%some_string%' 
       or "upload_file"."id"::text ILIKE '%some_string%'

The id is the primary key, and it's an auto incrementing positive integer.

My concern regarding the actual string is that an index won't do much when the file names are so similar? Or would it actually make a difference? In that case what would be the best index type to use? My understanding of Gin is that it wouldn't really suit this case because there are no words (none of the file names contain space).

Jon
  • 101
  • 3
  • The id is the primary key, and it's an auto incrementing positive integer Then what's the intention behind or "upload_file"."id"::text ILIKE '%some_string%'? Doesn't seem to make sense. – Erwin Brandstetter Feb 24 '21 at 19:32
  • XYZ12345-Q2.png, where XYZ is the same for about 80% of the files. is "XYZ" actually much longer than the (significant) rest of the string? What about the other 20%? Does some_string only match at the end or also on "XYZ"? – Erwin Brandstetter Feb 24 '21 at 19:35
  • @ErwinBrandstetter The SQL generated by this CMS are generally horrible and I can't do much about that, I've managed to change one query where it was defaulting to ILIKE matching every string and numeric field in tables but that was a merge request and Strapi is written in React which isn't my strong suit. I agree that it makes no sense – Jon Feb 25 '21 at 09:01
  • The problematic CMS shouldn't keep you from answering the other questions. – Erwin Brandstetter Feb 25 '21 at 12:49
  • Regarding the filenames, one example is IDE1133_Q1.JPG. Here the search would most likely be 1133. The worst query, which I don't have any hopes in optimizing with indices is: select distinct "upload_file".*, "upload_file"."updated_at" as "_strapi_tmp_upload_file_updated_at" from "upload_file" order by "_strapi_tmp_upload_file_updated_at" desc limit $1. I'm not that good with pg but it looks like a lost cause – Jon Feb 25 '21 at 13:32
  • And while the suggested answer that is linked contains a lot of useful information, but it increases my scepticism towards using a Gin index on these fields as it's single "word" and not really natural language but more an ID generated by a human. But I'll give it a shot and see if it increases performance – Jon Feb 25 '21 at 13:38

1 Answers1

2

What a terrible query! Consider using different software.

The only thing that could help are trigram indexes on both columns:

CREATE EXTENSION pg_trgm;

CREATE INDEX ON upload_file USING gin ((name::text) gin_trgm_ops); CREATE INDEX ON upload_file USING gin ((id::text) gin_trgm_ops);

Then you might get two bitmap index scans and a "bitmap or".

If not, the condition is probably not selective enough, and a sequential scan is the most efficient way to handle it.

Laurenz Albe
  • 51,298
  • 4
  • 39
  • 69