Strip longest common suffix and concat in Postgres

Question

Given a table t:

id | name
------------
1  | abcfug
1  | deffug
1  | hijfug
2  | etc

How can I do something like:

select string_agg(strip_lcs(name), ', ') from t where id = 1

returning:

abc, def, hij

NB I wrote an aggregate function to return lcs if that helps:

CREATE FUNCTION lcs_iterate(_state TEXT, _value TEXT)
RETURNS TEXT
AS
$$
        SELECT  RIGHT($2, s - 1)
        FROM    generate_series(1, LEAST(LENGTH($1), LENGTH($2))) s
        WHERE   RIGHT($1, s) <> RIGHT($2, s)
        UNION ALL
        SELECT  LEAST($1, $2)
        LIMIT 1;
$$
LANGUAGE 'sql';

CREATE AGGREGATE lcs(TEXT) (SFUNC = lcs_iterate, STYPE = TEXT);

Your Postgres version? Min / max / avg length of string? Min / max / avg length of suffix? Min / max / avg number of rows per group? There are many ways to solve this, the tricky part is to make it fast. See: https://dba.stackexchange.com/a/43444/3684 — Erwin Brandstetter, Nov 23 '18 at 15:03
@erwin currently 9.1, and these are FQDNs typically around 25 chars. Small table and performed infrequently so performance not a huge deal, clear to understand code preferable. Will study your link, thanks. — Sam, Nov 24 '18 at 21:53

score 1 · Answer 1 · answered Nov 25 '18 at 01:59

1

Given that you have an aggregate that finds the longest common suffix

WITH x AS( 
    SELECT left(name,length(name)-length(lcs(name) over ())) AS s 
    FROM t WHERE id = 1
)
SELECT string_agg(s,', ') FROM x;

answered Nov 25 '18 at 01:59

Jasen

3,563
1
13
17

1

Thank you for your solution, it was very useful, and it was difficult to chose which answer to accept. – Sam Nov 26 '18 at 08:52

Erwin Brandstetter · Accepted Answer · 2018-11-26T13:39:55.827

Your aggregate function is smart and fast, but there is a bug. If one string matches the tail of another completely, the UNION ALL part kicks in to return LEAST($1, $2). That must instead be something like CASE WHEN length($1) > length($2) THEN $2 ELSE $1 END. Test with 'match' and 'aa_match'. (See fiddle below.)

Plus, make the function IMMUTABLE STRICT:

CREATE OR REPLACE FUNCTION lcs_iterate(_state text, _value text)
  RETURNS text AS
$func$
SELECT  right($2, s - 1)
FROM    generate_series(1, least(length($1), length($2))) s
WHERE   right($1, s) <> right($2, s)

UNION   ALL
SELECT  CASE WHEN length($1) > length($2) THEN $2 ELSE $1 END  -- !
LIMIT  1;
$func$ LANGUAGE sql IMMUTABLE STRICT;  -- !

NULL values are ignored and empty strings lead to zero-length common suffix. You may want to treat these special cases differently ...

While we only need the length of the common suffix, a very simple FINALFUNC returns just that:

CREATE AGGREGATE lcs_len(text) (
   SFUNC = lcs_iterate
 , STYPE = text
 , FINALFUNC = length()  -- !
   );

Then your query can look like:

SELECT string_agg(trunc, ', ') AS truncated_names
FROM  (
   SELECT left(name, -lcs_len(name) OVER ()) AS trunc
   FROM   tbl
   WHERE  id = 1
   ) sub;

.. using the custom aggregate as window function.

db<>fiddle here

I also tested with Postgres 9.4, and it should work with your outdated Postgres 9.1, but that's too old for me to test. Consider upgrading to a current version.

Is it possible to wrap aggregate functions in Postgres?

Thank you @Erwin. NB I did finalfunc = length, rather than defining lcs_final_len; I did -lcs_len(host) OVER (), rather than lcs_len(name) OVER (PARTITION BY id) * - 1. — Sam, Nov 26 '18 at 08:59
Yes, a custom function for finalfunc was overkill for the simple case. And yes, PARTITION is only needed for multiple IDs in the result (like demonstrated in the fiddle). I added your improvements. — Erwin Brandstetter, Nov 26 '18 at 13:28

Strip longest common suffix and concat in Postgres

2 Answers2