1

Given a table t:

id | name
------------
1  | abcfug
1  | deffug
1  | hijfug
2  | etc

How can I do something like:

select string_agg(strip_lcs(name), ', ') from t where id = 1

returning:

abc, def, hij

NB I wrote an aggregate function to return lcs if that helps:

CREATE FUNCTION lcs_iterate(_state TEXT, _value TEXT)
RETURNS TEXT
AS
$$
        SELECT  RIGHT($2, s - 1)
        FROM    generate_series(1, LEAST(LENGTH($1), LENGTH($2))) s
        WHERE   RIGHT($1, s) <> RIGHT($2, s)
        UNION ALL
        SELECT  LEAST($1, $2)
        LIMIT 1;
$$
LANGUAGE 'sql';

CREATE AGGREGATE lcs(TEXT) (SFUNC = lcs_iterate, STYPE = TEXT);
Erwin Brandstetter
  • 175,982
  • 27
  • 439
  • 600
Sam
  • 129
  • 3
  • 1
    Your Postgres version? Min / max / avg length of string? Min / max / avg length of suffix? Min / max / avg number of rows per group? There are many ways to solve this, the tricky part is to make it fast. See: https://dba.stackexchange.com/a/43444/3684 – Erwin Brandstetter Nov 23 '18 at 15:03
  • @erwin currently 9.1, and these are FQDNs typically around 25 chars. Small table and performed infrequently so performance not a huge deal, clear to understand code preferable. Will study your link, thanks. – Sam Nov 24 '18 at 21:53
  • Can you provide the actual examples? – Evan Carroll Nov 26 '18 at 06:02
  • funny that not even the lcs_iterate function works for me. – Gunther Schadow Feb 11 '22 at 23:32

2 Answers2

1

Given that you have an aggregate that finds the longest common suffix

WITH x AS( 
    SELECT left(name,length(name)-length(lcs(name) over ())) AS s 
    FROM t WHERE id = 1
)
SELECT string_agg(s,', ') FROM x;
Jasen
  • 3,563
  • 1
  • 13
  • 17
  • 1
    Thank you for your solution, it was very useful, and it was difficult to chose which answer to accept. – Sam Nov 26 '18 at 08:52
1

Your aggregate function is smart and fast, but there is a bug. If one string matches the tail of another completely, the UNION ALL part kicks in to return LEAST($1, $2). That must instead be something like CASE WHEN length($1) > length($2) THEN $2 ELSE $1 END. Test with 'match' and 'aa_match'. (See fiddle below.)

Plus, make the function IMMUTABLE STRICT:

CREATE OR REPLACE FUNCTION lcs_iterate(_state text, _value text)
  RETURNS text AS
$func$
SELECT  right($2, s - 1)
FROM    generate_series(1, least(length($1), length($2))) s
WHERE   right($1, s) <> right($2, s)

UNION   ALL
SELECT  CASE WHEN length($1) > length($2) THEN $2 ELSE $1 END  -- !
LIMIT  1;
$func$ LANGUAGE sql IMMUTABLE STRICT;  -- !

NULL values are ignored and empty strings lead to zero-length common suffix. You may want to treat these special cases differently ...

While we only need the length of the common suffix, a very simple FINALFUNC returns just that:

CREATE AGGREGATE lcs_len(text) (
   SFUNC = lcs_iterate
 , STYPE = text
 , FINALFUNC = length()  -- !
   );

Then your query can look like:

SELECT string_agg(trunc, ', ') AS truncated_names
FROM  (
   SELECT left(name, -lcs_len(name) OVER ()) AS trunc
   FROM   tbl
   WHERE  id = 1
   ) sub;

.. using the custom aggregate as window function.

db<>fiddle here

I also tested with Postgres 9.4, and it should work with your outdated Postgres 9.1, but that's too old for me to test. Consider upgrading to a current version.

Related:

Erwin Brandstetter
  • 175,982
  • 27
  • 439
  • 600
  • 1
    Thank you @Erwin. NB I did finalfunc = length, rather than defining lcs_final_len; I did -lcs_len(host) OVER (), rather than lcs_len(name) OVER (PARTITION BY id) * - 1. – Sam Nov 26 '18 at 08:59
  • Yes, a custom function for finalfunc was overkill for the simple case. And yes, PARTITION is only needed for multiple IDs in the result (like demonstrated in the fiddle). I added your improvements. – Erwin Brandstetter Nov 26 '18 at 13:28