Recursive CTE to find unique slug

Question

I have an article table where I want the slug to be unique.

CREATE TABLE article (
   title char(50) NOT NULL,
   slug  char(50) NOT NULL
);

When the user enters a title e.g. News on Apple, I want to check the database to see if a corresponding slug exists e.g. news-on-apple. If it does, I'll suffix a numerical value until I find a unique one e.g. news-on-apple-1. Can that be achieved with a recursive CTE query instead of doing recursion in my ORM. Is there a good ballpark number where I should stop recursing and error out. I can imagine people using the same title a 1000 times which would result in 1000 queries just to create 1 article.

It's possible that my understanding of recursive CTE is incorrect and there's no better way to find a unique slug. Please suggest any alternatives.

Erwin Brandstetter · Accepted Answer · 2023-08-01T17:48:13.647

First off, you do not want to use the data type ~~char(50)~~. Use varchar(50) or just text. See:

Any downsides of using data type “text” for storing strings?

Assuming the following rules:

Basic slugs never end with a dash.
Duplicate slugs are suffixed with a dash and a sequential number (-123).

Note that all of the following methods are subject to a race conditions: concurrent operations might identify the same "free" name for the next slug.
To defend against it, you can impose a UNIQUE constraint on slug and be prepared to repeat an INSERT upon duplicate key violation or you to take out a write lock on the table at the start of the transaction.

If you glue the suffix to the basic slug name with a dash and allow basic slugs to end in separate numbers, the specification is a tiny bit ambiguous (see comments). I suggest a unique delimiter of your choice instead (which is otherwise disallowed).

Efficient rCTE

WITH RECURSIVE
  input AS (SELECT 'news-on-apple'::text AS slug)  -- input basic slug here once
, cte   AS (
   SELECT slug || '-' AS slug  -- append '-' once, if basic slug exists
        , 1 as suffix          -- start with suffix 1
   FROM   article
   JOIN   input USING (slug)
UNION ALL
   SELECT c.slug, c.suffix + 1  -- increment by 1 ...
   FROM   cte     c
   JOIN   article a ON a.slug = c.slug || c.suffix  -- ... if slug-n already exists
   )
(
SELECT slug || suffix AS slug
FROM   cte
ORDER  BY suffix DESC  -- pick the last (free) one
LIMIT  1
)  -- parentheses required
UNION  ALL  -- if the basic slug wasn't taken, fall back to that
SELECT slug FROM input
LIMIT  1;

Better performance without rCTE

If you worry about thousands of slugs competing for the same slug or generally want to optimize performance, I'd consider a different, faster approach.

WITH input AS (SELECT 'news-on-apple'::text  AS slug
                    , 'news-on-apple-'::text AS slug1)  -- input basic slug here
SELECT i.slug
FROM   input        i
LEFT   JOIN article a USING (slug)
WHERE  a.slug IS NULL  -- doesn't exist yet.
UNION ALL
(  -- parentheses required
SELECT i.slug1 || COALESCE(right(a.slug, length(i.slug1) * -1)::int + 1, 1)
FROM   input        i
LEFT   JOIN article a ON a.slug LIKE (i.slug1 || '%')  -- match up to last "-"
                     AND right(a.slug, length(i.slug1) * -1) ~ '^\d+$' -- suffix numbers only
ORDER  BY right(a.slug, length(i.slug1) * -1)::int DESC
)
LIMIT  1;

If the basic slug isn't taken yet, the more expensive second SELECT is never executed - same as above, but much more important here. Check with EXPLAIN ANALYZE, Postgres is smart that way with LIMIT queries. See:

Optimize a query on two big tables

Check for the leading string and the suffix separately, so the LIKE expression can use a basic btree index with text_pattern_ops like

CREATE INDEX article_slug_idx ON article (slug text_pattern_ops);

Detailed explanation:

Pattern matching with LIKE, SIMILAR TO or regular expressions

Convert the suffix to integer before you apply max(). Numbers in text representation don't work.

Optimize performance

To get the optimum, consider storing the suffix separated from the basic slug and concatenate the slug as needed: concat_ws('-' , slug, suffix::text) AS slug

CREATE TABLE article (
   article_id serial PRIMARY KEY
 , title text NOT NULL
 , slug  text NOT NULL
 , suffix int
);

The query for a new slug then becomes:

SELECT slug
    || COALESCE((
          SELECT '-'::text || (max(suffix) + 1)::text
          FROM   article a
          WHERE  a.slug = i.slug), '') As slug
FROM  (SELECT 'news-on-apple'::text AS slug) i  -- input basic slug here

Ideally supported with a unique index on (slug, suffix).

Query for list of slugs

In any version of Postgres you can provide rows in a VALUES expression.

SELECT *
FROM   article
JOIN  (
   VALUES
     ('slug-foo'::text, 1)
     ('slug-bar',7)
   ) u(slug,suffix) USING (slug,suffix);

You can also use IN with a set of row-type expressions Which is shorter:

SELECT *
FROM   article
WHERE (slug,suffix) IN (('slug-foo', 1), ('slug-bar',7));

Details under this related question (as commented below):

the <set clause>'s <multiple column assignment>

For long lists, the JOIN to a VALUES expression is typically faster.

Since Postgres 9.4 you can also use the new variant of unnest() to unnest multiple arrays in parallel.

Given an array of basic slugs and a corresponding array of suffixes (as per comment):

SELECT *
FROM   article
JOIN   unnest('{slug-foo,slug-bar}'::text[]
            , '{1,7}'::int[]) AS u(slug,suffix) USING (slug,suffix);

Thank you is an understatement for such an informative & well written answer ! I've some followup questions. For the 2nd method, if a user sets the following 3 titles in order, bmw, bmw 745, bmw, the max function will result in third slug being bmw-746, although we would ideally want bmw-1 here. Using count is an option but that would also given unexpected slugs in certain cases. Specifically when there's a conflict between titles that end in numbers. — user4150760, Dec 17 '14 at 17:41
@user4150760: The ambiguity between the basic slug name and slug name plus appended suffix is built into your specifications. It's not something my solution added. You have to disambiguate yourself. For instance: use a character otherwise disallowed before the suffix. — Erwin Brandstetter, Dec 17 '14 at 17:56
In the 3rd method, what's an efficient way to query a list of slug & suffix pairs? Doing this: WHERE slug IN (<slug_list>) AND suffix IN (<suffix_list>) will result in cartesian product of all slugs & suffixes that are in the list. However I want the query to be restricted to corresponding pairs of slugs & suffixes. — user4150760, Dec 18 '14 at 14:42
@user4150760: Good question. There are elegant solutions. Should be another question. But I added another answer anyway. — Erwin Brandstetter, Dec 18 '14 at 20:33
Thank you for the update. I added a new question but deleted it later after @ypercube commented that a similar question has been answered : http://dba.stackexchange.com/questions/86373/the-set-clauses-multiple-column-assignment — user4150760, Dec 19 '14 at 04:05
@user4150760: I added the link you provided in the answer above. — Erwin Brandstetter, Dec 19 '14 at 04:55
I'm trying the "Better performance without rCTE" example and it's saying "column "i.slug1" must appear in the GROUP BY clause or be used in an aggregate function".
Adding a GROUP BY slug1 makes this go away. — btubbs, Aug 31 '16 at 21:52
Hmmm another issue with "Better performance without rCTE". The query returns 0 rows if the table contains 'news-on-apple' but not 'news-on-apple-1'. — btubbs, Sep 01 '16 at 15:32
@btubbs: Sorry, I updated with a query that works for this case as well. Replaced GROUP BY with ORDER BY - should be a bit faster. — Erwin Brandstetter, Sep 02 '16 at 02:32

Recursive CTE to find unique slug

1 Answers1

Efficient rCTE

Better performance without rCTE

Optimize performance

Query for list of slugs

Linked