There is a column named in table but it cannot be referenced from this part of the query

Question

I've been learning Postgres (coming from SQL Server), and this error really confuses me.

Here is the code with some sample data:

create table T (
ID serial primary key,
A varchar(1),
B varchar(1),
C varchar(1)
)

↑ Testing table.

insert into T (A, B, C)
values('A', 'B', 'C'), ('A', 'B', 'C')

↑ Insert duplicates

delete from T
where ID in (
    select t.ID
    from (  select ID, row_number() over (partition by A,B,C order by A,B,C) as rn
            from T) as t
    where t.rn < (select max(t.rn) from t)
    )

↑ Delete duplicate keeping the last entry.

The problem is in the (select max(t.rn) from t) I am assuming this is a noob mistake related to not knowing postgres syntax when it comes to referencing columns with aliases?

Unrelated, but: you can simplify this by using dense_rank() over (... order by a desc, b desc, c desc) as rn and then use where t.rn > 1 removing the need for the max() subquery — , Sep 06 '19 at 12:45

Erwin Brandstetter · Accepted Answer · 2019-09-06T18:49:32.833

First off, partition by A,B,C order by A,B,C makes no sense. Since you intent to keep the "last" row (meaning the one with the greatest ID), you probably meant:

partition by A,B,C order by ID

Still, the syntax is invalid on top of this. This subquery expression contains a reference to a column of the outer query: (select max(t.rn) from t). The scope of the subquery does not include columns from the outer query so rn is not visible in there. Only columns of the table t are.

You could use a CTE to allow the reference and make the syntax valid:

WITH cte AS (SELECT id, row_number() OVER (PARTITION by a,b,c ORDER BY id) AS rn FROM t)
DELETE FROM t
WHERE  id IN (
    SELECT id
    FROM   cte
    WHERE  rn < (SELECT max(rn) FROM cte)
    )

Still, the query is dangerous nonsense. Do not use this!
Comparing to the greatest row number is logical nonsense as each group of peers might have a different number of dupes. Would delete a lot more than it should.

Simpler, and correct:

DELETE FROM t
WHERE  id IN (
    SELECT t1.id
    FROM  (SELECT id, row_number() OVER (PARTITION by a,b,c ORDER BY id DESC) AS rn FROM t) t1
    WHERE  t1.rn > 1  --  all but the latest
    );

Which, in turn, can be had more cheaply as (assuming all columns NOT NULL!):

DELETE FROM t
WHERE  EXISTS (
   SELECT FROM t AS t1
   WHERE t1.a = t.a
   AND   t1.b = t.b
   AND   t1.c = t.c
   AND   t1.id > t.id
   );

"... where a dupe exists with a greater ID".

How do I remove duplicate records in a join table in PostgreSQL?

I already changed it actually, noticed it as well but forgot to update the OP. But I am still interested in why I am getting that error. — Chessbrain, Sep 06 '19 at 17:21
Though I'm not sure of core issue, select max(column) + 1 worked for me — Suresh, Nov 01 '23 at 05:28

There is a column named in table but it cannot be referenced from this part of the query

1 Answers1