1

I have a PostgreSQL table with the following schema and data:

CREATE TABLE IF NOT EXISTS T(
     id uuid PRIMARY KEY,
     username varchar(15),
     person varchar(10),
     tweets int,
     followers int,
     following int,
     likes int,
     created_at date) 
;
id username person tweets followers following likes created_at
3fa34100-d688-4051-a687-ec49d05e7212 renok null 110 6 0 0 2020-10-10
bab9ceb9-2770-49ea-8489-77e5d763a223 Lydia_C test user2 515 1301 1852 1677 2020-10-10
4649077a-9188-4821-a1ec-3b38608ea44a Kingston_Sav null 2730 1087 1082 1339 2020-10-10
eef80836-e140-4adc-9598-8b612ab1825b TP_s null 1835 998 956 1832 2020-10-10
fd3ff8c7-0994-40b6-abe0-915368ab9ae5 DKSnr4 null 580 268 705 703 2020-10-10
3fa34100-d688-4051-a687-ec49d05e7312 renok null 119 6 0 0 2020-10-12
bab9ceb9-2770-49ea-8489-77e5d763a224 Lydia_C test user2 516 1301 1852 1687 2020-10-12
4649077a-9188-4821-a1ec-3b38608ea44B Kingston_Sav null 2737 1090 1084 1342 2020-10-12
eef80836-e140-4adc-9598-8b612ae1835c TP_s null 1833 998 957 1837 2020-10-12
fd3ff8c7-0994-40b6-abe0-915368ab7ab5 DKSnr4 null 570 268 700 703 2020-10-12

I intend to get the biggest difference between the most recent date and the next most recent date for each unique username and the find the username with the largest margin (difference) for example..In the above table the most recent date is 2020-10-12 and the next most recent date is 2020-10-10.

So I want to get something like this

 id                                   | username     | person | tweets | followers | following | likes | created_at | prev_followers | gain
 :----------------------------------- | :----------- | :----- | -----: | --------: | --------: | ----: | :--------- | -------------: | ---:
 4649077a-9188-4821-a1ec-3b38608ea44a | Kingston_Sav | null   |   2737 |      1090 |      1084 |  1342 | 2020-10-12 |           1087 |    3
basebandit
  • 13
  • 4
  • Aside: your column id varchar(36) looks like it could (better) be id uuid PRIMARY KEY. See: https://dba.stackexchange.com/a/89433/3684, https://dba.stackexchange.com/a/115316/3684 – Erwin Brandstetter Oct 31 '20 at 03:59

2 Answers2

0
WITH 
cte1 AS ( SELECT DISTINCT created_at
          FROM t
          ORDER BY 1 DESC LIMIT 2 ),
cte2 AS ( SELECT src.*, 
                 LEAD(src.followers) OVER (PARTITION BY src.id 
                                           ORDER BY src.created_at DESC) prev_followers,
                 src.followers - LEAD(src.followers) OVER (PARTITION BY src.id 
                                                           ORDER BY src.created_at DESC) gain
          FROM t src
          JOIN cte1 ON src.created_at = cte1.created_at )
SELECT *
FROM cte2
WHERE gain IS NOT NULL
ORDER BY gain DESC LIMIT 1;

fiddle


whenever there are two records with the same difference margin..It always shows one instead of two.

WITH 
cte1 AS ( SELECT DISTINCT created_at
          FROM t
          ORDER BY 1 DESC LIMIT 2 ),
cte2 AS ( SELECT src.*, 
                 LEAD(src.followers) OVER (PARTITION BY src.id 
                                           ORDER BY src.created_at DESC) prev_followers,
                 src.followers - LEAD(src.followers) OVER (PARTITION BY src.id 
                                                           ORDER BY src.created_at DESC) gain
          FROM t src
          JOIN cte1 ON src.created_at = cte1.created_at ),
cte3 AS ( SELECT *, RANK() OVER (ORDER BY gain DESC) rnk
          FROM cte2
          WHERE gain IS NOT NULL )
SELECT *
FROM cte3
WHERE rnk = 1;

fiddle

Akina
  • 19,866
  • 2
  • 17
  • 21
  • Hello @Akina that is not what I want. When I run your solution this is what I get https://dbfiddle.uk/?rdbms=postgres_13&fiddle=ae0976eafa089337fb43ac9e4e551fb3 – basebandit Oct 29 '20 at 12:02
  • @basebandit Edited. Check. – Akina Oct 29 '20 at 12:09
  • thank you it now works as expected however whenever there are two records with the same difference margin..It always shows one instead of two. Had to change limit to 2 for such scenarios. – basebandit Oct 30 '20 at 10:18
  • @basebandit Had to change limit to 2 for such scenarios. ??? and what if there are 3 rows to be returned? Updated, look. – Akina Oct 30 '20 at 10:25
  • It now takes into account such scenarios. Both your answers are correct. – basebandit Oct 30 '20 at 10:30
0

Many ways lead to Rome. The below should be a good one (fast and flexible) to "find the username with the largest margin"

Assuming all involved columns are defined NOT NULL. And each username can only have one entry per day. Else you have to do more.

WITH cte AS (
   SELECT *, dense_rank() OVER (ORDER BY created_at DESC) AS rnk
   FROM   tbl
   )
SELECT d1.*
     , d2.followers AS prev_followers
     , d1.followers - d2.followers AS gain
FROM  (SELECT * FROM cte WHERE rnk = 1) d1
JOIN  (SELECT * FROM cte WHERE rnk = 2) d2 USING (username)
ORDER  BY gain DESC
        , d1.followers, username  -- added tiebreaker
LIMIT  1;

The CTE named cte attaches rank numbers with dense_rank() (not rank(), not row_number()). Then join the latest day (rnk = 1) with the one before (rnk = 2) and calculate the gain. Obviously, users must have entries for both days to qualify. Finally order by the gain and take the first row.

Note the added ORDER BY expressions to try and break possible ties: there can be multiple users with the same gain, so you have to define how to deal with that. One way is to add tiebreakes. In my example, a user with a smaller absolute number of followers is preferred (higher relative gain), and the alphabetically first wins if that's still ambiguous.

Or you return all "winners":

Again, many ways ... Postgres 13 added the standard SQL clause WITH TIES for that purpose exactly:

WITH cte AS (
   SELECT *, dense_rank() OVER (ORDER BY created_at DESC) AS rnk
   FROM   tbl
   )
SELECT d1.*
     , d2.followers AS prev_followers
     , d1.followers - d2.followers AS gain
FROM  (SELECT * FROM cte WHERE rnk = 1) d1
JOIN  (SELECT * FROM cte WHERE rnk = 2) d2 USING (username)
ORDER  BY gain DESC
FETCH  FIRST 1 ROWS WITH TIES;

db<>fiddle here

Detailed explanation for WITH TIES:

Erwin Brandstetter
  • 175,982
  • 27
  • 439
  • 600
  • This is a very good answer and it works but some columns for example person can be null and is indeed expected to be null for some specific usecase. I picked your answer over @akina's answer because for cases where there is a tie i.e there are two records with the same highest gain, your answer will show the both of them. – basebandit Oct 30 '20 at 10:18
  • Here's a fiddle comparing both of your given solutions. https://dbfiddle.uk/?rdbms=postgres_13&fiddle=daadb4deea204dfbee517cfd8d9538cb – basebandit Oct 30 '20 at 10:23
  • 1
    Here's an analysis of both answers – basebandit Oct 30 '20 at 11:51
  • person can be NULL, it's just payload. If one of the columns used to sort or join can be NULL, you need to understand NULL and may need to do more. See: https://stackoverflow.com/a/9511492/939860. Please remember to provide relevant table definitions showing data types and constraints (CREATE TABLE statement) in questions. A fiddle serves well, too. – Erwin Brandstetter Oct 30 '20 at 12:12