Find the greatest difference between each unique record with different timestamps

Question

I have a PostgreSQL table with the following schema and data:

CREATE TABLE IF NOT EXISTS T(
     id uuid PRIMARY KEY,
     username varchar(15),
     person varchar(10),
     tweets int,
     followers int,
     following int,
     likes int,
     created_at date) 
;



  id
  username
  person
  tweets
  followers
  following
  likes
  created_at




  3fa34100-d688-4051-a687-ec49d05e7212
  renok
  null
  110
  6
  0
  0
  2020-10-10


  bab9ceb9-2770-49ea-8489-77e5d763a223
  Lydia_C
  test user2
  515
  1301
  1852
  1677
  2020-10-10


  4649077a-9188-4821-a1ec-3b38608ea44a
  Kingston_Sav
  null
  2730
  1087
  1082
  1339
  2020-10-10


  eef80836-e140-4adc-9598-8b612ab1825b
  TP_s
  null
  1835
  998
  956
  1832
  2020-10-10


  fd3ff8c7-0994-40b6-abe0-915368ab9ae5
  DKSnr4
  null
  580
  268
  705
  703
  2020-10-10


  3fa34100-d688-4051-a687-ec49d05e7312
  renok
  null
  119
  6
  0
  0
  2020-10-12


  bab9ceb9-2770-49ea-8489-77e5d763a224
  Lydia_C
  test user2
  516
  1301
  1852
  1687
  2020-10-12


  4649077a-9188-4821-a1ec-3b38608ea44B
  Kingston_Sav
  null
  2737
  1090
  1084
  1342
  2020-10-12


  eef80836-e140-4adc-9598-8b612ae1835c
  TP_s
  null
  1833
  998
  957
  1837
  2020-10-12


  fd3ff8c7-0994-40b6-abe0-915368ab7ab5
  DKSnr4
  null
  570
  268
  700
  703
  2020-10-12

I intend to get the biggest difference between the most recent date and the next most recent date for each unique username and the find the username with the largest margin (difference) for example..In the above table the most recent date is 2020-10-12 and the next most recent date is 2020-10-10.

So I want to get something like this

 id                                   | username     | person | tweets | followers | following | likes | created_at | prev_followers | gain
 :----------------------------------- | :----------- | :----- | -----: | --------: | --------: | ----: | :--------- | -------------: | ---:
 4649077a-9188-4821-a1ec-3b38608ea44a | Kingston_Sav | null   |   2737 |      1090 |      1084 |  1342 | 2020-10-12 |           1087 |    3

Aside: your column id varchar(36) looks like it could (better) be id uuid PRIMARY KEY. See: https://dba.stackexchange.com/a/89433/3684, https://dba.stackexchange.com/a/115316/3684 — Erwin Brandstetter, Oct 31 '20 at 03:59

Akina · Answer 1 · 2020-10-30T10:24:45.047

WITH 
cte1 AS ( SELECT DISTINCT created_at
          FROM t
          ORDER BY 1 DESC LIMIT 2 ),
cte2 AS ( SELECT src.*, 
                 LEAD(src.followers) OVER (PARTITION BY src.id 
                                           ORDER BY src.created_at DESC) prev_followers,
                 src.followers - LEAD(src.followers) OVER (PARTITION BY src.id 
                                                           ORDER BY src.created_at DESC) gain
          FROM t src
          JOIN cte1 ON src.created_at = cte1.created_at )
SELECT *
FROM cte2
WHERE gain IS NOT NULL
ORDER BY gain DESC LIMIT 1;

fiddle

whenever there are two records with the same difference margin..It always shows one instead of two.

WITH 
cte1 AS ( SELECT DISTINCT created_at
          FROM t
          ORDER BY 1 DESC LIMIT 2 ),
cte2 AS ( SELECT src.*, 
                 LEAD(src.followers) OVER (PARTITION BY src.id 
                                           ORDER BY src.created_at DESC) prev_followers,
                 src.followers - LEAD(src.followers) OVER (PARTITION BY src.id 
                                                           ORDER BY src.created_at DESC) gain
          FROM t src
          JOIN cte1 ON src.created_at = cte1.created_at ),
cte3 AS ( SELECT *, RANK() OVER (ORDER BY gain DESC) rnk
          FROM cte2
          WHERE gain IS NOT NULL )
SELECT *
FROM cte3
WHERE rnk = 1;

fiddle

Hello @Akina that is not what I want. When I run your solution this is what I get https://dbfiddle.uk/?rdbms=postgres_13&fiddle=ae0976eafa089337fb43ac9e4e551fb3 — basebandit, Oct 29 '20 at 12:02
thank you it now works as expected however whenever there are two records with the same difference margin..It always shows one instead of two. Had to change limit to 2 for such scenarios. — basebandit, Oct 30 '20 at 10:18
@basebandit Had to change limit to 2 for such scenarios. ??? and what if there are 3 rows to be returned? Updated, look. — Akina, Oct 30 '20 at 10:25
It now takes into account such scenarios. Both your answers are correct. — basebandit, Oct 30 '20 at 10:30

Erwin Brandstetter · Accepted Answer · 2020-10-29T13:59:12.497

Many ways lead to Rome. The below should be a good one (fast and flexible) to "find the username with the largest margin"

Assuming all involved columns are defined NOT NULL. And each username can only have one entry per day. Else you have to do more.

WITH cte AS (
   SELECT *, dense_rank() OVER (ORDER BY created_at DESC) AS rnk
   FROM   tbl
   )
SELECT d1.*
     , d2.followers AS prev_followers
     , d1.followers - d2.followers AS gain
FROM  (SELECT * FROM cte WHERE rnk = 1) d1
JOIN  (SELECT * FROM cte WHERE rnk = 2) d2 USING (username)
ORDER  BY gain DESC
        , d1.followers, username  -- added tiebreaker
LIMIT  1;

The CTE named cte attaches rank numbers with dense_rank() (not rank(), not row_number()). Then join the latest day (rnk = 1) with the one before (rnk = 2) and calculate the gain. Obviously, users must have entries for both days to qualify. Finally order by the gain and take the first row.

Note the added ORDER BY expressions to try and break possible ties: there can be multiple users with the same gain, so you have to define how to deal with that. One way is to add tiebreakes. In my example, a user with a smaller absolute number of followers is preferred (higher relative gain), and the alphabetically first wins if that's still ambiguous.

Or you return all "winners":

Again, many ways ... Postgres 13 added the standard SQL clause WITH TIES for that purpose exactly:

WITH cte AS (
   SELECT *, dense_rank() OVER (ORDER BY created_at DESC) AS rnk
   FROM   tbl
   )
SELECT d1.*
     , d2.followers AS prev_followers
     , d1.followers - d2.followers AS gain
FROM  (SELECT * FROM cte WHERE rnk = 1) d1
JOIN  (SELECT * FROM cte WHERE rnk = 2) d2 USING (username)
ORDER  BY gain DESC
FETCH  FIRST 1 ROWS WITH TIES;

db<>fiddle here

Detailed explanation for WITH TIES:

Greater than or equal to ALL() and equal to MAX() speed

This is a very good answer and it works but some columns for example person can be null and is indeed expected to be null for some specific usecase. I picked your answer over @akina's answer because for cases where there is a tie i.e there are two records with the same highest gain, your answer will show the both of them. — basebandit, Oct 30 '20 at 10:18
Here's a fiddle comparing both of your given solutions. https://dbfiddle.uk/?rdbms=postgres_13&fiddle=daadb4deea204dfbee517cfd8d9538cb — basebandit, Oct 30 '20 at 10:23
person can be NULL, it's just payload. If one of the columns used to sort or join can be NULL, you need to understand NULL and may need to do more. See: https://stackoverflow.com/a/9511492/939860. Please remember to provide relevant table definitions showing data types and constraints (CREATE TABLE statement) in questions. A fiddle serves well, too. — Erwin Brandstetter, Oct 30 '20 at 12:12

id	username	person	tweets	followers	following	likes	created_at
3fa34100-d688-4051-a687-ec49d05e7212	renok	null	110	6	0	0	2020-10-10
bab9ceb9-2770-49ea-8489-77e5d763a223	Lydia_C	test user2	515	1301	1852	1677	2020-10-10
4649077a-9188-4821-a1ec-3b38608ea44a	Kingston_Sav	null	2730	1087	1082	1339	2020-10-10
eef80836-e140-4adc-9598-8b612ab1825b	TP_s	null	1835	998	956	1832	2020-10-10
fd3ff8c7-0994-40b6-abe0-915368ab9ae5	DKSnr4	null	580	268	705	703	2020-10-10
3fa34100-d688-4051-a687-ec49d05e7312	renok	null	119	6	0	0	2020-10-12
bab9ceb9-2770-49ea-8489-77e5d763a224	Lydia_C	test user2	516	1301	1852	1687	2020-10-12
4649077a-9188-4821-a1ec-3b38608ea44B	Kingston_Sav	null	2737	1090	1084	1342	2020-10-12
eef80836-e140-4adc-9598-8b612ae1835c	TP_s	null	1833	998	957	1837	2020-10-12
fd3ff8c7-0994-40b6-abe0-915368ab7ab5	DKSnr4	null	570	268	700	703	2020-10-12

Find the greatest difference between each unique record with different timestamps

2 Answers2