Assigning line number within group using dplyr and data.table

Question

I need to create a new column which holds a line number within group.

Some data to work with:

> set.seed(222)
> dt <- diamonds %>% 
  select(cut, color, price) %>% 
  rename(riding=cut,party=color,votes=price) %>% 
  group_by(riding) %>% sample_n(3) %>% 
  distinct(riding,party,.keep_all = TRUE) %>%
  arrange(riding, desc(votes) ) %>% data.table %T>% print 

       riding party votes
        <ord> <ord> <int>
1:      Fair     H  3658
2:      Fair     G  2808
3:      Good     E  2542
4:      Good     D   684
5: Very Good     G  7974
6: Very Good     F  1637
7: Very Good     D   447
8:   Premium     H  5458
9:   Premium     F  2469
10:   Premium     D  1892
11:     Ideal     F 10786
12:     Ideal     E  4832
13:     Ideal     G   757

So the desired output should be like this:

       riding party votes place
       <ord> <ord> <int>  <int>
1:      Fair     H  3658   1
2:      Fair     G  2808   2
3:      Good     E  2542   1
4:      Good     D   684   2
5: Very Good     G  7974   1
6: Very Good     F  1637   2
7: Very Good     D   447   3
8:   Premium     H  5458   1
9:   Premium     F  2469   2
10:   Premium     D  1892  3
11:     Ideal     F 10786  1
12:     Ideal     E  4832  2
13:     Ideal     G   757  3

Please show me how to do it, using either dplyr or data.table, or both.

I thought the following will work, but it didn't . Anyone knows why? It gives global row n instead. Can I use .I with by?

> dt2[ order(votes), place:=.I, by=riding][]     # does not work
riding party votes place
<ord> <ord> <int> <int>
1:      Fair     H  3658     1
2:      Fair     G  2808     2
3:      Good     E  2542     3
4:      Good     D   684     4
5: Very Good     G  7974     5
6: Very Good     F  1637     6
7: Very Good     D   447     7
8:   Premium     H  5458     8
9:   Premium     F  2469     9
10:   Premium     D  1892    10
11:     Ideal     F 10786    11
12:     Ideal     E  4832    12
13:     Ideal     G   757    13

After getting the data set, `dt %>% group_by(riding) %>% mutate(place = row_number())`. — Rui Barradas, Aug 29 '20 at 15:46
There are a lot of `dplyr` verbs used to create the `data.table` object `dt`. Here is an equivalent `data.table` version: `library(data.table); set.seed(222); dt <- as.data.table(ggplot2::diamonds)[, .(riding = cut, party = color, votes = price)][, unique(.SD[sample(.N, 3L)], by = "party"), keyby = riding][order(riding, -votes)]` — Uwe, Aug 30 '20 at 09:21

score 1 · Answer 1 · answered Aug 29 '20 at 15:45

In dplyr() I would suggest using group_by() with riding and then create the new variable with a sequence starting from 1 to n():

library(dplyr)
library(data.table)
#Code
set.seed(222)
dt <- diamonds %>% 
  select(cut, color, price) %>% 
  rename(riding=cut,party=color,votes=price) %>% 
  group_by(riding) %>% sample_n(3) %>% 
  distinct(riding,party,.keep_all = TRUE) %>%
  arrange(riding, desc(votes) ) %>% data.table %>% print 

#Create id
dt %>% group_by(riding) %>% mutate(place=1:n())

Output:

# A tibble: 13 x 4
# Groups:   riding [5]
   riding    party votes place
   <ord>     <ord> <int> <int>
 1 Fair      H      3658     1
 2 Fair      G      2808     2
 3 Good      E      2542     1
 4 Good      D       684     2
 5 Very Good G      7974     1
 6 Very Good F      1637     2
 7 Very Good D       447     3
 8 Premium   H      5458     1
 9 Premium   F      2469     2
10 Premium   D      1892     3
11 Ideal     F     10786     1
12 Ideal     E      4832     2
13 Ideal     G       757     3

Or `row_number()` instead of `1:n()`. – Rui Barradas Aug 29 '20 at 15:47 — Rui Barradas, Aug 29 '20 at 15:47
@RuiBarradas It is a great suggestion too!!! Thanks Sir!!! – Duck Aug 29 '20 at 15:48 — Duck, Aug 29 '20 at 15:48

score 0 · Answer 2 · answered Aug 29 '20 at 16:44

0

Similar for data.table. .I doesn't count separately for each group so you have to use 1:.N or an equivalent:

dt2[ order(votes), place := 1:.N, by=riding][]

answered Aug 29 '20 at 16:44

pseudospin

2,737
1
4
19

score 0 · Answer 3 · answered Aug 29 '20 at 18:50

0

We can also use setorder

library(data.table)
setorder(dt, votes)[, place := seq_len(.N), by = riding][]

answered Aug 29 '20 at 18:50

akrun

874,273
37
540
662

Assigning line number within group using dplyr and data.table

3 Answers3