4

I'm looking for the equivalent code in R to this post in python for adding a column that cumulative counts the number of positives and negative values in the preceeding column.

I've found many examples of cumulative sums or something more complex, but I would just like to count the number of positives and negatives in a row that resets whenever the sign changes. See sample code.

library(dplyr)
df <- data.frame(x = c(0.5, 1, 6.5, -2, 3, -0.2, -1))

My expected output is this:

df <- data.frame(x = c(0.5, 1, 6.5, -2, 3, -0.2, -1), 
                 z = c(1,2,3,-1,1,-1,-2))

I would like R to create column "z" with a mutate function to the dataframe df when it starts with just "x".

Jcarl
  • 161
  • 2
  • 9
  • Related: [Create counter within consecutive runs of values](https://stackoverflow.com/questions/19998836/create-counter-within-consecutive-runs-of-values); e.g. `sn = sign(x); rowid(rleid(sn)) * sn` – Henrik Nov 03 '21 at 01:27

1 Answers1

5

You can try:

library(dplyr)

df %>%
  mutate(z = with(rle(sign(x)), sequence(lengths) * rep(values, lengths)))

     x  z
1  0.5  1
2  1.0  2
3  6.5  3
4 -2.0 -1
5  3.0  1
6 -0.2 -1
7 -1.0 -2

You may want to consider how zeroes should be treated as the above may need a modification if zeroes exist in your vector. Perhaps:

df %>%
   mutate(z = with(rle(sign(x)), sequence(lengths) * rep(values^(values != 0), lengths)))

Edit addressing OP comment below:

df %>%
  mutate(z = with(tmp <- rle(sign(x)), sequence(lengths) * rep(values, lengths)),
         id = with(tmp, rep(seq_along(lengths), lengths))) %>%
  group_by(id) %>%
  mutate(avg = cumsum(x)/row_number()) %>%
  ungroup() %>%
  select(-id)

# A tibble: 7 x 3
      x     z   avg
  <dbl> <dbl> <dbl>
1   0.5     1  0.5 
2   1       2  0.75
3   6.5     3  2.67
4  -2      -1 -2   
5   3       1  3   
6  -0.2    -1 -0.2 
7  -1      -2 -0.6 
Ritchie Sacramento
  • 29,890
  • 4
  • 48
  • 56
  • The solution above worked perfectly. I have a follow up question. How would you add an additional column to the right of 'z' that did a cumulative average of all of the previous numbers of the same sign? So if the column names was "avg", row 1 would be equal to 0.5, row 2 = 0.75, row 3 = 2.67, row 4 = -2, row 5 = 3, row 6 = -0.2, row 7 = -0.6. – Jcarl Nov 09 '21 at 23:42
  • @Jcarl - see edit. – Ritchie Sacramento Nov 09 '21 at 23:57
  • This is great!!!!! Thanks so much! – Jcarl Nov 10 '21 at 00:28