I am currently getting to know dplyr and the rest of the tidyverse better, and I stumbled across several ways of storing the result of a mutate call. I wonder whether any of these ways of adding the new column is better or worse than the others.

library(data.table)
library(dplyr)
dt <- structure(
  list(
    obs = c("1953M04", "1953M05", "1953M06", "1953M07", "1953M08",
            "1953M09", "1953M10", "1953M11", "1953M12", "1954M01"),
    gs1 = c(2.35999989509583, 2.48000001907349, 2.45000004768372,
            2.38000011444092, 2.27999997138977, 2.20000004768372,
            1.78999996185303, 1.66999995708466, 1.6599999666214,
            1.4099999666214)
  ),
  row.names = c(NA, -10L),
  class = c("data.table", "data.frame")
)

# data.table approach: add the column by reference
dt[, Date.Month := as.Date(paste0(obs,"-01"), format = "%YM%m-%d")]

# dplyr way, assigning at the end of the pipe, which feels more logical to me (`->>` is a superassignment)
dt %>% mutate( Date.Month = as.Date(paste0(obs,"-01"), format = "%YM%m-%d")) %>% {. ->> dt }

# Direct reassignment, although assigning the output of the right-hand side to the left feels illogical, at least in my head ;-)
dt <- dt %>% mutate( Date.Month = as.Date(paste0(obs,"-01"), format = "%YM%m-%d"))

Is the reassignment in the last version costly in terms of computational effort or not?

hannes101
  • R objects are copied when modified anyway: https://stackoverflow.com/questions/15759117/what-exactly-is-copy-on-modify-semantics-in-r-and-where-is-the-canonical-source . If you are concerned about speed, run some tests. – Jack Brookes Mar 01 '19 at 14:31
  • Assigning at the start is expected R syntax and people will be confused if you assign at the end. If you really hate assigning on the left side of a pipe, though, there is a right assignment operator `->` that you can use at the end of the pipe without the ugly `{. ->>` syntax. You can just go: `dt %>% mutate(...) -> dt`. – divibisan Mar 01 '19 at 14:40
  • @JackBrookes If I understand correctly, `:=` does not copy, because "[...] It adds or updates or removes column(s) by reference. It makes no copies of any part of memory at all. [...]" https://www.rdocumentation.org/packages/data.table/versions/1.12.0/topics/%3A%3D – hannes101 Mar 01 '19 at 14:41
  • I don't think I've seen `->>` used as part of the dplyr-way... Or part of any way, really. I haven't used it myself, but others have [advised against using it for global assignment](https://stackoverflow.com/questions/5785290/what-is-the-difference-between-assign-and-in-r). – Z.Lin Mar 01 '19 at 14:47
  • @divibisan Ah, true, the advantage of this syntax is that you can then just go on with additional operations, so you can basically save intermediate results. – hannes101 Mar 01 '19 at 14:47
  • Your direct reassignment is the way it has been done in R for the last 25 years, so I don't think it is illogical. You just have to get used to pipes, which nest expressions on the RHS. – jangorecki Mar 01 '19 at 14:48

1 Answer

One option would be the compound assignment operator (%<>%) from magrittr, which pipes dt into the expression and assigns the result back to dt:

library(magrittr)
library(dplyr)
dt %<>% 
    mutate( Date.Month = as.Date(paste0(obs,"-01"), format = "%YM%m-%d"))

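However, the data.table assignment operator (:=) would be faster and more memory-efficient, because it adds the column by reference instead of building a new object and reassigning it.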
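
One way to see the difference for yourself is to compare object addresses before and after each operation. The following is only a minimal sketch: the two-row table and the names `df`, `df_new`, `addr_before` and `addr_after` are made up for illustration, and the table is built with `data.table()` rather than `structure()` so that it is a regular, over-allocated data.table. It uses `address()` from data.table.

```r
library(data.table)
library(dplyr)

# Small illustrative table (hypothetical data, not the original dt)
dt <- data.table(obs = c("1953M04", "1953M05"), gs1 = c(2.36, 2.48))

# data.table: := adds the column in place, so the object address should not change
addr_before <- address(dt)
dt[, Date.Month := as.Date(paste0(obs, "-01"), format = "%YM%m-%d")]
addr_after <- address(dt)
identical(addr_before, addr_after)

# dplyr: mutate() returns a new object, which is then bound to a name
df <- data.frame(obs = c("1953M04", "1953M05"), gs1 = c(2.36, 2.48))
df_new <- df %>%
  mutate(Date.Month = as.Date(paste0(obs, "-01"), format = "%YM%m-%d"))
identical(address(df), address(df_new))
```

The first comparison should come out `TRUE` and the second `FALSE`. Note that a new top-level object does not necessarily mean every column was deep-copied, so for a small table like the one in the question the reassignment cost is unlikely to matter; timing both variants on realistic data is the safer way to decide.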

akrun