I am trying to aggregate all of the census tracts in the United States for analysis and display, including a realistic depiction of water features when zoomed in. I run into memory issues when doing this at scale, so I am looking for a more efficient method.
Here is my current function for doing this:
library(tidyverse)
library(tigris)
library(sf)
library(rmapshaper)
get_tracts <- function(state_abbr,
year=NULL,
refresh=FALSE,
remove_water=TRUE){
print(paste0("getting shapefile for tracts in ",state_abbr))
state_tracts <- tracts(cb = TRUE,
year = year,
class="sf",
state=state_abbr,
refresh=refresh) %>%
st_transform(., crs=4326) %>% st_make_valid()
Sys.sleep(1) #included for politeness to census servers
if(!remove_water){
return(state_tracts)
}
print(paste0("fetching water boundaries for ",state_abbr))
county_codes <- unique(state_tracts$COUNTYFP)
print(paste0("joining water areas into one shapefile for ",state_abbr))
https://community.rstudio.com/t/better-ways-to-remove-areas-of-water-from-us-map/60771
https://gis.stackexchange.com/questions/308119/function-in-r-to-get-difference-between-two-sets-of-polygons-comparable-in-speed
water_area <-
map_dfr(
county_codes,
~ area_water(state_abbr,
county = .,
year = year,
refresh = refresh,
class = "sf") %>%
select("geometry") %>%
st_transform(., crs=4326) %>%
st_make_valid()
)
print("subtracting water areas from census tract areas")
state_tracts_sans_water <- ms_erase(ms_simplify(state_tracts),
ms_simplify(water_area),
remove_slivers=TRUE,
sys=FALSE)
Sys.sleep(1)
return(state_tracts_sans_water)
}
This works fine for a state such as Michigan:
mi_tracts <- get_tracts("MI",year=2018,remove_water = FALSE)
mi_tracts %>%
ggplot(aes(fill = STATEFP)) +
geom_sf()
mi_tracts_sans_water <- get_tracts("MI",year=2018)
mi_tracts_sans_water %>%
ggplot(aes(fill = STATEFP)) +
geom_sf()
However, for a large state like Texas (the state with the most counties), it fails by crashing my R session:
tx_tracts_sans_water <- get_tracts("TX",year=2018)
Sometimes it simply crashes the R session in RStudio, and when run as part of a loop of many states I may receive the following error:
V8 FATAL ERROR in Ineffective mark-compacts near heap limit: Allocation failed - JavaScript heap out of memory # # Fatal error in , line 0 # API fatal error handler returned after process out of memory # # # #FailureMessage Object: 0x7ffe249dd920 ==== C stack trace ===============================
Things I have thought about or tried:
- Upgraded from standard
sffunctions tormapshaperper this question/answer, which improved performance to the extent that I could even have the aforementioned issue. - Was unable to figure out how to use the system
mapshaperusing thesys=TRUEoption ofms_eraseon my RStudio Server setup, and question whether this would be sustainable given the need to definePATHspecific to each user and version of node in use. - I am open to parallelization but due to the functions being performed I am not sure that this will be productive.
- Batching the logic by county rather than by state. I am asking the question here before refactoring my code to do this, but if this is the only way or there are other pro's to this that make it preferable then I am open to it.
- The answer to @mrblobby's RStudio Community post is to directly download the relevant hydrology shapefiles from the census website. However, this is not reproducible/automated. Automated ways to do this that can be parametrized by state are welcome.
- To allow for zooming into specific communities at a high resolution, I would prefer not to filter out bodies of water based on their size (
AWATER). However, if there is a standard that exists for such filtering then I am open to doing this.
So, with all of that being said, how should I go about removing areas of water from the US census tract map using R/RStudio?





printbut rather something likemessage("getting data for tracts in ", state_abbr). This also negates the use ofpaste. Also, when thinking aboutprintconsidercatas an alternative. This becomes quite relevant when submitting packages to CRAN. The package reviewer will catch inapproapriace function usage and request that you change it before the package is approved. ALso always invoke the NAMESPACE of functions used in your function (eg.,tigris::tracts) – Jeffrey Evans Feb 22 '21 at 15:32