14

I have a dataframe with around 40 points. They are grouped by an ID variable (their name), which four points in each group. I'm trying to turn these points into polygons, but I've had a lot of trouble with different methods in both sf and sp — currently I have them as a large sf dataframe, but I can't seem to find any function that would use a grouping variable to create polygons.

I've tried using the st_cast method, the general idea being

polypoints <- polypoints %>% 
    group_by("ID") %>% 
    st_cast("MULTIPOINT") %>% 
    st_cast("MULTILINESTRING") %>% 
    st_cast("MULTIPOLYGON")

but the step from multipoint to multiline string leaves the geography empty — what are the steps for a process like this?

Matt
  • 16,843
  • 3
  • 21
  • 52
cschwab98
  • 185
  • 1
  • 2
  • 8

4 Answers4

17

You need to summarise after your group_by statement then your approach works perfectly fine. Directly from POINTS --> POLYGON, and keeping the crs (if there is one).

set.seed(999)
xy = data.frame(x=runif(24), y=runif(24))
xy$ID = rep(1:6, each = 4)
head(xy)

xys = st_as_sf(xy, coords=c("x","y"))

polys = xys %>% dplyr::group_by(ID) %>% dplyr::summarise() %>% st_cast("POLYGON")

plot(polys)

enter image description here

If you want the outer polygon you can add st_convex_hull()

polys = polys %>% 
  st_convex_hull()

plot(polys)

enter image description here

Josh O'Brien
  • 927
  • 5
  • 15
Latrunculia
  • 271
  • 2
  • 3
  • As mentioned in several posts, you can use aggregate() to achieve the same result as summarise(), however you may still need to use st_convex_hull() to make geoms valid. If you have a valid point order you need to maintain @Jean-Luc Dupouey noted that you need to include do_union = FALSE in the aggregate() call, but it's also worth noting this is an option in summarise(..., do_union = FALSE) (which becomes summarise.sf() in this context). – Will Hore-Lacy Nov 15 '23 at 23:24
6

Make a reproducible example:

> set.seed(999)
> xy = data.frame(x=runif(24), y=runif(24))
> xy$ID = rep(c(1:6), rep(4,6))
> head(xy)
           x         y ID
1 0.38907138 0.8260703  1
2 0.58306072 0.8195141  1
3 0.09466569 0.5684927  1
4 0.85263123 0.6196068  1
5 0.78674676 0.8308805  2
6 0.11934226 0.4588336  2

Make the data frame into an sf data frame:

> xys = st_as_sf(xy, coords=c("x","y"))

Then aggregate by ID, combine the points and cast to POLYGON, turn the whole thing into an sf data frame. In a one-liner:

> polys = st_sf(
  aggregate(
    xys$geometry,
    list(xys$ID),
    function(g){
       st_cast(st_combine(g),"POLYGON")
    }
   ))

> polys
Simple feature collection with 6 features and 1 field
geometry type:  POLYGON
dimension:      XY
bbox:           xmin: 0.03014558 ymin: 0.01902308 xmax: 0.9875201 ymax: 0.8690149
epsg (SRID):    NA
proj4string:    NA
  Group.1                       geometry
1       1 POLYGON ((0.3890714 0.82607...
2       2 POLYGON ((0.7867468 0.83088...
3       3 POLYGON ((0.3907724 0.52724...
4       4 POLYGON ((0.03014558 0.8162...
5       5 POLYGON ((0.1665847 0.10961...
6       6 POLYGON ((0.9074913 0.35951...

Looks pretty awful but that's random data for you...

enter image description here

Spacedman
  • 63,755
  • 5
  • 81
  • 115
  • 1
    thank you so much, this is extremely helpful (both in solving the problem and in teaching me new ways to think about gis in R) - I have a few quick questions though. Where is the xys dataframe coming from? Also, trying to use this on my own data returns: Error in st_sfc(NextMethod(), crs = st_crs(x), precision = st_precision(x)): is.numeric(crs) || is.character(crs) || inherits(crs, "crs") is not TRUE . Not sure what is causing this, I tried including st_crs in the polys line but that didn't help either. – cschwab98 Aug 19 '19 at 20:22
  • Just to add on to the error, using my own data won't even display the object using view("polys"), and says r error 4 (R code execution error) despite compiling properly – cschwab98 Aug 19 '19 at 20:26
  • Oops missing line - xys is an sf data frame created from xy - edited in. – Spacedman Aug 19 '19 at 20:29
  • yes, figured that one out! still stuck on crs issue, which doesn't happen in sample issue. Not sure what's happening, tried not including crs at st_as_sf step but that doesn't help – cschwab98 Aug 19 '19 at 20:35
  • Can't duplicate. If I st_crs(xys)=4326 and then run it it works, I can even add ,crs=4326) in the wrapping st_sf( call.... – Spacedman Aug 19 '19 at 20:43
  • I tried this with some different reproducible data (can't post anything real I'm working with due to some privacy concerns) and it seems like this must be some projection issue with the data itself and some weird irregularities (working with public data, oh well). Nonetheless I really appreciate your help!! – cschwab98 Aug 19 '19 at 20:55
0

The answer provided by @Spacedman doesn't work any in newer versions of sf. I think the issue is that aggregate automatically handles the union of the geometries and the FUN argument is intended to handle aggregating the non-geometry columns? I also think st_combine produces an sfc output, so the geometry column ends up as a list of sfc objects, not a single sfc object?

The following approach works in sf 1.0-1:

set.seed(999)
xy = data.frame(x=runif(24), y=runif(24))
xy$ID = rep(c(1:6), rep(4,6))

xys = st_as_sf(xy, coords=c("x","y"))

Aggregate - which unions the points into MULTIPOINTS (default do_union=TRUE)

and uses FUN to assign the first value in other fields

xymp = st_sf( aggregate( xys, by=list(ID=xys$ID), FUN=function(vals){vals[1]}))

Cast to poly

xypoly = st_cast(xymp, 'POLYGON')

.or (more usefully?) convex hull

xychull <- xymp st_geometry(xychull) <- st_convex_hull(xymp$geometry)

plot(xychull['ID'])

Convex hulls of points

David_O
  • 101
0

The code below shows that the solution proposed by David_O does not work properly (i.e. does not keep the order of the points provided in the dataframe):

library(sf)

xy = data.frame(x = c(0,1,1,0,0,1,2,2,1,1), y = c(0,0,1,1,0,0,0,1,1,0), ID=c(rep(1,5),rep(2,5)))

xys = st_as_sf(xy, coords=c("x","y"))

Aggregate - which unions the points into MULTIPOINTS (default do_union=TRUE)

and uses FUN to assign the first value in other fields

xymp = st_sf( aggregate( xys, by=list(ID=xys$ID), FUN=function(vals){vals[1]}))

Cast to poly

xypoly = st_cast(xymp, 'POLYGON')

plot(xypoly['ID'])

enter image description here

In order for this solution to work, one has to add the option do_union = FALSE in the call to aggregate:

library(sf)

xy = data.frame(x = c(0,1,1,0,0,1,2,2,1,1), y = c(0,0,1,1,0,0,0,1,1,0), ID=c(rep(1,5),rep(2,5)))

xys = st_as_sf(xy, coords=c("x","y"))

Aggregate - which unions the points into MULTIPOINTS (default do_union=TRUE)

and uses FUN to assign the first value in other fields

xymp = st_sf( aggregate( xys, by=list(ID=xys$ID), do_union=FALSE, FUN=function(vals){vals[1]}))

Cast to poly

xypoly = st_cast(xymp, 'POLYGON')

plot(xypoly['ID'])

enter image description here

It seems to be because the default option of aggregate (do_union = TRUE) calls st_union, and that st_union sorts its result by y.

  • This code works great but how does "function(vals){vals[1]}" work? I can't find documentation for vals anywhere -- is it just defined with the anonymous function? – Mark R Nov 14 '23 at 17:08
  • 1
    Yes, "vals" is just the formal argument taken by the anonymous function. It could be any other name. And "vals[1]" returns the first row of values in each of the aggregated groups. – Jean-Luc Dupouey Nov 15 '23 at 21:13