-1

I am working with the R programming language.

Based on a shapefile, I am trying to find out the driving distance between two coordinates.

First I loaded the shapefile:

library(sf)
library(rgdal)
library(sfnetworks)
library(igraph)
library(dplyr)
library(tidygraph)

Set the URL for the shapefile

url <- "https://www12.statcan.gc.ca/census-recensement/2011/geo/RNF-FRR/files-fichiers/lrnf000r22a_e.zip"

Create a temporary folder to download and extract the shapefile

temp_dir <- tempdir() temp_file <- file.path(temp_dir, "lrnf000r22a_e.zip")

Download the shapefile to the temporary folder

download.file(url, temp_file)

Extract the shapefile from the downloaded zip file

unzip(temp_file, exdir = temp_dir)

Read the shapefile using the rgdal package

source: https://gis.stackexchange.com/questions/456748/strategies-for-working-with-large-shapefiles/456798#456798

a = st_read(file.path(temp_dir, "lrnf000r22a_e.shp"), query="select * from lrnf000r22a_e where PRUID_R ='35'")

The shapefile looks something like this:

Simple feature collection with 570706 features and 21 fields
Geometry type: LINESTRING
Dimension:     XY
Bounding box:  xmin: 5963148 ymin: 665490.8 xmax: 7581671 ymax: 2212179
Projected CRS: NAD83 / Statistics Canada Lambert
First 10 features:
   OBJECTID NGD_UID       NAME TYPE  DIR AFL_VAL ATL_VAL AFR_VAL ATR_VAL CSDUID_L                           CSDNAME_L CSDTYPE_L CSDUID_R                           CSDNAME_R CSDTYPE_R PRUID_L PRNAME_L PRUID_R PRNAME_R RANK
1         4 3434819       <NA> <NA> <NA>    <NA>    <NA>    <NA>    <NA>  3526003                           Fort Erie         T  3526003                           Fort Erie         T      35  Ontario      35  Ontario    5
2         5 1358252      South LINE <NA>    <NA>    <NA>    <NA>    <NA>  3551027                Gordon/Barrie Island        MU  3551027                Gordon/Barrie Island        MU      35  Ontario      35  Ontario    5
3         8 1778927       <NA> <NA> <NA>    <NA>    <NA>    <NA>    <NA>  3512054                           Wollaston        TP  3512054                           Wollaston        TP      35  Ontario      35  Ontario    5
4        11  731932     Albion   RD <NA>    <NA>    <NA>    2010    2010  3520005                             Toronto         C  3520005                             Toronto         C      35  Ontario      35  Ontario    3
5        18 3123617  County 41   RD <NA>     640     708     635     709  3511015                     Greater Napanee         T  3511015                     Greater Napanee         T      35  Ontario      35  Ontario    3
6        20 4814160       <NA> <NA> <NA>    <NA>    <NA>    <NA>    <NA>  3553005     Greater Sudbury / Grand Sudbury        CV  3553005     Greater Sudbury / Grand Sudbury        CV      35  Ontario      35  Ontario    5
7        21 1817031       <NA> <NA> <NA>    <NA>    <NA>    <NA>    <NA>  3537028                         Amherstburg         T  3537028                         Amherstburg         T      35  Ontario      35  Ontario    5
8        24 4825761       <NA> <NA> <NA>    <NA>    <NA>    <NA>    <NA>  3554094 Timiskaming, Unorganized, West Part        NO  3554094 Timiskaming, Unorganized, West Part        NO      35  Ontario      35  Ontario    5
9        25  544891     Dunelm   DR <NA>       1       9       2      10  3526053                      St. Catharines        CY  3526053                      St. Catharines        CY      35  Ontario      35  Ontario    5
10       28 1835384 Seven Oaks   DR <NA>     730     974     731     975  3515005             Otonabee-South Monaghan        TP  3515005             Otonabee-South Monaghan        TP      35  Ontario      35  Ontario    5
   CLASS                 _ogr_geometry_
1     23 LINESTRING (7269806 859183,...
2     23 LINESTRING (6921247 1133452...
3     23 LINESTRING (7320857 1089403..

Then I tried to calculate the distance by treating the road network as a graph:

# Convert the shapefile to an sfnetwork object
net <- as_sfnetwork(a)

Define the two points of interest in WGS84 (EPSG:4326)

pt1 <- st_point(c(-79.61203, 43.68312)) pt2 <- st_point(c(-79.61203, 43.64256))

Set the CRS of the points to WGS84 (EPSG:4326)

pt1 <- st_sfc(pt1, crs = 4326) pt2 <- st_sfc(pt2, crs = 4326)

Transform the points to the CRS of the network

pt1_transformed <- st_transform(pt1, st_crs(net)) pt2_transformed <- st_transform(pt2, st_crs(net))

Find the nearest points on the network to the transformed points of interest

n1 <- st_nearest_feature(pt1_transformed, net) n2 <- st_nearest_feature(pt2_transformed, net)

Everything seems to work until the last part:

# Calculate the shortest path between n1 and n2 on the network
path <- net %>%
    activate(edges) %>%
    mutate(weight = edge_length()) %>%
    as_tbl_graph() %>%
    shortest_paths(n1, n2) %>%
    .$epath[[1]]

Calculate the distance of the shortest path in meters

distance <- sum(path$weight)

Print the distance in kilometers

distance_km <- distance / 1000 cat("The network distance between CN Tower and Toronto Airport is", distance_km, "km.")

But this gave me the following error:

Error in .[[.$epath, 1]] : incorrect number of subscripts
In addition: Warning message:
In shortest_paths(., n1, n2) :
  At structural_properties.c:4745 :Couldn't reach some vertices

How can I fix this?

Note: I am specifically trying to solve this problem WITHOUT using OSRM

Vince
  • 20,017
  • 15
  • 45
  • 64
stats_noob
  • 145
  • 12

1 Answers1

2

tldr; specify output="both" and extract elements by name without using pipes and dots.

This code:

shortest_paths(n1, n2) %>%
    .$epath[[1]]

appears to be some twisted way of getting sp$epath[[1]] where sp is the result of a shortest_paths function. Let's see what we get from a minimal example to see if I can reproduce that error:

 g <- make_ring(10)
 sp = shortest_paths(g, 5)

What's the $epath component?

> sp$epath[[1]]
NULL

NULL. Is that what creates this odd error in your case?

> sp %>% .$epath[[1]]
Error in .[[.$epath, 1]] : incorrect number of subscripts

Quite possibly. Oh well, bad error message is bad. How to fix it?

Reading the help for shortest_paths inputs:

 output: Character scalar, defines how to report the shortest paths.
          “vpath” means that the vertices along the paths are reported,
          this form was used prior to igraph version 0.6. “epath” means
          that the edges along the paths are reported. “both” means
          that both forms are returned, in a named list with components
          “vpath” and “epath”.

and its return value list description:

  epath: This is a list similar to ‘vpath’, but the vectors of the
          list contain the edge ids along the shortest paths, instead
          of the vertex ids. This entry is set to ‘NULL’ if it is not
          requested in the ‘output’ argument.

So lets try getting "both" outputs:

  g <- make_ring(10)
  sp = shortest_paths(g, 5, output="both")

Do we get anything in the $epath component:

> sp$epath[[1]]
+ 4/10 edges from bfda679:
[1] 4--5 3--4 2--3 1--2

Yes, and can we extract it using this pipe notation?

> sp %>% .$epath[[1]]
Error in .[[.$epath, 1]] : incorrect number of subscripts

No. No idea why, chances are something changed in dplyr or tidygraph since whatever source you copy-pasted your code from was created, assuming that worked in the first place, or some other version discrepancy. If you avoid these things your code will be more stable.

If you want to write the code so that you can inspect the state of things at any error point, use this syntax, or better still give the tmp variables more meaningful names:

tmp <- activate(net, edges)
tmp <- mutate(tmp, weight = edge_length())
tmp <- as_tbl_graph(tmp)
tmp <- shortest_paths(tmp, n1, n2, output="both") 
path <- tmp$epath[[1]]

Whichever line this stops at you can inspect tmp at that point and see if, for example, there's any actual graph nodes in the graph, or if there's any shortest paths been found and so on.

Spacedman
  • 63,755
  • 5
  • 81
  • 115
  • @ Spacedman: Thank you so much for your answer! Based on my understanding of your answer - I should replace shortest_paths(n1, n2) with : shortest_paths(n1, n2, output = "both")? – stats_noob Jun 19 '23 at 14:45
  • I was able to get this far by removing the shortest path command: path <- net %>% activate(edges) %>% mutate(weight = edge_length()) %>% as_tbl_graph() – stats_noob Jun 19 '23 at 15:16
  • I've rewritten your pipe chain as a debuggable sequence of statements. – Spacedman Jun 19 '23 at 15:57
  • I am now getting this output: Warning message: In shortest_paths(tmp, n1, n2, output = "both") : At core/paths/dijkstra.c:446 : Couldn't reach some vertices. – stats_noob Jun 19 '23 at 16:04
  • That's a warning. It means there are some vertices that it can't reach. If there's a path connecting your start and end vertices it will find it though. Its just there are some points on the network that are not connected to the rest of network. – Spacedman Jun 19 '23 at 16:07
  • One of the points is the Toronto Airport and the other is the CN Tower (Toronto) - from experience I know it is possible to drive from one of these locations to the other. Maybe I have entered the coordinates backwards? pt1 <- st_point(c( 43.68312, -79.61203)) pt2 <- st_point(c(43.64256, -79.61203)) – stats_noob Jun 19 '23 at 16:07
  • Sorry, I've been trying to do this without downloading your 300Mb road network file, because there were some clear problems with your code (not having output="both" and expecting to find a $epath component being the clearest). I'll try with that road network file but its just failed at 41Mb... – Spacedman Jun 19 '23 at 16:11
  • @ Spacedman: I really appreciate all your help and time for teaching me about geospatial analysis - thank you so much... – stats_noob Jun 19 '23 at 16:17