Raster calculations are well suited to this problem, especially when you have travel time data.
The inputs are rasters of distance to the cities, one raster per city. If you're using Euclidean distance, compute a Euclidean distance grid for each city; otherwise, compute the "distance" however you can, perhaps with CostDistance or by rasterizing a network travel time calculation.
The area associated with a city consists of all points where that city's population, divided by the square of the distance, is larger than the comparable population/distance^2 values for all other cities. To prepare for this, use your distance grids to compute these population/distance^2 values: that is a simple "map algebra" operation. Now you're practically done: the Highest Position command creates a coded version of the result you want. Its argument is a list of your population/distance^2 grids; for the coded values, 1 is used for the first grid in the list, 2 for the second, and so on.
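For concreteness, here is a minimal sketch of that workflow in R with the raster package, where which.max plays the role of Highest Position. (The three cities, their coordinates, and their populations are invented for illustration.)

library(raster)
r <- raster(nrows=100, ncols=100, xmn=0, xmx=100, ymn=0, ymx=100)
cities <- cbind(x=c(20, 70, 50), y=c(30, 60, 80)) # Hypothetical locations
pop <- c(500, 200, 100)                           # Hypothetical populations
#
# One population/distance^2 grid per city: the "map algebra" step.
#
g <- stack(lapply(seq_len(nrow(cities)), function(i)
  pop[i] / distanceFromPoints(r, cities[i, , drop=FALSE])^2))
#
# The "Highest Position" step: code each cell by the index (1, 2, ...) of
# the grid attaining the cellwise maximum.
#
allocation <- which.max(g)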
Example
Comments indicate this may be a big problem: 2300 or so centers, perhaps covering a huge (country-wide?) area, thereby requiring a large grid. Don't try this with Spatial Analyst: as a general-purpose raster GIS, it's too slow.
Here is an example in R involving one-tenth as many centers on a 1000 by 1667 grid. It took ten seconds to compute. (That's still a little slow, but at least an order of magnitude faster than the same calculation would be in ArcGIS 10.x.)
[Figure: map of the gravity-based allocation for 230 centers, with the centers overlaid as black dots.]
The numbers on this map identify the centers, using whole values from 1 through 230. The black dots show the locations of those centers (with areas in proportion to their populations). Because this is based on Euclidean distance, the boundaries among "territories" are all portions of circular arcs.
The calculation proceeds by establishing two grids--one for the maximum "gravity" seen so far in each cell and another for the current "owner" of that cell--and computing a new grid for each center in turn. It identifies all cells where the new center's "gravity" exceeds the current maximum and updates the [gravity] grid with the new center's value and the [owner] grid with the identifier of the new owner.
The computation time scales linearly with the number of centers, so 2300 centers would require about two minutes of calculation. It also scales linearly with the number of rows and with the number of columns. A more detailed grid of, say, 10,000 rows by 16,667 columns would therefore require about 160 minutes, or nearly three hours.
To achieve these moderately quick speeds, the algorithm needs RAM to store three grids at once: [gravity], [owner], and the values for each center in turn (the last is re-used for each center). Assuming 16 bytes per value (double precision with some R overhead), the calculation on a 10,000 by 16,667 grid would require 8 GB, which I confirmed with a test. (That would cover the entire continental US at a 300 meter resolution.) At this resolution the calculation takes 5 seconds per center; extrapolating to 2300 centers still gives about three hours.
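These extrapolations are easy to check directly; here is a quick sanity check in R, using the timings and grid sizes quoted above:

#
# Time: 10 seconds for 230 centers on a 1000 by 1667 grid, scaling
# linearly in the number of centers, rows, and columns.
#
10 * (2300/230) * (10000*16667) / (1000*1667) / 3600 # Hours: about 2.8
#
# Memory: three double-precision grids at about 16 bytes per value.
#
3 * 10000 * 16667 * 16 / 2^30                        # GiB: about 7.5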
Be prepared for counter-intuitive results. (I'm not saying they will be wrong, but they might be surprising.) For instance, when just two cities are involved, the area associated with the smaller city will be a perfect disk, but it won't be centered on the city itself: the boundary, where pop1/d1^2 equals pop2/d2^2, is the locus where d1/d2 = sqrt(pop1/pop2), a circle of Apollonius. The disk will be surrounded by an (infinite) area associated with the larger city. That means that a plurality of people living sufficiently far away on the far side of the small city are predicted to travel to the large city--even when that involves passing right through the small city.
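You can see this with a quick two-city experiment (the coordinates and populations are made up; the "gravity" grids are computed exactly as in the code below):

#
# Two hypothetical cities on a horizontal line; the right one is four
# times as populous as the left one.
#
x.0 <- seq(0, 140, length.out=560)
y.0 <- seq(0, 60, length.out=240)
center <- rbind(c(50, 30), c(90, 30))
pop <- c(1, 4)
g1 <- pop[1] / outer((y.0 - center[1,2])^2, (x.0 - center[1,1])^2, "+")
g2 <- pop[2] / outer((y.0 - center[2,2])^2, (x.0 - center[2,1])^2, "+")
#
# The smaller city's territory plots as a disk whose center lies to its
# left, away from the larger city.
#
image(x.0, y.0, t(g1 >= g2) + 0, asp=1)
points(center, pch=19)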
Here is the R code used to produce the example. To use it in practice you would want to read the center coordinates (perhaps from a shapefile) and, afterwards, write the allocation grid to disk for further processing and better mapping; a minimal version of that last step is sketched after the code. Those straightforward procedures are illustrated (with working code) in other R-based threads here.
#
# Create population centers and their populations.
#
set.seed(17)
x.min <- 0; x.max <- 5000000 # Extent of easting coordinates
y.min <- 0; y.max <- 3000000 # Extent of northing coordinates
n <- 230 # Number of centers
center <- matrix(runif(n*2, min=c(x.min, y.min), max=c(x.max, y.max)),
                 ncol=2, nrow=n, byrow=TRUE)
pop <- rgamma(n, 5, 10^-2 / (4*n)) # Total population around 110M people
#
# Create row and column coordinate arrays.
#
n.rows <- 10^3
cellsize <- (y.max - y.min) / n.rows
n.cols <- ceiling((x.max - x.min) / cellsize)
x.max <- x.min + n.cols * cellsize # Assures square cells are used
y.0 <- seq(y.max-cellsize/2, y.min+cellsize/2, length.out=n.rows) # Cell-center northings (top row first)
x.0 <- seq(x.min+cellsize/2, x.max-cellsize/2, length.out=n.cols) # Cell-center eastings
#
# For each center, compute a "gravity" grid (based on population times inverse
# squared Euclidean distance) and accumulate the maxima.
#
system.time({
  #
  # It is slightly more efficient to process the larger
  # centers first.
  #
  i <- order(pop, decreasing=TRUE)
  pop <- pop[i]
  center <- center[i, , drop=FALSE]
  #
  # Initialize the output.
  #
  owner <- matrix(0, n.rows, n.cols)
  gravity.max <- matrix(0, n.rows, n.cols)
  #
  # Loop over the centers, updating the output.
  # (This is a "highest position" calculation.)
  #
  for (i in 1:n) {
    r <- pop[i] / outer((y.0 - center[i,2])^2, (x.0 - center[i,1])^2, "+")
    update <- which(r >= gravity.max)
    gravity.max[update] <- r[update]
    owner[update] <- i
  }
})
#
# Plot the results.
#
library(raster)
par(mfrow=c(1,1))
plot(raster(log(gravity.max), xmn=x.min, xmx=x.max, ymn=y.min, ymx=y.max),
     main="Log Gravity")
plot(raster(owner, xmn=x.min, xmx=x.max, ymn=y.min, ymx=y.max),
     main="Gravity-based allocation")
points(center, pch=19, cex=20*sqrt(pop/n/max(pop)))
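As noted above, in practice you would save the allocation grid for downstream processing. A minimal sketch, assuming a GeoTIFF output and a hypothetical filename:

#
# Persist the allocation grid (the filename is just an example).
#
writeRaster(raster(owner, xmn=x.min, xmx=x.max, ymn=y.min, ymx=y.max),
            "allocation.tif", overwrite=TRUE)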