2

TL;DR: I'm trying to calculate the forested area of nature reserves. For this I have two partially overlapping datasets (reserves and forests). My approach is a union and consecutive spatial join of conntaining forest areas, that fails because forests that share a boundary with the reserve are not recognized.

A) Is this due to the definition of "contains"?

B) If yes, isn't this definition of "contains" flawed or should at least be made transparent in the docs?

C) Are there alternatives (in QGIS) to join on polygons that interior but with touching borders?

I uploaded test datasets: reserves.gpkg and forest.gpkg and reserve_union_forest.gpkg (as suggested in a comment).


In detail: To calculate the forested area of nature reserves my approach was

  1. Union reserves and forests => waldreservate_union_wald.gpkg
    • Delete non matching polygons (areas that are outside reserves or that are not forest)
    • Calculate area of remaining polygons
    • => waldreservate_union_wald.gpkg clean (only forests within reserves) with attribute forest_area_ha
  2. Calculate total forest area within each reserve: Join attributes by location (summary) with contains and calculate summary sum on attribute forest_area_ha
    • => waldreservate_forest_area_ha.gpkg with attribute forest_area_ha_sum

enter image description here

This however gives me a lot of NULL values, since all forest polygons that share the border with the reserve polygon are not considered contained (see image at the end of post), despite being inside the polygon (the forest polygons originate from a union). Only forest area that is completely within (not bordering) the reserve is recognized. I ensured that this is not due to imprecisions caused by different CRS or projections.

The QGIS doc on contain is unspecific regarding points that are directly ON the polygon border (touch). It only specifies behavior for interior or exterior points, but not borderline.

Contain: Returns 1 (true) if and only if no points of b lie in the exterior of a, and at least one point of the interior of b lies in the interior of a. In the picture, no circle is returned, but the rectangle would be if you would look for it the other way around, as it contains circle 1 completely. This is the opposite of are within.

The illustrations within this question on spatial relations in QGIS suggests, borderline is "contained", and I'd prefer this behavior, but my results suggest otherwise: The polygons that I created using Union should have exactly the same border as their "parent polygons" (the polygons used to cut them) OR should be in their interior. But they don't show up as contained.

I tested other spatial relations, but they do cause other problems:

  • intersect, gives me forest areas of neighboring reserves that share the border. Meaning: intersect handles borderline as TRUE (as opposed to contain). Also, if I couldn't be sure (because of previos Union) that no polygons partially overlap, this would also include partially overlapping polygons.

  • overlap gives me correct results for polygons that are contained and touching but leaves out areas (forest) that are completely overlapping (equal) to the parent polygon (reserve). This behavior is described in the docs ("... but are not completely contained by each other").

  • all others (touch, cross, equal, disjoint) are not applicable to my case

  • according to docs, are within is the opposite of "contained" (NOT exterior and NOT touching), so it might work, but I'd have to switch Base Layer (reserves) and Join Layer (forests) and hence couldn't caluclate the summarized area of forest in reserves anymore.

Maybe I'm not seeing something, so before I raise an issue to the QGIS developers, I'm wondering:

  • isn't it completely unintuitive that the results of a union are not "contained" in their parent polygons?
  • isn't this use case a perfect example of why "contain" should be TRUE when points are interior OR borderline (the behaviour of contain should be changed?!) or there should be distinct relations (contains with/without shared points)?
  • ... shouldn't at least the docs be updated to make this behavior transparent (FALSE when points are exterior OR borderline/shared)?
  • there is no spatial join relation available, that does what I want?!

I checked in the ArcGIS Pro Doc without testing. They differentiate between three cases ("contained"/"completely contained"/"contains clementini"), where "contains clementini" omits results that are only borderline/on boundary - this suggests to me, that they apparently have the suggested behavior implemented (contained = [no exterior points] AND [at least one interior OR borderline/on boundary point]).


I can work around this by:

  1. defining a minor buffer
  2. spatial join (summary) with contain in the buffered layered
  3. joining the resulting summaries via ID back to the original unbuffered layer

Or:

  1. spatial join (summary) with overlap
  2. spatial join (summary) with equal
  3. write both results into one new field

But both seem circumstancial and dirty to me.


Example image of reserve (selected in red) containing a lot of forest (green) that is not recognized (Flaeche_ha_Wald_sum = NULL) because all forest polygons share borders with the reserve.

area of "contained" forest is NULL because of bordering polygons

Honeybear
  • 2,454
  • 1
  • 15
  • 25
  • 2
    Test data would help much more than a long description, even it is well written and logical. But you are right about the meaning of "contains". See also https://postgis.net/docs/manual-3.0/ST_Contains.html and https://postgis.net/docs/manual-3.0/ST_ContainsProperly.html. – user30184 Feb 01 '23 at 13:12
  • 1
    See the Clementini, et. al. paper referenced in https://gis.stackexchange.com/questions/446663/what-exactly-are-the-definitions-of-the-boundary-and-interior-of-a-line-or-a-poi/446669#446669 -- TLDR; Within/Contains can't touch – Vince Feb 01 '23 at 13:13
  • 1
    The union probably created a vertex that is considered outside of the original shape because of floating point accuracy. You can try joining on the polygon centroid instead. – JGH Feb 01 '23 at 13:14
  • @user30184 I added test data. Regarding the links you gave: Is the postgis engine underlying the (basic) processing tools of QGIS, so QGIS basically uses ST_contains here? I guess this part "An important subtlety of this definition is that A does not contain its boundary, but A does contain itself." is what should tell me that basically the boundary of A is already defined as "exterior" (hence causing points of B being there violating the first condition of the previous doc sentence). – Honeybear Feb 01 '23 at 14:05
  • Maybe you should edit the title into something like "Shouldn't QGIS "contain" consider polygons in the interior as contained even their boundaries touch?" – user30184 Feb 01 '23 at 14:05
  • QGIS is not using PostGIS as an engine but the rules are the same and PostGIS has good documentation. They both may use the same GEOS library under the hood but even that is not important. – user30184 Feb 01 '23 at 14:08
  • no no, the subtlety concerns only a polygon containing a line: it won't contain a line on its boundary because no point of the line falls inside the polygon. But for a polygon containing another polygon, it is fine to share a boundary since "at least one point" of the 1st polygon is inside the 2nd one. – JGH Feb 01 '23 at 14:14
  • I would like to have the same data that you have for the layers "waldreservate" and "waldreservate_union_wald". And what I plan to do with the data is to study if the vertices that you suppose to be at the same coordinates really have the same coordinates. See the comment by @JGH. – user30184 Feb 01 '23 at 14:21
  • @JGH so you're suggesting that the definition of contain is not the problem (neither in QGIS nor PostGIS), but floating point accuracy of vertices? How do I test that? Also: I'm still sceptical, as this is a consistent behavior on my dataset (~2300 polygons) and I'd assume with accuracy issues you'd at least get a few outliers where by chance all points happened to be inside (well I haven't checked all 2300 polygons, but a large portion). – Honeybear Feb 01 '23 at 14:21
  • @user30184 I added additional test data, see edited question above. "waldreservate" = testarea_reserve.gpkg and "waldreservate_union_wald" = testarea_reserve_union_forest.gpkg - due to terms of use I can't give you the whole dataset, but only parts of it, but the given dataset contains enough reproducible examples. – Honeybear Feb 01 '23 at 14:34
  • Maybe the issue is only in the "join attributes" tool. I created the joined layer that indeed have null values. But the other tool "Select by location" selects all features by inverse operation "select features from 'joined layer' where features are within 'testarea_reserve'". – user30184 Feb 01 '23 at 14:49
  • yes, definition is fine. To test, pick a unioned polygon, extract its points, then select points that don't intersect with the expected reserve polygon – JGH Feb 01 '23 at 14:54
  • I think the definition of touches and contains comes from the simple feature spec and that a polygon with a touching hole is one of the examples as it is only valid if they touch at a point and not a line. I think you might need contains_properly or some such for this case – Ian Turton Feb 01 '23 at 15:20
  • 1
    I may be missing something but if you just ran the union overlay (or SAGA Identity to exclude forested areas outside the reserves although that is easily done in the pivot or a selection) and recalculated the areas on the new layer, wouldn't a pivot table give you the information you want thereby skipping the join? This assumes the forest and nature areas are attributed as such, with unique values for each that you are interested in. That is how we did this kind of work for years. The Overlap Analysis tool might also work for you but I haven't used it. – John Feb 01 '23 at 21:23
  • @John: very good point :) a pivot table would give the information, but I wanted to ship it with the geometries of the reserves, so I would have to join it back to that layer one way or the other. but the overlap analysis tool is perfect... it does exactly what I want and even does it fast :D – Honeybear Feb 02 '23 at 07:52
  • Thanks to everyone for their inputs. I tried to condense insights into an answer. Feel free to post your own, I'll gladly upvote and accept to honor your efforts. – Honeybear Feb 02 '23 at 09:14

1 Answers1

2

To summarizing contents of the comments and to give this - admittedly rather lengthy question - an attempt of (an) answer:

A) Is this due to the definition of "contains"?

No, in this case, as JGH pointed out in the comments, the issue are the results of the union. They contain vertices which are outside of the "parent" polygons, as can be seen below (unselected green dots/rows). True to the definition of "contains", this returns false, since "at least one point is on the exterior of the polygon".

This (union result vertices being outside their cutting parent polygons) didn't happen, whenever the union resulted in a fully contained parent polygon (no partially overlapping areas).

example showing 9 out of 779 vertices being outside of the polygon

B) If yes, isn't this definition of "contains" flawed or should at least be made transparent in the docs?

No, most likely there is no problem with the "contains" definition - although in the comments an inconsistency of the "spatial join" behavior vs. the "select by location" behavior is reported. Afaik, it is unclear whether there are problems (besides union producing vertices outside of their parent polygons).

That said, in my opinion, yes, the docs should still be explicit about touching points. At this point, I'm assuming touching points are considered contained, since "equal" polygons (where ALL vertices are touching) are returned as well. Touching vertices are NOT returned by the relation "are within", which is according to doc "the opposite" of contained.

touching vertices vertices within

C) Are there alternatives (in QGIS) to join on polygons that are interior but have touching borders?

If the polygons are indeed only interior or touching, they should be joined by all available means. The problem here were exterior points introduced through the union.

An elegant alternative to the original goal of calculating overlap is the Processing tool overlap analysis, which does exactly what you want, bypassing (or implicitly executing) the steps of unioning and joining.

Honeybear
  • 2,454
  • 1
  • 15
  • 25