Populating polygon with most frequent point attribute using QGIS?

Question

I have two layers: a polygon layer of wards and a point shapefile of properties, which has an attribute describing the type of consumer in each household, such as country living, urban cohesion, family basics, etc.

The point data has a lot of features, as you'd expect, and what I am trying to do is work out what type of consumers live within a particular ward. Obviously within a single ward there will be a variety of consumers, so I am after the most common consumer type. So ward x may be classified as mainly country living consumers, ward y as family basics, etc. The end result will likely be a new field in the ward shapefile holding the most frequent consumer type for each ward.

I have tried a few things, in both ArcGIS and QGIS, such as spatial joins but I am not having much luck.

score 3 · Answer 1 · answered Dec 07 '17 at 10:59

As stated before, you ARE on the right track with the spatial join, and you NEED to create numerical IDs for all your property types. Here is a solution in QGIS (I used 2.18.14 LTR). A very nice thing about QGIS is that you can make use of nearly any other open-source GIS tool, which I do here with GRASS 7.

First I created two sample data sets, polygons with just ids (1,2,3) and some points with an attribute prop_name containing the property types you referred to, styled and labelled it looks like this:

Now create an attribute prop_id with an id for each property type, using the field calculator in QGIS with 'create new field' set to an integer type and a statement like

case
when "prop_name" = 'country living' then 1
when "prop_name" = 'urban cohesion' then 2
when "prop_name" = 'family basics' then 3
end
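
With many consumer types (the question mentions codes A to O), typing every branch by hand gets tedious. Below is a minimal Python sketch that derives the ids and prints an expression of the same shape. It assumes the type names have already been read into a plain list; `build_lookup`, `as_case_expression` and the sample values are hypothetical helpers, not part of QGIS, and ids are assigned alphabetically, so they differ from the hand-written numbering above.

```python
def build_lookup(names):
    """Assign an integer id (starting at 1) to each distinct name, alphabetically."""
    distinct = sorted(set(names))
    name_to_id = {name: i for i, name in enumerate(distinct, start=1)}
    id_to_name = {i: name for name, i in name_to_id.items()}
    return name_to_id, id_to_name


def as_case_expression(field, name_to_id):
    """Render the lookup as a CASE expression string like the one above."""
    lines = ["case"]
    for name, i in name_to_id.items():
        lines.append(f"when \"{field}\" = '{name}' then {i}")
    lines.append("end")
    return "\n".join(lines)


# Made-up sample of attribute values (duplicates are expected)
prop_names = ["country living", "urban cohesion", "family basics", "urban cohesion"]
name_to_id, id_to_name = build_lookup(prop_names)
print(as_case_expression("prop_name", name_to_id))
```

Keeping `id_to_name` around makes it easy to translate the numeric result back into text later. Names containing a single quote would need escaping before being pasted into an expression; the sketch does not handle that.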

Your attribute table should now look like this:

Now you're prepared to do your analysis. In the 'processing' menu, turn on the 'processing toolbox' (at its bottom, switch to 'advanced interface') and search for the GRASS tool v.vect.stats. This tool can compute a variety of univariate statistics on points per polygon (you may also look for other tools that do the trick, e.g. SAGA's Point statistics for polygons).

Set it up as shown below; note that you have to specify two column names to hold your results.

The statistical parameter you want is called mode here, giving you the most frequent attribute value in a polygon:

The resulting attribute table will look like this (result is float number, not an integer, but this does not matter here):

And the visualized and styled result (layer name is 'Analysis results'):
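
For readers who want to see what the mode statistic does per polygon, here is a sketch in plain Python. It is not the v.vect.stats implementation: the (polygon id, prop_id) pairs are made-up sample data standing in for points already matched to polygons, and `mode_per_polygon` is a hypothetical helper.

```python
from collections import Counter, defaultdict


def mode_per_polygon(pairs):
    """Return {polygon_id: most frequent prop_id} from (polygon_id, prop_id) pairs."""
    counts = defaultdict(Counter)
    for polygon_id, prop_id in pairs:
        counts[polygon_id][prop_id] += 1
    # most_common(1) gives [(value, count)]; on a tie the value seen first wins
    return {pid: c.most_common(1)[0][0] for pid, c in counts.items()}


# Made-up sample: polygon 1 holds two points of type 1 and one of type 3, etc.
pairs = [(1, 1), (1, 1), (1, 3), (2, 2), (2, 3), (2, 2), (3, 3)]
id_to_name = {1: "country living", 2: "urban cohesion", 3: "family basics"}

modes = mode_per_polygon(pairs)
# A float result such as 2.0 can be cast to int before looking up the name
labels = {pid: id_to_name[int(float(m))] for pid, m in modes.items()}
print(labels)
```

The last step shows why the float result does not matter: casting back to an integer recovers the id, and the id maps back to the property type name.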

@ohh_danielson So consider to accept this answer, if it solves your problem ;-) — Jochen Schwarze, Dec 11 '17 at 10:31
I have only just got around to trying this, as it looks like exactly what I am after. However, I get this message when trying to use v.vect.stats:
This algorithm cannot be run :-( It seems that GRASS GIS 7 is not correctly installed and configured in your system. Please install it before running GRASS GIS 7 algorithms.

So I downloaded grass 7.x and it runs ok, so I then tried the tool again but the error persists? — ohh_danielson, Jan 09 '18 at 11:08

score 2 · Answer 2 · Zipper1365 · edited Dec 06 '17 at 16:45

I think you're on the right track with the spatial join.

Your task is similar to one I do often in ArcGIS where I summarize the diameter of pipes in sewer networks.

There are always several methods to get to the results; this is just one, and it stops just shy of actually getting you what you need, but for small samples you can do the last step of determining the most frequent type by hand:

Do the spatial join or intersect so that all of your points have the containing "ward polygon ID" appended to the point attribute table.

Run a frequency of the resulting spatial join table on the "Ward ID" and the "Customer type" fields (by selecting two frequency fields, it looks for all combinations of both of those attributes and will produce a table with a count of those combinations.)

that is:

X records are "Rural, Ward 10"

X records are "Rural, Ward 11"

X records are "city, Ward 10"

X records are "city, Ward 11"

For small numbers of combinations, this has worked for me as I can identify the most frequent by eye, but for larger numbers of categories, this won't take you far enough.
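
The by-eye last step can be automated once the frequency table exists. Here is a sketch in plain Python, assuming the spatial-join result has been exported as (ward id, customer type) rows; the sample rows and the function name `most_frequent_by_ward` are hypothetical.

```python
from collections import Counter


def most_frequent_by_ward(rows):
    """rows: iterable of (ward_id, customer_type). Returns {ward_id: (type, count)}."""
    freq = Counter(rows)  # count of each (ward, type) combination
    best = {}
    for (ward, ctype), n in freq.items():
        # Keep the combination with the highest count for each ward
        if ward not in best or n > best[ward][1]:
            best[ward] = (ctype, n)
    return best


# Made-up rows standing in for the joined point table
rows = [
    ("Ward 10", "Rural"), ("Ward 10", "Rural"), ("Ward 10", "city"),
    ("Ward 11", "city"), ("Ward 11", "city"), ("Ward 11", "Rural"),
]
print(most_frequent_by_ward(rows))
```

`Counter(rows)` plays the role of the two-field frequency table, and the loop picks the largest count per ward. If two types tie within a ward, the one encountered first is kept.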

answered Dec 06 '17 at 15:24

Thank you. This sounds like it could work, it is a bit more of a manual method, as in needing to look in each one and identify the most frequent, but it will work. I will need to look at the frequency tool, as I don't think I have used this before. Thanks again. – ohh_danielson Dec 06 '17 at 16:17
If you do happen to find a more automated or more complete method, please do share as it would be helpful to me, too! – Zipper1365 Dec 06 '17 at 16:20
I am looking at using frequency within python, some of the code refers to the frequency fields, which I know is the ward and the consumer type, but there is also a summaryfield section... do you know what this refers to (or if I need it)? Thanks – ohh_danielson Dec 06 '17 at 16:49
The summary field is optional. It will add up the values in that field for each frequency group. An example I use it for is tallying sewer pipes in different tributary areas. I want to know both the number of individual 6" pipes and the total linear footage of 6" pipes, so I specify pipe "diameter" and "tributary area" as the frequency fields and then "pipe length" as the summary field. – Zipper1365 Dec 12 '17 at 02:29
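
The difference between frequency fields and a summary field can be sketched in plain Python. The pipe records are invented sample data and `frequency_with_summary` is a hypothetical helper, not the ArcGIS tool itself.

```python
from collections import defaultdict


def frequency_with_summary(records, frequency_fields, summary_field):
    """Count records per combination of frequency_fields and sum summary_field."""
    out = defaultdict(lambda: {"count": 0, "sum": 0.0})
    for rec in records:
        key = tuple(rec[f] for f in frequency_fields)
        out[key]["count"] += 1
        out[key]["sum"] += rec[summary_field]
    return dict(out)


# Invented pipes: two 6" pipes and one 8" pipe in tributary area "A"
pipes = [
    {"diameter": 6, "area": "A", "length": 120.0},
    {"diameter": 6, "area": "A", "length": 80.0},
    {"diameter": 8, "area": "A", "length": 50.0},
]
result = frequency_with_summary(pipes, ["diameter", "area"], "length")
print(result[(6, "A")])
```

For the ward and consumer-type question only the count matters, so the summary field can be left out.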

score 2 · Answer 3 · answered Dec 06 '17 at 15:59

We have a similar situation when dealing with students and summarizing the dominant ethnicity by blockgroup.

We use PostGIS, but you may be able to run the following in QGIS DB Manager SQL editor:

First we create a table of the counts of each ethnicity by blockgroup:

CREATE TABLE public.ethcounts AS (
  SELECT
    bg.geoid_bg
    --count students by each race and assign corresponding column name
    , count(CASE WHEN oct.ic_raceethnicity = '01'
    THEN 1 END) AS Race_01
    , count(CASE WHEN oct.ic_raceethnicity = '02'
    THEN 1 END) AS Race_02
    , count(CASE WHEN oct.ic_raceethnicity = '03'
    THEN 1 END) AS Race_03
    , count(CASE WHEN oct.ic_raceethnicity = '04'
    THEN 1 END) AS Race_04
    , count(CASE WHEN oct.ic_raceethnicity = '05'
    THEN 1 END) AS Race_05
    , count(CASE WHEN oct.ic_raceethnicity = '06'
    THEN 1 END) AS Race_06
    , count(CASE WHEN oct.ic_raceethnicity = '07'
    THEN 1 END) AS Race_07
    , count(*)  AS enrollment
  FROM dpsdata."OctoberCount_Archive" AS oct
    JOIN dpsdata."Forecast_Zones" as bg on ST_Intersects(oct.geom, bg.geom)

  WHERE oct.year = '2017'

  GROUP BY bg.geoid_bg
);

Now to pick the dominant ethnicity for each blockgroup, I use some SQL trick I found on another forum (for PostgreSQL) to create a column called MAX that finds the maximum value across several columns of data - the race_01, race_02 columns:

  -- finding the max across all race count columns

         SELECT
           (
             SELECT max(RowVals) AS max
             FROM
               (VALUES
                 (race_01), (race_02), (race_03), (race_04), (race_05), (race_06), (race_07))
                 AS values(RowVals))
           , ethcounts.*
         FROM public.ethcounts

Lastly, I match the MAX value to the original ETHCOUNTS table, and wherever they match, assign the race from the original column as the dominant race for that blockgroup - putting that together (combined with the MAX count above) looks like this:

CREATE TABLE public.domethcounts_blockgroups AS (
  SELECT
    ethcounts.geoid_bg
    , ethcounts.max
    --assign race to max count
    , CASE
      WHEN ethcounts.max = ethcounts.race_01
        THEN 'race_01'
      WHEN ethcounts.max = ethcounts.race_02
        THEN 'race_02'
      WHEN ethcounts.max = ethcounts.race_03
        THEN 'race_03'
      WHEN ethcounts.max = ethcounts.race_04
        THEN 'race_04'
      WHEN ethcounts.max = ethcounts.race_05
        THEN 'race_05'
      WHEN ethcounts.max = ethcounts.race_06
        THEN 'race_06'
      WHEN ethcounts.max = ethcounts.race_07
        THEN 'race_07'
      END
    AS DomRace
    , bg.geom

  -- finding the max across all race count columns
  FROM (
         SELECT
           (
             SELECT max(RowVals) AS max
             FROM
               (VALUES
                 (race_01), (race_02), (race_03), (race_04), (race_05), (race_06), (race_07))
                 AS values(RowVals))
           , ethcounts.*
         FROM public.ethcounts
       ) AS ethcounts

    --join blockgroup geometry back to counts
    JOIN dpsdata."Forecast_Zones" AS bg
      ON ethcounts.geoid_bg = bg.geoid_bg

 )

It does work really well, with the caveat that we again are using PostgreSQL/PostGIS and there is a chance that there is the exact same count across more than 1 column, which would cause an issue (haven't run into it yet).
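
The tie case mentioned above can be made explicit. This is a sketch in Python rather than SQL, assuming the per-blockgroup counts have been fetched into dictionaries; the column names follow the query above, while the sample numbers and `dominant_column` are made up.

```python
def dominant_column(counts):
    """Return (sorted list of columns sharing the maximum count, that count)."""
    top = max(counts.values())
    winners = sorted(col for col, n in counts.items() if n == top)
    return winners, top


# Made-up counts; bg_2 has the same count in two columns
blockgroups = {
    "bg_1": {"race_01": 12, "race_02": 7, "race_03": 3},
    "bg_2": {"race_01": 5, "race_02": 5, "race_03": 1},
}
for geoid, counts in blockgroups.items():
    winners, top = dominant_column(counts)
    status = "tie" if len(winners) > 1 else "ok"
    print(geoid, winners, top, status)
```

Since a SQL CASE returns the first WHEN branch that matches, a tie in the query above would not raise an error but would report whichever tied column is listed first. Flagging ties as in this sketch makes them visible instead.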

I also haven't found a better way to do this, though I know some folks who have used some Python to do something similar with Census data.

Thank you, this sounds like it would work for me but I am not very experienced with using SQL and so on, so I may have to try another method before I attempt this! Thanks again, Dan. — ohh_danielson, Dec 06 '17 at 16:31

score 0 · Answer 4 · answered Dec 06 '17 at 14:53

Create some new integer fields on the point layer, so that each point carries a numeric value. For example, one field could hold the number of cars (one point has zero, another has 5, and so on) and another the number of kids, so each consumer has zero to many kids.

Once you have integer attributes (digits only), you can use the Spatial Join tool to sum the values of all points within each polygon, giving the total number of cars, kids, etc. The Spatial Join tool is in the ArcGIS toolbox under Analysis Tools -> Overlay -> Spatial Join. Click Help if needed, but the tool is quite easy and handy; I use it every day. Let me know if I understood the task correctly and whether it works.

Thanks for the reply but I am not sure this is doing what I require. I have about 80,000+ household points, each with an attribute ranging from A to O, detailing the type of consumer, so A = Country living for example. What I am after is identifying which consumer type is most present in a ward. So a ward may have 20 points which have the attribute 'A', 10 points which are B, 14 points which are C etc., so I would want that polygon to be shown as A (Country Living). – ohh_danielson Dec 06 '17 at 16:28
I'm not sure I understood correctly. So you cannot sum everything, but you can get the mean value. In that case you'd get an average value per polygon, a kind of generalisation. Values lower or higher than the mean are not so important. – Nežinomas Asmuo Dec 07 '17 at 05:54
I will try this today and let you know how I get on :) – ohh_danielson Dec 07 '17 at 10:20

score 0 · Answer 5 · edited Apr 06 '23 at 12:38

I believe you could solve this using the Spatial Join tool and some reclassification.

Reclassify the types into an integer field, with each different type receiving a unique integer.
Run a one-to-one spatial join, setting the merge rule to 'Mode' for your new integer field. This will return the most common type among the points joined to each polygon.
Lastly, reclassify the integer back into the text field noting the original type.
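
The join-then-mode idea can also be sketched without any GIS software. The following is not how the Spatial Join tool works internally. It is a plain-Python illustration using a ray-casting point-in-polygon test, and it assumes simple polygons without holes in planar coordinates; `point_in_polygon`, `join_mode` and the sample geometry are all hypothetical.

```python
from collections import Counter


def point_in_polygon(x, y, ring):
    """Ray-casting test; ring is a list of (x, y) vertices of a simple polygon."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a horizontal ray going right from (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def join_mode(wards, points):
    """wards: {ward_id: ring}; points: [(x, y, type_code)]. Returns {ward_id: mode code}."""
    result = {}
    for ward_id, ring in wards.items():
        codes = [code for x, y, code in points if point_in_polygon(x, y, ring)]
        if codes:  # wards without points get no entry
            result[ward_id] = Counter(codes).most_common(1)[0][0]
    return result


# Two made-up square wards side by side, three points in each
wards = {"x": [(0, 0), (4, 0), (4, 4), (0, 4)], "y": [(4, 0), (8, 0), (8, 4), (4, 4)]}
points = [(1, 1, 1), (2, 3, 1), (3, 2, 2), (5, 1, 3), (6, 2, 3), (7, 3, 2)]
print(join_mode(wards, points))
```

This loops over every point for every ward, which is fine for a demonstration but slow for tens of thousands of points. A real GIS tool uses spatial indexing for the same job.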
