We have a similar situation when dealing with students and summarizing the dominant ethnicity by blockgroup.
We use PostGIS, but you may be able to run the following in QGIS DB Manager SQL editor:
First we create a table of the counts of each ethnicity by blockgroup:
CREATE TABLE public.ethcounts AS (
SELECT
bg.geoid_bg
--count students by each race and assign corresponding column name
, count(CASE WHEN oct.ic_raceethnicity = '01'
THEN 1 END) AS Race_01
, count(CASE WHEN oct.ic_raceethnicity = '02'
THEN 1 END) AS Race_02
, count(CASE WHEN oct.ic_raceethnicity = '03'
THEN 1 END) AS Race_03
, count(CASE WHEN oct.ic_raceethnicity = '04'
THEN 1 END) AS Race_04
, count(CASE WHEN oct.ic_raceethnicity = '05'
THEN 1 END) AS Race_05
, count(CASE WHEN oct.ic_raceethnicity = '06'
THEN 1 END) AS Race_06
, count(CASE WHEN oct.ic_raceethnicity = '07'
THEN 1 END) AS Race_07
, count(*) AS enrollment
FROM dpsdata."OctoberCount_Archive" AS oct
JOIN dpsdata."Forecast_Zones" as bg on ST_Intersects(oct.geom, bg.geom)
WHERE oct.year = '2017'
GROUP BY bg.geoid_bg
Now to pick the dominant ethnicity for each blockgroup, I use some SQL trick I found on another forum (for PostgreSQL) to create a column called MAX that finds the maximum value across several columns of data - the race_01, race_02 columns:
-- finding the max across all race count columns
SELECT
(
SELECT max(RowVals) AS max
FROM
(VALUES
(race_01), (race_02), (race_03), (race_04), (race_05), (race_06), (race_07))
AS values(RowVals))
, ethcounts.*
FROM public.ethcounts
Lastly, I match the MAX value to the original ETHCOUNTS table, and wherever they match, assign the race from the original column as the dominant race for that blockgroup - putting that together (combined with the MAX count above) looks like this:
CREATE TABLE public.domethcounts_blockgroups AS (
SELECT
ethcounts.geoid_bg
, ethcounts.max
--assign race to max count
, CASE
WHEN ethcounts.max = ethcounts.race_01
THEN 'race_01'
WHEN ethcounts.max = ethcounts.race_02
THEN 'race_02'
WHEN ethcounts.max = ethcounts.race_03
THEN 'race_03'
WHEN ethcounts.max = ethcounts.race_04
THEN 'race_04'
WHEN ethcounts.max = ethcounts.race_05
THEN 'race_05'
WHEN ethcounts.max = ethcounts.race_06
THEN 'race_06'
WHEN ethcounts.max = ethcounts.race_07
THEN 'race_07'
END
AS DomRace
, bg.geom
-- finding the max across all race count columns
FROM (
SELECT
(
SELECT max(RowVals) AS max
FROM
(VALUES
(race_01), (race_02), (race_03), (race_04), (race_05), (race_06), (race_07))
AS values(RowVals))
, ethcounts.*
FROM public.ethcounts
) AS ethcounts
--join blockgroup geometry back to counts
JOIN dpsdata."Forecast_Zones" AS bg
ON ethcounts.geoid_bg = bg.geoid_bg
)
It does work really well, with the caveat that we again are using PostgreSQL/PostGIS and there is a chance that there is the exact same count across more than 1 column, which would cause an issue (haven't run into it yet).
I also haven't found a better way to do this, though I know some folks who have used some Python to do something similar with Census data.
This algorithm cannot be run :-( It seems that GRASS GIS 7 is not correctly installed and configured in your system. Please install it before running GRASS GIS 7 algorithms.
So I downloaded grass 7.x and it runs ok, so I then tried the tool again but the error persists?
– ohh_danielson Jan 09 '18 at 11:08