
I have two shapefiles - test7 and MD_Overlay

test7 has about 600,000 records with over one hundred attributes (about 2 GB), so performing spatial joins in QGIS is very inefficient. I would like to find the PostGIS SQL equivalent of QGIS's Join attributes by location. My aim is to add the columns of MD_Overlay to test7 and to keep both matching and non-matching records.

This would be a polygon-in-polygon operation: a polygon in test7 would have to be fully within a polygon in MD_Overlay in order to be joined.

I have done much research but cannot find a solution. Any advice?


nmtoken
iskandarblue

1 Answer


There are a few ways to do this, here's one:

SELECT t.*, m.*
FROM test7 AS t
LEFT JOIN MD_Overlay AS m
ON ST_Within(t.geom, m.geom)

LEFT JOIN returns all rows from the left table (test7) plus the matched rows from the right table (MD_Overlay). The m.* columns will be NULL for any test7 polygon that is not fully within an MD_Overlay polygon. You should ensure that both geometry columns have spatial indexes.
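As a minimal sketch of the indexing advice, assuming both tables were loaded with a geometry column named geom (as the query above implies) and that no spatial index exists yet; the index names are made up for illustration:

```sql
-- Hypothetical index names; the column name geom is taken from the query above.
CREATE INDEX test7_geom_idx ON test7 USING GIST (geom);
CREATE INDEX md_overlay_geom_idx ON MD_Overlay USING GIST (geom);

-- Refresh planner statistics so the new indexes are considered.
ANALYZE test7;
ANALYZE MD_Overlay;
```

If the shapefiles were loaded with a tool that already builds spatial indexes, this step may be unnecessary.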

alphabetasoup
  • Make sure you have GIST indexes on t.geom – Evan Carroll Nov 30 '16 at 08:59
  • No join occurred. I get ERROR: GEOSContains: TopologyException: side location conflict at 2430764.1748952293 7027762.7618717942. Is there any way to ignore errors and continue? – iskandarblue Nov 30 '16 at 09:53
  • Try adding a condition: WHERE ST_IsValid(t.geom) AND ST_IsValid(m.geom). Note that this will exclude records with invalid geometry from the result entirely. http://postgis.net/docs/ST_IsValid.html – alphabetasoup Nov 30 '16 at 10:13
  • Is it possible that your input geometries are not all valid? Have a look at ST_IsValid (to check) and ST_MakeValid (to fix), and try again. There is no way to avoid the GEOSContains error otherwise. OK, Richard beat me to it. I still think it might be worth running ST_MakeValid rather than excluding invalid geometries. – John Powell Nov 30 '16 at 10:13
  • I do, too, but OP asked about just ignoring them. With a name like test7 I'm just assuming the result isn't too important... otherwise it's important to invest a little bit of time to do things correctly. – alphabetasoup Nov 30 '16 at 10:31
  • When I run SELECT t.*, m.* FROM test7 AS t LEFT JOIN MD_Overlay AS m ON ST_Within(t.geom, m.geom) WHERE ST_IsValid(t.geom) AND ST_IsValid(m.geom), pgAdmin freezes and my computer overheats. However, the exact same Join attributes by location is successful in QGIS. In QGIS, a new shapefile is created in the process. Perhaps I should be creating a new table in PostGIS? – iskandarblue Nov 30 '16 at 17:30
  • Yes, you should create a new table, given how many records you have and their width. For testing, though, just put a LIMIT 100 at the end of the query. – alphabetasoup Nov 30 '16 at 18:35
  • SELECT t.*, m.* FROM test7 AS t LEFT JOIN MD_Overlay AS m ON ST_Within(t.geom, m.geom) WHERE ST_IsValid(t.geom) AND ST_IsValid(m.geom) LIMIT 100: Thank you. This operation worked and 100 records were returned. However, I notice that there are now two geometry columns in the resulting table. How would one create a separate table with the results for all records, and when I am ready to export back out to a shapefile, which geom should be used? – iskandarblue Nov 30 '16 at 21:04
  • When I try CREATE TABLE first_join AS SELECT t.*, m.* FROM test7 AS t LEFT JOIN MD_Overlay AS m ON ST_Within(t.geom, m.geom) WHERE ST_IsValid(t.geom) AND ST_IsValid(m.geom), I receive the following error: ERROR: column "gid" specified more than once. In fact, 3 columns have the same name in both tables. Is there a good workaround? – iskandarblue Nov 30 '16 at 21:52
  • You need to specify the columns you want to keep (or rename). At this point we're getting far from the original question, which has been addressed. But generally you can just be explicit about the columns you want to retain in the output with the syntax SELECT x.col1, x.col2, y.col1 etc. Using .* means every column. – alphabetasoup Nov 30 '16 at 21:59
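
Pulling the comment thread together, here is a rough sketch of how the final table might be built. It assumes both tables have gid and geom columns (as the error message and queries above suggest) and that the geometry columns are typed as MultiPolygon; the name column on MD_Overlay and all output aliases are hypothetical. One caveat worth checking: ST_IsValid(m.geom) in a WHERE clause evaluates to NULL for unmatched rows, so that filter appears to drop exactly the non-matching test7 records the LEFT JOIN was meant to keep. Repairing the geometries first avoids needing the filter at all.

```sql
-- Repair invalid geometries in place rather than excluding them.
-- ST_CollectionExtract(..., 3) keeps only polygonal parts and ST_Multi
-- casts back to MultiPolygon (assumption: the columns are MultiPolygon).
UPDATE test7
SET geom = ST_Multi(ST_CollectionExtract(ST_MakeValid(geom), 3))
WHERE NOT ST_IsValid(geom);

UPDATE MD_Overlay
SET geom = ST_Multi(ST_CollectionExtract(ST_MakeValid(geom), 3))
WHERE NOT ST_IsValid(geom);

-- Build the joined table, naming MD_Overlay columns explicitly so that
-- duplicates such as gid do not collide with the test7 columns.
CREATE TABLE first_join AS
SELECT t.*,                      -- every test7 column, including t.geom
       m.gid  AS overlay_gid,    -- renamed to avoid 'column "gid" specified more than once'
       m.name AS overlay_name    -- hypothetical MD_Overlay attribute
FROM test7 AS t
LEFT JOIN MD_Overlay AS m
  ON ST_Within(t.geom, m.geom);
```

Because m.geom is left out of the SELECT list, first_join carries a single geometry column (the test7 polygons), which is what a shapefile export needs. It is probably wise to run the UPDATE statements on a copy of the data first, since ST_MakeValid can change the shape of badly broken polygons.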