9

I am using QGIS to create a layer comprised of counties in 7 northeast US states.

I start with the US census 5M cartographic state borders file, and then set a filter on that layer:

"NAME" in (
  'New York',
  'Massachusetts',
  'Rhode Island',
  'Connecticut',
  'New Hampshire',
  'Vermont',
  'Maine'
)

That gives me a layer with seven features. I wanted to take the US census 5M cartographic county borders file and restrict that to the same geographic area. I thought I could do it by setting up a virtual layer and setting up a spatial join, like this:

select a.*
  from cb_2018_us_county_5m as a 
  join cb_2018_us_state_5m as b
  on (st_contains(b.geometry, a.geometry))

That works, but it takes 5 or 6 seconds to render the county outlines; I can watch as they show up one by one. Given the relatively small number of features involved in this operation, I was surprised by the performance. Am I doing something suboptimal? The same query in, say, PostGIS is a sub-second operation.

Vince
  • 20,017
  • 15
  • 45
  • 64
larsks
  • 1,256
  • 1
  • 11
  • 24
  • 1
    I would use the filter to narrow down the features, then use the virtual layer to build your data, then i would export those to a new layer right away. Every time you pan/zoom it is going to run that query, essentially acting as a database view – DPSSpatial_BoycottingGISSE Dec 08 '20 at 23:03
  • I thought perhaps I could just export the virtual layer as a new shapefile layer, but then I ran into this. – larsks Dec 08 '20 at 23:23
  • 3
    why not use the PostGIS layer instead of the shapefiles? – Ian Turton Dec 09 '20 at 08:36
  • 1
    @larsks - 1) When I followed your steps from your main description I had the same issue - slow rendering (and not only rendering - also opening Attribute Table was slow). This is definitely strange, but could you please specify a bit more how can we help you? There are many workarounds to your problem - for example using STATEFP column for counties which would filter only counties from states you need in non-spatial way. – Szym Dec 09 '20 at 09:06
  • 1
    @larsks - 2) I saw your bug description. I've tried to replicate in QGIS 3.14 - it seemingly worked. However, when I investigated for the Value "198956658395" in ALAND column in my data in "cb_2018_us_state_5m" - it turned out that it altered the values - some of the values are negative - which points to the integer range problem as well (but actually behaviour of QGIS 3.16 is better - at least it gives you obvious error message) – Szym Dec 09 '20 at 09:08
  • @larsks - looks like Virtual Layers' not very impressive speed is known issue: see Taras' answer here – Szym Dec 09 '20 at 10:12
  • @IanTurton in this case because I am putting together a project that I want to be able to share with someone else without having to include instructions for setting up postgis and importing data. Otherwise I agree that just importing everything into postgis and doing the processing there is a much better option. – larsks Dec 09 '20 at 14:42
  • https://docs.qgis.org/3.22/en/docs/user_manual/managing_data_source/create_layers.html#use-of-indexes – bugmenot123 Sep 13 '22 at 08:40

1 Answers1

6

For attributes join, I found that you can improve speed by first ordering the layers on the join attribute. Not sure if it would work on spatial joins too.

As an example, joining two delimited-text datasources (using the ogr driver) of 500 and 1000 entities respectively was taking roughly 10 seconds :

 SELECT
     *
 FROM 
     a
 JOIN
     b
 ON 
     a.foo = b.foo

Using ordered subqueries, the same join takes a few milleseconds :

 SELECT
     *
 FROM 
     (SELECT * FROM a ORDER BY foo) a
 JOIN
     (SELECT * FROM b ORDER BY foo) b
 ON 
     a.foo = b.foo

Since virtual layers are backed by Sqlite, I think that using the order by clause in the subqueries somehow gives hints to the Sqlite engine which can optimize the query plan and avoid scanning as many times as the number of lines in the datasources to perform the join (I'm not a Sqlite expert though).

tumasgiu
  • 361
  • 3
  • 8