2

I've got a vector layer Target. I want to periodically add features that arrive (with the same attributes) in new.csv, which was created in the field.

I'd like to append them to Target if and only if they're not already there. I don't want to duplicate existing features, and I want to keep any tweaks done in Target, i.e. I don't want to replace existing features from new.csv either. (The features have a unique and consistent ID in a Name field.)

What's the simplest way to do this, in a way that can easily be (semi)automated to run weekly? I can think of the following, but seeking advice before I reinvent the wheel:

  1. Use the merge processing algorithm and then postprocess to delete duplicates, making new.csv have a flag in a each feature's comment field to know which duplicate to nuke.

  2. Dust off my rusty SQL and run some sort of APPEND query in DB Manager, with an EXCEPT clause to skip what's already in Target?

  3. Write a PyQGIS script to traverse the features in new.csv, throw out those where Name is already in Target, and then use layer.dataProvider().addFeatures(featureList) to add what's left to Target.

Suggestions? I don't think it matters, but the feature geometry are points in EPSG:4326. Target is one of several layers in a gpkg, and new.csv is of course a separate comma separated file, which can be imported as a layer with point geometry.

Houska
  • 7,976
  • 19
  • 48
  • 2
    right click 'save as' target? QGIS 2.81+ will append/update existing gpkg https://gis.stackexchange.com/questions/224008/add-layer-to-existing-geopackage-not-working-in-qgis – Mapperz Oct 23 '19 at 01:08
  • 1
    @Mapperz, thanks, but I'm looking to add the (missing) features to the existing layer, which happens to be in a gpkg, not to append a new layer to the gpkg database. Will edit for extra clarity. – Houska Oct 23 '19 at 02:08
  • 1
    Another method would be to add an action to the layer (layer properties > actions), so you can just click a button in the attribute table and the layer updates. You would have to figure out how to code that though, probably in python. https://docs.qgis.org/3.4/en/docs/pyqgis_developer_cookbook/vector.html#add-features – csk Oct 23 '19 at 19:27

2 Answers2

3

Circling back with what I've done, a pastiche of the other answers/comments here plus some of my own playing around.

I have a Master layer (in a gpkg) and a Raw.csv that is updated on return from fieldwork. (The shell command to update the .csv file is a layer action.)

I then use the processing algorithm Execute SQL with the query SELECT * FROM input2 WHERE Photo NOT IN (SELECT input1.Photo FROM input1 INNER JOIN input2 WHERE input1.Photo=input2.Photo) (Master is 1st input, Raw 2nd; note Photo is what I called Name in the question).

This is the first step in a graphical model that does some other corrections (e.g. a camera with a date function that is off) and finally outputs a temporary memory layer of the cleaned-up new features and attributes.

I then manually select-all, copy, paste to Master.

I've discovered the AppendFeaturesToLayer plug-in, which would let me bypass the manual copy-paste, but for the moment I appreciate the forced safety check of looking at the new feature attributes before they get committed to the master.

Houska
  • 7,976
  • 19
  • 48
  • 1
    I just have to chime in that the Append plugin should be added as a native processing tool. As long as we can create an empty layer with the correct table schema, writing iterative process results from the graphical modeler to a single layer with that plugin is a huge time-saver. – Gabriel Mar 08 '21 at 21:08
1

Here's one method. I think this is one of the easiest methods to set up, because you don't have to brush off / re-learn any coding skills. But it's not the most efficient to use, because you do have to do select the inputs every time.

You can set up a simple graphical processing model to do this.

  • Model inputs: 2 vector layers: original layer and layer with new features
  • Algorithm: Extract by attribute
    • input: layer with new features
    • parameters: field new (type the field name without quotation marks) equals 1
    • output: leave blank
  • Algorithm: Merge vector layers
    • inputs: original layer and 'Extracted (attribute)' from algorithm 'Extract by attributes'
    • output: updated layer

enter image description here

enter image description here

Downsides of this method:

  • You do have to select the two layers when you run the model.
  • You may have to import the CSV into the project before you can run this model. When running the model, if you click the ... button next to the dropdown to select the layer, you do have the option to select a layer that isn't loaded into the project, but that doesn't seem to work with CSV files.

Notes:

I assumed your CSV layer has a field called "new" and new features have "new" = 1. You can use a different operator and a calculated expression in the merge vector layer algorithm. For example, assume you have a "date created" field. Using the operator "greater than" and an expression like now() - to_interval( '1 week') will append all the features that were created in the past week.

You could even make one of the inputs for the model a text string where you enter a length of time (eg '1 week', '5 days', '3 months and 2 days') and use that in the to_interval() function in that example.

csk
  • 24,827
  • 3
  • 32
  • 70
  • Thanks @csk. I'll investigate processing models, an area of QGIS I haven't yet explored. I will need to go further on the selection of which features are "new" - basically my current kluge is something like your interval < 1 week suggestion but my reason for revisiting is that I collect data points some of which are actually older than one week. So "new" needs to be defined as "not in Target". But this is a useful other framework for me to explore! – Houska Oct 23 '19 at 22:32
  • This sent me down the right path. To get the new features dynamically, I used the algorithm Execute SQL with the query SELECT * FROM input2 WHERE Photo NOT IN (SELECT input1.Photo FROM input1 INNER JOIN input2 WHERE input1.Photo=input2.Photo) (Master is 1st input, the raw .csv layer including new 2nd) instead of Extract by attribute. And instead of merging (which generates a new layer) I do a copy-paste of the output back into Master, which I'll automate as a Python layer action later. – Houska Oct 29 '19 at 12:40
  • I'm glad my answer helped. Please post your method as another answer. That will be helpful to others with the same question. – csk Oct 29 '19 at 16:18