Extract specific information from a shapefile

Question

I work with water and waste water systems. I have a shapefile that involves all the manholes and pipe ID's. My task is to extract 72 pipes that have unique ID's.

In a shapefile that involves 4000+ pipes, how can I find these 72 pipes that I need, select and extract a shapefile?

You need to give more information. Which software do you use? What have you tried so far? — Nil, Feb 20 '24 at 09:06
Hei. Thank you. I am using QGIS. I haven't tried much as I am quite new in the field and don't know if that is possible. — Kaltrina Loshi, Feb 20 '24 at 09:20
Do you know the unique ID's or you need to find out that as well? — Nil, Feb 20 '24 at 09:49
You can try the 9. Step in this tutorial https://www.qgistutorials.com/en/docs/working_with_attributes.html If it is not working, you can post the expression which you used. Then it would be easier to help further — Nil, Feb 20 '24 at 10:21

score 3 · Answer 1 · answered Feb 20 '24 at 19:23

Your question is slightly vague, but I assume you mean that you have a list of pipe IDs and want to extract those pipes from a large dataset of pipes with IDs. If so:

Joining method

Put the list of your IDs into a CSV file
Go to the toolbar and click Layer > Add Layer > Add Delimited Text Layer
In the menu that appears, select your file and specify in the Geometry Definition section that it has No geometry. Then press Add.
Double-click on your vector layer with the pipes and scroll on the left-hand tabs until you find Joins.
Click the + icon at the bottom and as the Join layer select the CSV file you have just added. As the join field, select the attribute on the CSV dataset that contains the pipe ID, and for the target field, select the attribute that contains the pipe ID on the vector dataset.
Your pipes layer should now contain an attribute on all of the pipes from the original CSV (in this case it would just be a newly populated ID field, but you could add other fields in the CSV that would be joined).
In the toolbar, choose the Select Features by Expression tool.
Take the attribute name from the attribute table (it will be something along the lines of <CSVFILENAME>_<ATTRIBUTENAME>, and it will autocomplete). In the expression dialogue that has appeared, type a double quote (") and begin typing the name of the attribute. It should autocomplete as relevant.
For this, the expression of "<ATTRIBUTENAME>" IS NOT NULL should select all of the vector objects that have an attribute that you have joined.
Then right click on the pipes layer, ensuring your objects are still selected, and go to Export > Save featues as. Choose a save location and ensure that Save only selected features is ticked.

Tom Brennan · Answer 2 · 2024-02-21T04:47:42.077

1

I assume that when you say you have "72 pipes that have unique ID's", the remaining 3000+ pipes share an ID with one or more of the other pipes?

I created a dummy layer with 15 features, 3 (highlighted) of which have unique IDs, and the remaining ones with shared IDs:

You can then run the following expression in Select by Expression:

with_variable('parent_id',"ID",
    array_length(string_to_array(
        aggregate(@layer,
                  aggregate:='concatenate',
                  expression:=to_string(@id),
                  filter:="ID"=@parent_id,
                  concatenator:=','),
    delimiter:=',')
) = 1)

Basically it compares each feature to all the other features to find out how many features have the same ID.

This will select only the unique features (ID 2, 3 and 6). If you wanted the non-unique features, you can change the '= 1' to '> 1' in the expression.

Edit: a quicker, cleaner expression that achieves the same outcome, as suggested by @she_weeds, is to use:

count(1,"ID") = 1

Finally, right click on the layer, and choose Export->Save Selected Features As... and choose a new filename to export to.

edited Feb 21 '24 at 04:47

answered Feb 20 '24 at 21:50

Tom Brennan

4,787
6
26

1

You could simplify that expression using 'array_agg' as the aggregate() parameter to directly generate an array - or, shameless self-plug, do the inverse of this method for identifying duplicates and use count(1,"ID") = 1 instead – she_weeds Feb 21 '24 at 02:31
I like both of those suggestions - the first because it's an improvement on the generic iteration approach, which is useful for a wide range of problems. But the second is a much cleaner solution to this particular question. I'm happy to update my answer but would be more then happy to vote up yours for the second one! – Tom Brennan Feb 21 '24 at 03:00
Feel free to update your answer either way - OP has given so few details it's a bit hard to tell exactly what they want to accomplish anyway... – she_weeds Feb 21 '24 at 03:34

Extract specific information from a shapefile

2 Answers2

Joining method