
I am iterating through a list of shapefiles and extracting a subset of each one to save as a KML file using ogr2ogr. How can I extract the subset based on multiple conditions while using string formatting to define the input/output files and the conditions? My attempt follows the answer to "Selecting features by attributes using ogr2ogr?".

Using Python 3.7

This example with only one condition works without issue:

import os
import numpy as np
import pathlib

# example dummy ids
id1 = np.linspace(start=1, stop=5, num=5, endpoint=True, dtype=int)
id2 = np.linspace(start=301, stop=302, num=2, endpoint=True, dtype=int)

outpath = pathlib.Path("working").resolve()
for ii in id1:
    command = (
        ' ogr2ogr -f "KML" "{outfile}" "{infile}" '
        ' -where "ID_1 = {my_id}" '
        ' -dsco NameField = "{name}" '
    )
    fout = 'test' + str(ii) + '.kml'
    fout = outpath / fout
    os.system(command.format(outfile=fout, infile='points.shp', my_id=ii, name='county'))

This example with multiple conditions using a SQL query returns an error:

for ii in id1:
    for jj in id2:
            command = (
                ' ogr2ogr -f "KML" "{outfile}" "{infile}" '
                ' -sql "SELECT * from {infile2} WHERE ID_1 = {my_id} AND ID_2 = {other_id}" '
                ' -dsco NameField = "{name}" '
            )
            print(command)
            fout = 'test' + str(ii) + '_' + str(jj) + '.kml'
            fout = outpath / fout
            os.system(command.format(
                outfile=fout,
                infile='points.shp',
                infile2=pathlib.Path('points.shp').stem,
                my_id=ii,
                other_id=jj,
                name='county')
            )

The print statement returns:

ogr2ogr -f "KML" "{outfile}" "{infile}"  -sql "SELECT * from {infile2} WHERE ID_1 = {my_id} AND ID_2 = {other_id}"  -dsco NameField = "{name}" 
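The printed output above still contains the `{...}` placeholders because `print(command)` runs before `.format()` is applied, so it shows the template rather than the command that is actually executed. A minimal debugging sketch, assuming the same template and values as in the question (the names `template` and `formatted` are introduced here for illustration only), formats first and prints second:

```python
import pathlib

# Same template as in the question, without the leading spaces.
template = (
    'ogr2ogr -f "KML" "{outfile}" "{infile}" '
    '-sql "SELECT * from {infile2} WHERE ID_1 = {my_id} AND ID_2 = {other_id}" '
    '-dsco NameField = "{name}"'
)

# Fill in the placeholders BEFORE printing, so the output is the real command.
formatted = template.format(
    outfile=pathlib.Path("working") / "test1_301.kml",
    infile="points.shp",
    infile2=pathlib.Path("points.shp").stem,
    my_id=1,
    other_id=301,
    name="county",
)
print(formatted)
```

Printing the formatted string makes it easier to compare the command against what ogr2ogr reports in its error message.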

And the error:

Warning 1: layer names ignored in combination with -sql.
ERROR 1: SQL Expression Parsing Error: syntax error, unexpected $undefined, expecting string or identifier. Occurred around :
SELECT * from points WHERE ID_1
              ^
a11
  • I suggest you try gdal.VectorTranslate (pretty much identical to ogr2ogr) rather than an external process. If you really want to use an external process, don't use os.system; use subprocess.*, e.g. subprocess.run(), which handles all the command-line argument formatting for you: you just pass in a list of arguments. – user2856 Jul 20 '21 at 00:39
  • @user2856 I also tried subprocess.run(command.format(outfile=fout, infile='points.shp', my_id=ii, name='county'), shell=True) for my first example (that worked fine with os) and received Warning 6: creation option 'NameField' is not formatted with the key=value format ERROR 1: Couldn't fetch requested layer '='! and it just creates an empty KML – a11 Jul 20 '21 at 01:16
  • @user2856 I tried gdal.VectorTranslate but ran into some errors; I thought it was worthwhile as its own question – a11 Jul 20 '21 at 01:49
  • Deleted my answer and copied it to your other question :) With subprocess.run, you pass a list of arguments, not a command string. – user2856 Jul 20 '21 at 01:59
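The warning quoted in the comments ("creation option 'NameField' is not formatted with the key=value format" followed by "Couldn't fetch requested layer '='") is consistent with the spaces around `=` splitting the option into three separate arguments. The sketch below uses `shlex.split`, which follows POSIX shell rules, to illustrate the tokenization; the Windows shell parses differently in detail, so treat this as an approximation rather than the exact behaviour on the asker's machine. The file names are placeholders.

```python
import shlex

# Option written with spaces around "=", as in the question's template.
spaced = 'ogr2ogr -f "KML" "out.kml" "points.shp" -dsco NameField = "county"'

# Option written as a single key=value token.
joined = 'ogr2ogr -f "KML" "out.kml" "points.shp" -dsco NameField=county'

spaced_args = shlex.split(spaced)
joined_args = shlex.split(joined)

# With spaces, "NameField", "=" and "county" arrive as three arguments,
# so "=" and "county" end up being read as extra positional arguments.
print(spaced_args[-3:])

# Without spaces, -dsco receives one well-formed key=value argument.
print(joined_args[-2:])
```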

1 Answer


Here's how to use subprocess.run:

import subprocess
import numpy as np
import pathlib

# example dummy ids
id1 = np.linspace(start=1, stop=1, num=1, endpoint=True, dtype=int)
id2 = np.linspace(start=1, stop=1, num=1, endpoint=True, dtype=int)

infile = '/tmp/points.shp'
inlayer = pathlib.Path(infile).stem
outpath = pathlib.Path("/tmp").resolve()
name_field = 'county'

for ii in id1:
    for jj in id2:
        # Convert the output path to a string before passing it to subprocess
        fout = str(outpath / f'test{ii}_{jj}.kml')

        result = subprocess.run(
            [
                "ogr2ogr",
                "-f", "KML",
                "-where", f'ID_1 = {ii} AND ID_2 = {jj}',
                '-dsco', f'NameField={name_field}',
                fout, infile, inlayer
            ],
            capture_output=True,
            check=True)

        print(result.stdout, result.stderr)

Note:

  • I've replaced your -sql argument with a simpler -where clause
  • I used f'string {variable}', aka "f-strings", for cleaner syntax
  • You pass subprocess.run a list of arguments, not a full command string, and let it generate the correct (properly quoted, etc.) command
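The notes above can be sketched as a small function that only builds the argument list, which makes it easy to inspect before anything is executed. `build_ogr2ogr_args` is a hypothetical helper name introduced here, not part of GDAL or of the answer; the flags are the ones already used in the answer, and every value is converted with `str()` so that `pathlib.Path` objects and NumPy integers are both accepted.

```python
import pathlib


def build_ogr2ogr_args(outfile, infile, layer, id1, id2, name_field):
    """Hypothetical helper: return the ogr2ogr argument list as plain strings."""
    return [
        "ogr2ogr",
        "-f", "KML",
        # One list element per argument: no shell quoting is needed.
        "-where", f"ID_1 = {id1} AND ID_2 = {id2}",
        # key=value with no spaces around "=".
        "-dsco", f"NameField={name_field}",
        str(outfile), str(infile), str(layer),
    ]


infile = pathlib.Path("points.shp")
args = build_ogr2ogr_args(
    pathlib.Path("working") / "test1_301.kml", infile, infile.stem, 1, 301, "county"
)
print(args)
```

The resulting list could then be handed to `subprocess.run(args, capture_output=True, check=True)` as in the answer.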
user2856
  • Thanks, this is a really nice and clean explanation, and it looks like it should work, but I receive TypeError: argument of type 'WindowsPath' is not iterable. The error pops up on check=True; then if I remove that line, the error is on capture_output=True; and so on, removing various parts of the call. It looks like this might be a bug. Were you able to run it successfully with Path? – a11 Jul 20 '21 at 17:53
  • That code works for me as-is, and also if I use pathlib.Paths, but I'm using Linux. As you're using Windows, make sure you convert pathlib Paths to strings when passing them to subprocess. – user2856 Jul 20 '21 at 20:01
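The thread does not say which argument was still a `WindowsPath` when the TypeError occurred, so the following is only a sketch of the advice in the last comment: convert every path-like argument to a string before calling subprocess. `as_str_args` is a hypothetical helper name, and `PureWindowsPath` is used so that the example behaves the same on any operating system; the paths are placeholders.

```python
import os
from pathlib import PureWindowsPath


def as_str_args(args):
    """Hypothetical helper: convert path-like objects and other values to str."""
    return [os.fspath(a) if isinstance(a, os.PathLike) else str(a) for a in args]


# A mixed argument list, as might result from building paths with pathlib.
mixed = [
    "ogr2ogr",
    "-f", "KML",
    PureWindowsPath("C:/tmp/test1_1.kml"),
    PureWindowsPath("C:/tmp/points.shp"),
    "points",
]

clean = as_str_args(mixed)
print(clean)
```

Passing `clean` instead of `mixed` to `subprocess.run` avoids relying on how a particular Python version and platform handle path objects in the argument list.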