3

I've been working on a data that is in .tif raster format. I need to convert that into a points shapefile (.shp) to proceed. I went through GDAL in Python but there's only gdal.Polygonize(). So how do I convert it into a points .shp? I've used this code so far:

from osgeo import gdal,ogr
import sys
import os
gdal.UseExceptions()
os.chdir(path)

src_ds = gdal.Open("IND_ppp_2015_v2.tif")

if src_ds is None:
print ("Unable to open Worldpop data")
sys.exit(1)

try:
    srcband = src_ds.GetRasterBand(1)
drv = ogr.GetDriverByName("ESRI Shapefile")
dst_ds = drv.CreateDataSource(dst_layername + ".shp")
dst_layer = dst_ds.CreateLayer(dst_layername,srs = None)

gdal.Polygonize(srcband, None,dst_layer, -1,[],callback = None)

Can you suggest a way to convert into points .shp?

Vince
  • 20,017
  • 15
  • 45
  • 64
  • If GDAL/Python is not a requirement, the following link may help you. It's also an open source tool solution: https://gis.stackexchange.com/questions/171300/creating-point-shapefile-from-geotiff-using-gdal-ogr – Saleika Jan 18 '18 at 07:46
  • If you want to use GDAL you could use the gdal2ogr tool: https://trac.osgeo.org/gdal/browser/trunk/gdal/apps/gdal2ogr.c – Saleika Jan 18 '18 at 07:50
  • I'm trying to implement that in Python – Srinath Radhakrishnan Jan 18 '18 at 09:13
  • This could be an approach: 1. Create a numpy array from your tif file (see this answer: https://gis.stackexchange.com/a/265507). 2. Create a new shapefile with the coordinate pairs and the value (see https://gis.stackexchange.com/questions/30261/how-to-create-a-new-empty-vector-layer-programmatically). – Stefan Jan 18 '18 at 09:36
  • @Stefan I can't understand the second part tho. How do I convert the values into a shp file? – Srinath Radhakrishnan Jan 19 '18 at 06:40
  • 1
    In your comments you've wrote that your tif-file is about 1.5GB. It would be interesting: 1. Having more information about the raster itself (datatype, cellsize, etc.) , 2. What is your project about? ,3. is there the need to tile the "large" ("large" in the meaning of processing, not storing, visualizing ) raster-file and 4. why do you need a shapefile from it? – Stefan Jan 19 '18 at 07:50
  • 1
    Given the size of your data, it's not a good idea to convert it to a point shapefile... this is the question! I'm agree with @Stefan ... if you explain us what is your final goal, I'm sure that this task is not necessary at all. In such case, I suggest you to post another question. – Antonio Falciano Jan 19 '18 at 11:41

2 Answers2

6

I'd use GDAL and OGR utilities as library functions (if possible):

from osgeo import gdal
import os

filename='IND_ppp_2015_v2'
inDs = gdal.Open('{}.tif'.format(filename))
outDs = gdal.Translate('{}.xyz'.format(filename), inDs, format='XYZ', creationOptions=["ADD_HEADER_LINE=YES"])
outDs = None
try:
    os.remove('{}.csv'.format(filename))
except OSError:
    pass
os.rename('{}.xyz'.format(filename), '{}.csv'.format(filename))
os.system('ogr2ogr -f "ESRI Shapefile" -oo X_POSSIBLE_NAMES=X* -oo Y_POSSIBLE_NAMES=Y* -oo KEEP_GEOM_COLUMNS=NO {0}.shp {0}.csv'.format(filename))

First, I'd convert the GeoTIFF into the XYZ format (very close to CSV). Then I'd rename the .xyz file to .csv and finally convert the CSV file to ESRI Shapefile.

Antonio Falciano
  • 14,333
  • 2
  • 36
  • 66
  • I tried this code. My raster size is 1.5GB and the xyz format goes beyond 15GB in my local disk and I had to terminate the process – Srinath Radhakrishnan Jan 19 '18 at 05:58
  • Ok, but why do you need to convert your data to a point shapefile? Maybe you'd like to vectorize only a tiny part of it. – Antonio Falciano Jan 19 '18 at 11:45
  • no. the file contains population data of India and I need to visualize it as points for further analysis. the method looks easy with arcpy but arcpy comes at a cost of installing arcgis (not free) – Srinath Radhakrishnan Jan 19 '18 at 13:16
5

A very good starting point to implement a raster to point script using Python and GDAL is the "Raster to vector line" recipe: https://pcjericks.github.io/py-gdalogr-cookbook/raster_layers.html#raster-to-vector-line

You just have to modify (simplify) the array2shp method. I guess it's easier than to implement the gdal2org tool in Python.

EDIT:

I modified the "Raster to vector line" recipe to get a raster to point script. I tested the script with a 5000x5000 16bit gray scale Image and it's very slow. I would recommend the script only for small rasters.

import ogr, gdal, osr, os
import numpy as np
import itertools

def pixelOffset2coord(raster, xOffset,yOffset):
    geotransform = raster.GetGeoTransform()
    originX = geotransform[0]
    originY = geotransform[3]
    pixelWidth = geotransform[1]
    pixelHeight = geotransform[5]
    coordX = originX+pixelWidth*xOffset
    coordY = originY+pixelHeight*yOffset
    return coordX, coordY

def raster2array(rasterfn):
    raster = gdal.Open(rasterfn)
    band = raster.GetRasterBand(1)
    array = band.ReadAsArray()
    return array

def array2shp(array,outSHPfn,rasterfn):

    # max distance between points
    raster = gdal.Open(rasterfn)
    geotransform = raster.GetGeoTransform()
    pixelWidth = geotransform[1]

    # wkbPoint
    shpDriver = ogr.GetDriverByName("ESRI Shapefile")
    if os.path.exists(outSHPfn):
        shpDriver.DeleteDataSource(outSHPfn)
    outDataSource = shpDriver.CreateDataSource(outSHPfn)
    outLayer = outDataSource.CreateLayer(outSHPfn, geom_type=ogr.wkbPoint )
    featureDefn = outLayer.GetLayerDefn()
    outLayer.CreateField(ogr.FieldDefn("VALUE", ogr.OFTInteger))

    # array2dict
    point = ogr.Geometry(ogr.wkbPoint)
    row_count = array.shape[0]
    for ridx, row in enumerate(array):
        if ridx % 100 == 0:
            print "{0} of {1} rows processed".format(ridx, row_count)
        for cidx, value in enumerate(row):
            Xcoord, Ycoord = pixelOffset2coord(raster,cidx,ridx)
            point.AddPoint(Xcoord, Ycoord)
            outFeature = ogr.Feature(featureDefn)
            outFeature.SetGeometry(point)
            outFeature.SetField("VALUE", int(value))
            outLayer.CreateFeature(outFeature)
            outFeature.Destroy()
    outDS.Destroy()

def main(rasterfn,outSHPfn):
    array = raster2array(rasterfn)
    array2shp(array,outSHPfn,rasterfn)

if __name__ == "__main__":
    rasterfn = r'D:\test.tif'
    outSHPfn = r'D:\test.shp'
    main(rasterfn,outSHPfn)
Saleika
  • 1,320
  • 12
  • 18