
I have a regularly spaced model grid with 5,000-by-5,000-foot cells (517,300 cells in total) that I need to convert to a numpy array. I am investigating a couple of different methods, using arcpy, by which I can convert the grid to a numpy array using the values in a particular field. They are as follows:

  1. Polygon to raster conversion followed by a raster to numpy array conversion, and
  2. Iteration over the rows of the grid, pulling row/col indices from each row and assigning values to an empty array.

Neither of these methods is particularly fast. Is there a more direct way to use the regular geometry of my features to apply the values to an array more efficiently? I will need to run this process on multiple versions of my model to populate parameter arrays for each of the 9 layers, with multiple parameters per layer, so any processing time I can save will be helpful.

import os
import arcpy
import datetime
import numpy as np

arcpy.CheckOutExtension("Spatial")
arcpy.env.workspace = r'..\gis\geom.gdb'
arcpy.env.overwriteOutput = True

grid = r'..\gis\grid_5000ft_01.gdb\grid'

out_table = r'in_memory\{}'.format('TOP_sas')
arcpy.sa.ZonalStatisticsAsTable(grid, 'OID', 'geodas_fasft', out_table, 'DATA', 'ALL')

grid_lyr = r'in_memory\grid_lyr'
table_vwe = r'in_memory\table_vwe'
arcpy.MakeFeatureLayer_management(grid, grid_lyr)
arcpy.MakeTableView_management(out_table, table_vwe)
arcpy.AddJoin_management(grid_lyr, 'OID', table_vwe, 'OID_', 'KEEP_ALL')

print [field.name for field in arcpy.ListFields(grid_lyr)]

start = datetime.datetime.now()
arcpy.PolygonToRaster_conversion(grid_lyr, '{}.MEAN'.format('TOP_sas'), r'in_memory\ras', '#', '#', 5000.)
a = arcpy.RasterToNumPyArray(r'in_memory\ras')
print datetime.datetime.now() - start  # elapsed time for method 1 (timedelta can't be divided by a float)

b = np.zeros((739, 700), dtype=np.float)
start = datetime.datetime.now()
with arcpy.da.SearchCursor(grid_lyr, ['grid.row', 'grid.col', '{}.MEAN'.format('TOP_sas')]) as cur:
    for row in cur:
        (r, c, val) = row
        b[r-1, c-1] = val  # fill the preallocated array b, not the raster array a
print datetime.datetime.now() - start  # elapsed time for method 2

np.savetxt('{}.ref'.format('TOP_sas_a'), a)  # savetxt is a numpy function, not an array method
np.savetxt('{}.ref'.format('TOP_sas_b'), b)
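One way to make method 2 more direct is to skip the per-row loop entirely and assign all values at once with numpy fancy indexing. A minimal numpy-only sketch, using synthetic row/col/value triples as stand-ins for what the cursor returns (no arcpy required):

```python
import numpy as np

# Synthetic stand-ins for the row, col, and value columns read from the grid table.
nrow, ncol = 3, 4
rows = np.array([1, 1, 2, 3])          # 1-based row indices
cols = np.array([1, 2, 3, 4])          # 1-based column indices
vals = np.array([10., 20., 30., 40.])

b = np.full((nrow, ncol), np.nan)
b[rows - 1, cols - 1] = vals           # single vectorized assignment, no Python loop
```

Cells with no record keep their NaN fill value, which makes missing data easy to spot.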
Jason Bellino
  • Feature to Raster conversion is much faster than Polygon to Raster. If this is about saving the raster as text, you might consider exporting it to ASCII. You could also read one row at a time with a query in da.SearchCursor, or use Table to NumPy... – FelixIP Dec 11 '14 at 01:04
  • Forgot to mention: a non-GDB raster (i.e., GRID) seems faster to create, at least on my machine. – FelixIP Dec 11 '14 at 01:08
  • @FelixIP, I like your idea of using the TableToNumpyArray conversion tool. I can probably find a way to vectorize the problem with numpy instead of using a search cursor to iterate over rows. – Jason Bellino Dec 11 '14 at 13:27
  • 1
    I think you should edit your update out of your question and into an answer that you will also be able to self-accept. Self-answering questions is perfectly acceptable. – PolyGeo Dec 15 '14 at 23:27

1 Answer


Update (5/14/2015)

In a related question, I posted code for a function that computes zonal statistics for a raster given a regular polygon grid and converts that to a numpy array. Here is the relevant bit of code related to the grid-array conversion:

import numpy as np
import arcpy

def grid_to_array(grid, nrow, ncol, row_col_field_names, val_field_name):
    """
    Returns an array of shape (nrow, ncol) from a polygon grid of the same shape.
    Grid must have row/col values specified.
    vars:
         grid: name of grid feature class/layer; str
         nrow: number of rows; int
         ncol: number of columns; int
         row_col_field_names: names of fields containing row/col values for each cell; list or tuple
         val_field_name: name of the field containing the values of interest, wrapped in a list or tuple so it can be concatenated with row_col_field_names
    """

    # Convert regular-grid polygon shapefile to an array.
    a = arcpy.da.TableToNumPyArray(grid, row_col_field_names + val_field_name, skip_nulls=False)

    # Convert to recarray
    a_1 = np.rec.fromrecords(a.tolist(), names=['row', 'col', 'val'])
    a_1.sort(order=['row', 'col'])  # row-major order, matching the grid layout
    b = np.reshape(a_1.val, (nrow, ncol))
    return b
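The sort-and-reshape core of `grid_to_array` can be exercised without arcpy. A small sketch using a synthetic record list in place of the `TableToNumPyArray` output (the values are made up):

```python
import numpy as np

# Synthetic (row, col, val) records, deliberately out of order, standing in
# for the structured array that TableToNumPyArray would return for the grid.
records = [(2, 1, 30.), (1, 2, 20.), (2, 2, 40.), (1, 1, 10.)]

a_1 = np.rec.fromrecords(records, names=['row', 'col', 'val'])
a_1.sort(order=['row', 'col'])           # row-major order, matching the grid
b = np.reshape(a_1.val, (2, 2))          # rebuild the (nrow, ncol) array

print(b.tolist())  # → [[10.0, 20.0], [30.0, 40.0]]
```

Note that the reshape is only valid when every cell has exactly one record, which is why `skip_nulls=False` matters in the function above.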

Original Answer

The grid I was using was stored as a feature class within a geodatabase; printing cursor iterations over the table revealed that it was only reading about 1,000 records at a time, at which point it would pause for quite a while before reading another 1,000 records. After 45 minutes, the table still had not been fully traversed. By exporting the grid to a shapefile, I was able to process the features much more quickly (about 20 seconds). Final code:

# nrow and ncol are assumed to be defined earlier to match the grid dimensions
for surface in arcpy.ListDatasets('*', 'Raster'):

    out_table = r'in_memory\{}'.format(surface)
    arcpy.sa.ZonalStatisticsAsTable(grid, 'FID', surface, out_table, 'DATA', 'MIN_MAX_MEAN')

    grid_lyr = r'in_memory\grid_lyr'
    table_vwe = r'in_memory\table_vwe'
    arcpy.MakeFeatureLayer_management(grid, grid_lyr)
    arcpy.MakeTableView_management(out_table, table_vwe)
    arcpy.AddJoin_management(grid_lyr, 'FID', table_vwe, 'FID', 'KEEP_ALL')

    a = arcpy.da.TableToNumPyArray(grid_lyr, ['grid.row', 'grid.col', '{}.MEAN'.format(surface)], skip_nulls=True)

    # OLD CODE - much less efficient
    #b = np.zeros((nrow, ncol), dtype=np.float)*np.nan
    #for idx, (row, col, val) in np.ndenumerate(a):
    #    b[row-1, col-1] = val

    # NEW CODE - much more efficient (updated 5/13/2015)
    a_1 = np.rec.fromrecords(a.tolist(), names=['row', 'col', 'val'])
    a_1.sort(order=['row', 'col'])
    b = np.reshape(a_1.val, (nrow, ncol))

    np.savetxt(r'ref\{}.dat'.format(surface), b, fmt='%5.0f')
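As a numpy-only sanity check (synthetic data, no arcpy), the new sort-and-reshape code produces the same array as the old per-record loop:

```python
import numpy as np

nrow, ncol = 2, 3
recs = [(1, 3, 3.), (2, 1, 4.), (1, 1, 1.), (2, 3, 6.), (1, 2, 2.), (2, 2, 5.)]

# Old approach: explicit Python loop over records.
b_loop = np.full((nrow, ncol), np.nan)
for row, col, val in recs:
    b_loop[row - 1, col - 1] = val

# New approach: sort by (row, col), then reshape in one step.
a_1 = np.rec.fromrecords(recs, names=['row', 'col', 'val'])
a_1.sort(order=['row', 'col'])
b_reshape = np.reshape(a_1.val, (nrow, ncol))

print(np.array_equal(b_loop, b_reshape))  # → True
```

The speedup comes from replacing ~517,000 Python-level assignments with a single C-level sort and reshape.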
Jason Bellino