I have seen some fairly involved approaches to parallel processing, but I wonder whether it is possible to simply run multiple processes of the same ArcPy script at the same time.
My script makes some changes to the default geodatabase, so I thought of making a copy of the geodatabase for each process.
I have updated the script so that the shared resources are copied for each process: the geodatabase and the MXDs related to it.
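For illustration, here is a minimal sketch of what I mean by the per-process setup, assuming the default geodatabase is a file geodatabase at a known path; the SOURCE_GDB path, the worker-geodatabase naming, and the worker() wrapper are made-up names, not from my actual script:

import os

import arcpy

SOURCE_GDB = r"C:\data\default.gdb"  # hypothetical path to the shared geodatabase

def worker(task):
    # Give each process its own copy of the geodatabase so no two workers
    # ever write to the same workspace.
    worker_gdb = r"C:\data\worker_{}.gdb".format(os.getpid())
    if not arcpy.Exists(worker_gdb):
        arcpy.Copy_management(SOURCE_GDB, worker_gdb)
    arcpy.env.workspace = worker_gdb
    # ... run the geoprocessing for this task against worker_gdb ...
    return task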
I made a test of the parallelization using this script:
import multiprocessing

if __name__ == '__main__':  # required on Windows so the workers do not re-run the pool setup
    pool = multiprocessing.Pool(2)
    pool.map(test_func, [1, 2], 1)  # test_func is defined elsewhere in the script; chunksize of 1
    pool.close()
    pool.join()
Watching RAM and CPU usage, I noticed that each process consumes about 200 MB. So, with 6 GB of RAM, I figured I could devote 5 GB of it to the pool by enlarging the pool size to:
5000 MB / 200 MB = 25
So, to exploit the whole power of the machine, I think I should use a pool size of 25.
I need to know whether this is the best approach, or how I could measure the efficiency of this parallelization.
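For the second part, one thing I could do is time the same workload at different pool sizes and keep the fastest; a minimal sketch, where the dummy test_func, the task list, and the pool sizes are only stand-ins for my real workload:

import time
import multiprocessing

def test_func(n):
    # Stand-in for the real ArcPy workload.
    sum(i * i for i in range(10 ** 6))
    return n

def benchmark(func, tasks, pool_sizes):
    # Run the same tasks with each pool size and record the wall-clock time.
    timings = {}
    for size in pool_sizes:
        pool = multiprocessing.Pool(size)
        start = time.time()
        pool.map(func, tasks)
        pool.close()
        pool.join()
        timings[size] = time.time() - start
    return timings

if __name__ == '__main__':
    for size, seconds in sorted(benchmark(test_func, range(20), [2, 4, 8, 16]).items()):
        print("pool size {}: {:.1f} s".format(size, seconds))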
This is an example of the code I am trying to parallelize; the whole script contains about 1,500 lines, most of them similar to these:
def dora_layer_goned():
    # Copy layer_goned, then select the features that are not within current_parcel.
    arcpy.Select_analysis("layer_goned", "layer_goned22")
    arcpy.MakeFeatureLayer_management("layer_goned", "layer_goned_lyr")
    arcpy.SelectLayerByLocation_management("layer_goned_lyr", "WITHIN", "current_parcel", "", "NEW_SELECTION")
    arcpy.SelectLayerByAttribute_management("layer_goned_lyr", "SWITCH_SELECTION")
    arcpy.Select_analysis("layer_goned_lyr", "layer_goned_2_dora2")
    arcpy.Clip_analysis("layer_goned_lyr", "current_parcel_5m_2", "layer_goned_2_dora")
    arcpy.SelectLayerByAttribute_management("layer_goned_lyr", "CLEAR_SELECTION")
    # Use the centroids of the clipped features to delete the matching features from layer_goned_2_dora2.
    arcpy.FeatureToPoint_management("layer_goned_2_dora", "layer_goned_2_dora_point", "CENTROID")
    arcpy.MakeFeatureLayer_management("layer_goned_2_dora2", "layer_goned_2_dora_lyr")
    arcpy.SelectLayerByLocation_management("layer_goned_2_dora_lyr", "INTERSECT", "layer_goned_2_dora_point", "", "NEW_SELECTION")
    arcpy.DeleteFeatures_management("layer_goned_2_dora_lyr")
    # Build a thin buffer around a line constructed from the carre_line and parcel vertices.
    arcpy.FeatureVerticesToPoints_management("current_parcel", "current_parcel__point", "ALL")
    arcpy.FeatureVerticesToPoints_management("carre_line", "carre_line__point", "ALL")
    arcpy.CalculateField_management("current_parcel__point", "id", "!objectid!", "PYTHON_9.3")
    arcpy.SpatialJoin_analysis("carre_line__point", "current_parcel__point", "carre_line__point_sj", "JOIN_ONE_TO_ONE", "KEEP_COMMON", "", "CLOSEST")
    arcpy.Append_management("current_parcel__point", "carre_line__point_sj", "NO_TEST")
    arcpy.PointsToLine_management("carre_line__point_sj", "carre_line__point_sj_line", "id")
    arcpy.Buffer_analysis("carre_line__point_sj_line", "carre_line__point_sj_line_buf", 0.2)
    # Erase the buffer from layer_goned_2_dora2 and replace the matching features in layer_goned with the resulting pieces.
    arcpy.Erase_analysis("layer_goned_2_dora2", "carre_line__point_sj_line_buf", "layer_goned_2_dora_erz")
    arcpy.MultipartToSinglepart_management("layer_goned_2_dora_erz", "layer_goned_2_dora_erz_mono")
    arcpy.MakeFeatureLayer_management("layer_goned_2_dora_erz_mono", "layer_goned_2_dora_erz_lyr")
    arcpy.SelectLayerByLocation_management("layer_goned_2_dora_erz_lyr", "SHARE_A_LINE_SEGMENT_WITH", "current_parcel", "", "NEW_SELECTION")
    arcpy.SelectLayerByLocation_management("layer_goned_lyr", "CONTAINS", "layer_goned_2_dora_erz_lyr", "", "NEW_SELECTION")
    arcpy.DeleteFeatures_management("layer_goned_lyr")
    arcpy.Append_management("layer_goned_2_dora_erz_lyr", "layer_goned", "NO_TEST")
    arcpy.SelectLayerByAttribute_management("layer_goned_2_dora_erz_lyr", "CLEAR_SELECTION")
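To connect this to the pool test above, here is a rough sketch of how a function like this could be driven through pool.map, one task per parcel; process_parcel, the parcel-ID list, and the worker-geodatabase naming are hypothetical and assume the per-process copy described earlier:

import os
import multiprocessing

import arcpy

def process_parcel(parcel_id):
    # Hypothetical wrapper: point this process at its own geodatabase copy
    # (see the earlier sketch), then run the same chain of tools as
    # dora_layer_goned(). Selecting the parcel to work on is omitted here.
    arcpy.env.workspace = r"C:\data\worker_{}.gdb".format(os.getpid())
    dora_layer_goned()
    return parcel_id

if __name__ == '__main__':
    parcel_ids = [1, 2, 3, 4]  # stand-in for the real list of parcels
    pool = multiprocessing.Pool(4)
    pool.map(process_parcel, parcel_ids)
    pool.close()
    pool.join()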
It would help to see the actual code (test_func) that you are trying to speed up. Also, your hypothesis of using 25 as the pool size is flawed because you seem to be under the false impression that using all of your system's memory will automatically help with performance. This is not necessarily true; indeed, it would most likely decrease performance once the pagefile starts to be hit (also keep in mind that each 32-bit process is limited to 2 GB by default). You should be seeking to minimize overall processing time, not to maximize memory usage. – blah238 Mar 21 '13 at 21:58
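Following that comment, a minimal sketch of sizing the pool from the CPU count instead of from memory; test_func is the same placeholder as in the question, and using cpu_count() directly is only an illustrative starting point:

import multiprocessing

if __name__ == '__main__':
    # Tie the pool size to the number of CPU cores rather than to available RAM.
    pool = multiprocessing.Pool(multiprocessing.cpu_count())
    pool.map(test_func, [1, 2])  # test_func as in the question
    pool.close()
    pool.join()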