
We are geocoding an immense directory of records, something in excess of 100 million addresses. I have split the addresses up into as small a geographic region as I feel is feasible: states. Even so, a single state can have in excess of 5 million records to geocode. The arcpy script I wrote loops through each state's addresses and runs the geocoding process with the appropriate locator.
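
For reference, the loop is structured roughly like this minimal sketch (the paths, state list, and locator field map here are illustrative placeholders, not the actual values):

    import arcpy

    states = ["AL", "AK", "AZ"]  # ...one entry per state

    for state in states:
        # One address table and one locator per state (placeholder paths).
        in_table = r"C:\geocode\addresses\{0}.dbf".format(state)
        locator = r"C:\geocode\locators\{0}_locator".format(state)
        out_fc = r"C:\geocode\results.gdb\geocoded_{0}".format(state)

        # Map the locator's input fields to the table's fields;
        # the exact string depends on the locator style.
        field_map = "Street Address VISIBLE NONE;ZIP Zip VISIBLE NONE"

        arcpy.GeocodeAddresses_geocoding(in_table, locator, field_map, out_fc)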

Sometimes it takes 36 hours, sometimes only 20, but eventually the script stops with an error I have never seen before (this is from the XML log file):
ERROR 001143: Background server threw an exception.

  • If the script is run in ArcCatalog or ArcMap's Python window, it shows the red error message Runtime error <class 'arcgisscripting.ExecuteError'>: ERROR 001143: Background server threw an exception.
  • If the script is run in IDLE, it will simply cease processing without an error and restart the shell (with the typical =====RESTART===== heading).

I know this arcpy script works with smaller datasets, as I have been using it for months now. What might be the cause of this error?

Do I need to split up my address listings into smaller amounts for them to geocode reliably?


I tracked down the Event Viewer logs per @D.E.Wright's suggestion, and this is what was listed under the most recent failure:

Faulting application name: pythonw.exe, version: 0.0.0.0, time stamp: 0x4ba3e4e2
Faulting module name: Geocoding.dll, version: 10.0.1.2800, time stamp: 0x4cbcbb71
Exception code: 0xc0000005
Fault offset: 0x000be1f3
Faulting process id: 0x%9
Faulting application start time: 0x%10
Faulting application path: %11
Faulting module path: %12
Report Id: %13

It is not exactly illuminating, though the exception code 0xc0000005 is an access violation, and the faulting module is Geocoding.dll.

  • Are you running this against an ArcGIS Server service for the geocoding? It sounds like it; and if so, you may be seeing a web-service timeout or a transaction count that is recycling the services and breaking your process. – D.E.Wright Jan 10 '12 at 22:41
  • All the data is local, I'm afraid, which is what confuses me so much. – Nathanus Jan 10 '12 at 22:43
  • Are you using ArcGIS 10? There is an updated process when the software runs; you can see it in your Task Manager, called ARCSOCM.exe, which is essentially an ESRI Server process running locally on your machine to handle the background processing. What you may be seeing is this process failing or blowing up on you. You might try checking the Event Viewer on the machine to see whether any application errors are being logged; that can sometimes give information. – D.E.Wright Jan 10 '12 at 22:59
  • Sorry, they are called ArcSOCP.exe and ArcSOMP.exe; those are the local processes that run. – D.E.Wright Jan 10 '12 at 23:31
  • Sounds like you might indeed need to break them up into smaller chunks, as you mentioned. What about parsing the ZIP codes out of the addresses and then looping through those as groups? – Chad Cooper Jan 11 '12 at 19:18
  • My locators are state-sized. I would probably do the preprocessing with a *nix split command to simply split the text files into 1-2 million-record chunks before turning them into tables, although the extra step of making them tables may be a waste of time, since you can geocode text files directly. – Nathanus Jan 11 '12 at 19:20
  • Have you looked at the Multi-Thread Geocoding sample from ESRI? I use that and regularly run 10 million records through it just for my state. But Chad does raise a good point: you can split your states down into logical ZIP-code blocks, run those smaller blocks, then aggregate back (a rough sketch of this approach follows these comments). I use dual quad-core machines and find the ESRI sample, which we have since modified to break up the data and run in parallel, is great. – D.E.Wright Jan 11 '12 at 22:18
  • @D.E.Wright No, I've never seen that sample. We have Dual-Quads, and that sounds like a great idea. I'll see if I can find it. If you know where I could locate it, though, I'd appreciate a nod in the right direction. – Nathanus Jan 11 '12 at 23:23
  • Here is a white paper from '09 that has some good topics about getting your data set up to be more efficient - http://www.esri.com/library/whitepapers/pdfs/arcgis-server-in-practice.pdf - and here is the link to the download, which should be more helpful - http://resources.arcgis.com/gallery/file/geocoding/details?entryID=A284F7D9-1422-2418-7F50-BA718224C412 - You should find both a big help. – D.E.Wright Jan 12 '12 at 16:30
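
Following up on the comment thread, here is a minimal sketch of the chunk-and-parallelize idea suggested above. This is not ESRI's Multi-Thread Geocoding sample itself; the paths, chunk size, field map, and process count are all illustrative assumptions:

    import multiprocessing


    def geocode_chunk(args):
        # Import arcpy inside the worker so each process gets its own
        # geoprocessing environment.
        import arcpy
        chunk_table, locator, out_fc = args
        # Placeholder field map; the real one depends on the locator style.
        field_map = "Street Address VISIBLE NONE;ZIP Zip VISIBLE NONE"
        arcpy.GeocodeAddresses_geocoding(chunk_table, locator, field_map, out_fc)
        return out_fc


    def split_state_file(path, lines_per_chunk=1000000):
        # The same job as a *nix `split`, done in Python: break one state's
        # address file into ~1M-line chunks, repeating the header row.
        chunks = []
        with open(path) as src:
            header = src.readline()
            part, count, out = 0, 0, None
            for line in src:
                if out is None:
                    chunk_path = "{0}.part{1}".format(path, part)
                    out = open(chunk_path, "w")
                    out.write(header)
                    chunks.append(chunk_path)
                out.write(line)
                count += 1
                if count >= lines_per_chunk:
                    out.close()
                    out, count, part = None, 0, part + 1
            if out is not None:
                out.close()
        return chunks


    if __name__ == "__main__":
        locator = r"C:\geocode\locators\TX_locator"
        chunks = split_state_file(r"C:\geocode\addresses\TX.csv")
        # Each chunk writes to its own shapefile, so the workers never
        # contend for (or lock) a single file geodatabase.
        jobs = [(c, locator, r"C:\geocode\results\TX_part{0}.shp".format(i))
                for i, c in enumerate(chunks)]
        pool = multiprocessing.Pool(processes=8)  # e.g. one per core
        results = pool.map(geocode_chunk, jobs)
        pool.close()
        pool.join()
        # The per-chunk outputs can then be merged back together,
        # e.g. with arcpy.Merge_management(results, merged_output).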

2 Answers


Here is a white paper from '09 that has some good topics about getting your data set up to be more efficient - http://esri.com/library/whitepapers/pdfs/arcgis-server-in-practice.pdf - and here is the link to the download, which should be more helpful - http://resources.arcgis.com/gallery/file/geocoding/… - You should find both a big help.

You can also message me directly with more questions, since we do a lot of full-state-scale processes and are always looking to collaborate.

D.E.Wright

Perhaps this answer will help you. 100 million addresses shouldn't take much more than a day to process and geocode. Keep in mind that we are not only geocoding the addresses but standardizing and verifying them as well; if geocoding were the only task, it would be even faster.

Jeffrey