Given:

  • a geodatabase A that contains 100 datasets named a1 .. a100
  • a geodatabase B that contains 100 datasets named b1 .. b100

I want to programmatically (*) determine for each dataset pair (ai, bi) whether they have identical content. Since I am comparing 100 pairs, I need an efficient comparison method. Ideally, the whole comparison would take only a few seconds.

(*) Note: I say "programmatically" not because I am looking for code examples (though I'd gladly accept them), but to emphasize that I need a very fast comparison method. Comparing 100 dataset pairs manually could never be fast enough.

I am planning to implement this comparison method myself, so what I am essentially looking for is an algorithm rather than a ready-to-use tool (unless it happens to be open source).

I am aware that I may be asking for the impossible, since this would require comparing the datasets' complete contents (perhaps with the tools in the Data Management > Data Comparison toolset), or at least comparing dataset hashes / digests. Generating digests, however, would also require reading all of the datasets' data first.

Therefore my best approach so far is the following:

  1. Determine first which dataset pairs (ai, bi) cannot possibly have identical content.

  2. Perform a full data comparison only for the remaining dataset pairs.
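The two steps above can be sketched in Python as follows. This is only an illustration under assumptions: `compare_pairs`, `row_count`, `same_rows` and the in-memory `DATA` dictionary are hypothetical stand-ins, not ArcGIS API calls. Any cheap check can be plugged in as the signature function, as long as identical datasets always produce equal signatures.

```python
from typing import Callable, Dict, Hashable, Iterable, Tuple

Pair = Tuple[str, str]


def compare_pairs(
    pairs: Iterable[Pair],
    cheap_signature: Callable[[str], Hashable],
    full_compare: Callable[[str, str], bool],
) -> Dict[Pair, bool]:
    """Map each pair to True/False, running full_compare only when needed."""
    results: Dict[Pair, bool] = {}
    for a, b in pairs:
        if cheap_signature(a) != cheap_signature(b):
            # Step 1: signatures differ, so the contents cannot be identical.
            results[(a, b)] = False
        else:
            # Step 2: signatures match, so only a full comparison can decide.
            results[(a, b)] = full_compare(a, b)
    return results


# Hypothetical toy data standing in for the datasets of two geodatabases.
DATA = {
    "a1": [(1, "x"), (2, "y")],
    "b1": [(1, "x"), (2, "y")],
    "a2": [(1, "x")],
    "b2": [(1, "x"), (2, "y")],
    "a3": [(1, "x"), (2, "y")],
    "b3": [(1, "x"), (2, "z")],
}


def row_count(name: str) -> int:
    """Cheap check: number of rows in the dataset."""
    return len(DATA[name])


def same_rows(a: str, b: str) -> bool:
    """Expensive check: compare every row."""
    return DATA[a] == DATA[b]


pairs = [(f"a{i}", f"b{i}") for i in range(1, 4)]
results = compare_pairs(pairs, row_count, same_rows)
print(results)
```

Here the pair (a2, b2) is rejected by the row count alone, while (a3, b3) passes the cheap check and is only told apart by the full comparison.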

My questions:

  • Does ArcGIS happen to auto-compute some kind of dataset digest that I could query? If so, how?

    (I am not aware of anything of that sort, so I expect the answer to be "no". Please prove me wrong.)

  • What are some very efficient, reliable ways of determining whether two datasets cannot possibly have identical content?

    (I have so far considered comparing modification timestamps, though I don't know how reliable these are, and comparing the datasets' schemas. How reliable are timestamps in an ArcGIS geodatabase? Are there other dataset characteristics that could serve for this purpose?)
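To make the schema idea concrete, here is a minimal Python sketch of a cheap signature built from the field list and the row count. The function names are hypothetical, and the inputs are plain tuples and integers. With arcpy they could presumably be collected via `arcpy.ListFields` and `arcpy.management.GetCount`, but that part is an assumption and is not shown. Timestamps are deliberately left out, since their reliability is exactly what is in question.

```python
from typing import List, Tuple

Field = Tuple[str, str]  # (field name, field type), e.g. ("NAME", "String")
Signature = Tuple[Tuple[Field, ...], int]


def schema_signature(fields: List[Field], row_count: int) -> Signature:
    """Build a cheap, hashable signature from the schema and the row count.

    Fields are sorted so that a different field order alone does not
    cause two otherwise matching schemas to be rejected.
    """
    return (tuple(sorted(fields)), row_count)


def cannot_be_identical(sig_a: Signature, sig_b: Signature) -> bool:
    """True if the signatures already prove that the contents differ."""
    return sig_a != sig_b


sig_a = schema_signature([("OBJECTID", "OID"), ("NAME", "String")], 10)
sig_b = schema_signature([("NAME", "String"), ("OBJECTID", "OID")], 10)
sig_c = schema_signature([("OBJECTID", "OID"), ("NAME", "String")], 11)

print(cannot_be_identical(sig_a, sig_b))  # same fields, same row count
print(cannot_be_identical(sig_a, sig_c))  # row counts differ
```

Equal signatures prove nothing on their own; they only mean that the pair still needs a full comparison.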

stakx

1 Answer

You could add a field to the feature attribute table and store a hash (e.g. MD5) of each feature in it, computing the hash with IEditEvents or a class extension.

The hash would be computed on a string representation of the feature (either JSON or XML), where WKT could be used for the shape field.
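A minimal Python sketch of that hashing step, assuming the attributes are already available as a dictionary and the shape as a WKT string. `feature_hash` and `dataset_digest` are hypothetical names, and wiring this into IEditEvents or a class extension is not shown. The dataset-level digest is an extra assumption beyond the suggestion above: it combines the stored per-feature hashes so that two datasets can be compared without re-reading every attribute.

```python
import hashlib
import json
from typing import Any, Dict, Iterable


def feature_hash(attributes: Dict[str, Any], shape_wkt: str) -> str:
    """MD5 of a canonical JSON representation of one feature."""
    record = {"attributes": attributes, "shape": shape_wkt}
    # sort_keys makes the string independent of attribute order;
    # default=str covers values such as dates that JSON cannot encode.
    text = json.dumps(record, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def dataset_digest(feature_hashes: Iterable[str]) -> str:
    """Combine per-feature hashes into one digest, ignoring row order."""
    combined = hashlib.md5()
    for h in sorted(feature_hashes):
        combined.update(h.encode("ascii"))
    return combined.hexdigest()


h1 = feature_hash({"NAME": "Oak St", "LANES": 2}, "LINESTRING (0 0, 1 1)")
h2 = feature_hash({"LANES": 2, "NAME": "Oak St"}, "LINESTRING (0 0, 1 1)")
h3 = feature_hash({"NAME": "Oak St", "LANES": 2}, "LINESTRING (0 0, 1 2)")

print(h1 == h2)  # same content, different attribute order
print(h1 == h3)  # geometry differs
print(dataset_digest([h1, h3]) == dataset_digest([h3, h1]))
```

One caveat: the WKT text has to be produced the same way in both geodatabases (for example, the same coordinate precision), otherwise identical geometries can yield different hashes.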

Kirk Kuykendall