Given:
- a geodatabase A that contains 100 datasets named a1 .. a100
- a geodatabase B that contains 100 datasets named b1 .. b100
I want to programmatically (*) determine for each dataset pair (ai, bi) whether they have identical content. Since I am comparing 100 pairs, I need an efficient comparison method. Ideally, the whole comparison would take only a few seconds.
(*) Note: I use the word "programmatically" not because I am looking for code examples (though I'd gladly accept them), but to emphasize that I need a very fast comparison method, which would never be possible when comparing 100 dataset pairs manually.
I am planning to implement this comparison method myself, so what I am essentially looking for is an algorithm, not a ready-to-use tool (unless perhaps it is open source).
I am aware that I may be asking the impossible, since a definitive answer would require comparing the datasets' complete contents (perhaps with the tools in the Data Management → Data Comparison toolset), or at least comparing dataset hashes/digests. However, generating a digest would also require reading through all of a dataset's data first.
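To make the cost of the digest option concrete: it is one streaming pass over each dataset, and if row order can differ between A and B, the per-row hashes have to be combined in an order-independent way. The following is a minimal Python sketch under those assumptions. It takes rows as plain tuples of simple values (str, int, float, None, bytes); feeding it from a geodatabase row cursor, and converting geometries to something like WKB bytes first, is assumed and not shown. `row_digest` and `dataset_digest` are hypothetical helper names. Unequal digests prove the datasets differ; for non-adversarial data, equal digests from different contents are vanishingly unlikely.

```python
import hashlib
from typing import Iterable, Sequence

MOD = 2 ** 256  # per-row hashes are summed modulo 2^256


def row_digest(row: Sequence[object]) -> int:
    """Hash one row via its repr.

    Assumes every value has a stable repr (str, int, float, None, bytes);
    geometry objects would first need converting to e.g. WKB bytes.
    """
    data = repr(tuple(row)).encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def dataset_digest(rows: Iterable[Sequence[object]]) -> str:
    """Order-independent digest: row count plus sum of row hashes mod 2^256.

    Summing (rather than XOR-ing) means duplicate rows do not cancel out.
    """
    total = 0
    count = 0
    for row in rows:
        total = (total + row_digest(row)) % MOD
        count += 1
    return f"{count}:{total:064x}"


if __name__ == "__main__":
    a = [(1, "road", 12.5), (2, "river", 3.0)]
    b = [(2, "river", 3.0), (1, "road", 12.5)]  # same rows, different order
    print(dataset_digest(a) == dataset_digest(b))  # True
```

Because the digest is order-independent, it can be computed from two cursors that return rows in different orders, at the cost of not being able to say *which* rows differ.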
Therefore, my best approach so far is:
1. First, determine which dataset pairs (ai, bi) cannot possibly have identical content.
2. Then perform a full data comparison only on the remaining pairs.
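The two-step approach can be sketched as a generic driver: a cheap check that is only ever allowed to prove a pair *different*, followed by the expensive full comparison for the pairs that survive it. This is a hypothetical Python sketch; `quick_mismatch` and `full_equal` are placeholder callables the caller would supply (for example, a schema/row-count check and a row-by-row or digest comparison).

```python
from typing import Callable, Dict, Iterable, Tuple

Pair = Tuple[str, str]


def compare_pairs(
    pairs: Iterable[Pair],
    quick_mismatch: Callable[[str, str], bool],
    full_equal: Callable[[str, str], bool],
) -> Dict[Pair, bool]:
    """Return {(a, b): identical?} for every pair.

    quick_mismatch must be conservative: it may return True only when the
    two datasets definitely differ. full_equal runs only on the survivors.
    """
    results: Dict[Pair, bool] = {}
    for a, b in pairs:
        if quick_mismatch(a, b):
            results[(a, b)] = False  # ruled out cheaply, no full read needed
        else:
            results[(a, b)] = full_equal(a, b)  # expensive path
    return results


if __name__ == "__main__":
    # Toy in-memory stand-ins for the real datasets.
    data = {"a1": [(1, "x")], "b1": [(1, "x")],
            "a2": [(1, "x")], "b2": [(1, "x"), (2, "y")]}
    res = compare_pairs(
        [("a1", "b1"), ("a2", "b2")],
        quick_mismatch=lambda a, b: len(data[a]) != len(data[b]),
        full_equal=lambda a, b: sorted(data[a]) == sorted(data[b]),
    )
    print(res)  # {('a1', 'b1'): True, ('a2', 'b2'): False}
```

The total running time is then dominated by how many pairs the cheap check fails to rule out, which is why a good conservative pre-filter matters more than a fast full comparison.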
My questions:
1. Does ArcGIS automatically compute some kind of dataset digest that I could query? If so, how?
   (I am not aware of anything of the sort, so I expect the answer to be "no". Please prove me wrong.)
2. What are some very efficient, reliable ways of establishing that two datasets cannot possibly have identical content?
   (So far I have considered comparing modification timestamps, though I don't know how reliable they are, as well as comparing the datasets' schemas. How reliable are timestamps in an ArcGIS geodatabase? Are there other dataset characteristics that could serve this purpose?)
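One way to frame the pre-filter is as a comparison of a small "signature" of metadata that is cheap to read, where any mismatch proves the contents differ. The Python sketch below is hypothetical: `DatasetSignature` and `cannot_be_identical` are illustrative names, and the fields are assumed to be filled from dataset metadata (e.g. via arcpy's `Describe`, `ListFields` and `GetCount`; that wiring is not shown). Modification timestamps are deliberately left out: two datasets with different timestamps can still hold identical content, so a timestamp difference cannot by itself rule a pair out. Field order is ignored here, on the assumption that it does not count as "content".

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DatasetSignature:
    """Cheap-to-read metadata; any mismatch proves the contents differ."""
    fields: Tuple[Tuple[str, str], ...]  # (field name, field type) pairs
    row_count: int
    geometry_type: Optional[str] = None  # None for non-spatial tables
    spatial_ref: Optional[str] = None    # e.g. a WKID or WKT string


def cannot_be_identical(a: DatasetSignature, b: DatasetSignature) -> bool:
    """Conservative check: True only if some cheap property differs."""
    return (
        a.row_count != b.row_count
        or sorted(a.fields) != sorted(b.fields)  # ignore field order
        or a.geometry_type != b.geometry_type
        or a.spatial_ref != b.spatial_ref
    )


if __name__ == "__main__":
    # Illustrative values only.
    a = DatasetSignature((("NAME", "String"), ("LEN", "Double")), 42, "Polyline", "4326")
    b = DatasetSignature((("LEN", "Double"), ("NAME", "String")), 42, "Polyline", "4326")
    c = DatasetSignature(a.fields, 41, "Polyline", "4326")
    print(cannot_be_identical(a, b))  # False: same schema (order ignored), same count
    print(cannot_be_identical(a, c))  # True: row counts differ
```

A `False` result here only means "not ruled out"; such pairs still need the full comparison.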