I'm working on a tool to replicate data from several different source DBs (used by the same company, but from different vendors and on different versions) to one centralized analysis platform (for the purposes of this question: just another database). We don't need history, only snapshots of selected tables/schemas in the source databases. These are all production databases with a lot of updates in each window between replications.
The choice we have to make is whether to do incremental updates via the transaction logs or to simply read the data straight from the tables. We can't change anything in the source databases, so things like adding a last_replicated timestamp column are out of the question.
When searching for best practices, I found plenty of tools but not much information on which method is most efficient under which circumstances. My gut says that taking a full snapshot and discarding the older copy once the newer one is finished is the easiest way to do it, while implementing incremental updates against the transaction logs of several different vendors is probably a pain and not that much faster per run. That would of course depend on the volume of updates on the source databases, but I can't find any information to back this up.
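To make the snapshot-and-swap idea concrete, here is roughly what I have in mind per table per run: rebuild a staging copy on the target, then swap it in and drop the old one. This is only a sketch under assumptions I'm making for illustration (a Postgres-like target, a generic DB-API source connection, and placeholder table/connection names), not our actual implementation:

```python
BATCH = 10_000

def refresh_snapshot(source_conn, target_conn, table, columns):
    """Rebuild the target copy of `table` from scratch, then swap it in."""
    # `table`, `columns`, and both connections are placeholders for illustration.
    staging = f"{table}_staging"
    cols = ", ".join(columns)
    params = ", ".join(["%s"] * len(columns))  # assumes a %s-style paramstyle

    src = source_conn.cursor()
    tgt = target_conn.cursor()

    # Build the new snapshot in a staging table next to the current one.
    tgt.execute(f"DROP TABLE IF EXISTS {staging}")
    tgt.execute(f"CREATE TABLE {staging} (LIKE {table} INCLUDING ALL)")

    # Stream the full table from the source in batches.
    src.execute(f"SELECT {cols} FROM {table}")
    while True:
        rows = src.fetchmany(BATCH)
        if not rows:
            break
        tgt.executemany(
            f"INSERT INTO {staging} ({cols}) VALUES ({params})", rows
        )

    # Swap: drop the old snapshot and rename the new one into place, so
    # readers see either the old copy or the new one, never a mix.
    tgt.execute(f"DROP TABLE {table}")
    tgt.execute(f"ALTER TABLE {staging} RENAME TO {table}")
    target_conn.commit()
```

The appeal to me is that this needs nothing from the source beyond read access to the tables, whereas the log-based route needs vendor-specific log readers. What I can't judge is at what update volume the repeated full reads stop being acceptable.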
Can anyone point me to relevant info?
(I am not looking for tool recommendations, since this functionality will be part of a suite we are building in-house.)