
We have tried the approach below on a monthly basis.

The table is around 2 TB, with 30 million rows and 50 columns. We started by renaming the old table and moving rows into the new partitioned table in batches of 10K rows. Migrating 10 million rows took around 7 hours.

Is there a faster, more effective approach that can be done with minimal downtime? Any help is appreciated.

Shitij

1 Answer

  • create a new partitioned table, with enough partitions to comfortably hold all of the existing data plus data for some reasonable time into the future (see the partitioning sketch after this list)
  • create a view that UNION ALLs the two tables (it may be simplest to rename the current table and replace it, temporarily, with a view that has the old table name)
  • direct new writes to the new, partitioned table
    • hopefully you control writes via a stored procedure, or through a minimal number of places where the target name has to change
    • if not, you can use an INSTEAD OF trigger on the view to direct writes to the partitioned table (see the trigger sketch below)
  • in the background, start a DELETE TOP (@batchsize) FROM old_table OUTPUT deleted.* INTO new_table loop (see the backfill sketch below)
    • it doesn't matter how long this takes to finish; the trick is to tune the batch size so it balances making progress against not causing blocking for too long, and to make sure you take log backups every n batches if they're not already scheduled frequently enough (some more info on that here)
  • once all the data is backfilled, you can drop the old table, then change the view to no longer UNION ALL (or get rid of the view and rename the new table); the final cleanup is sketched below as well
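
Roughly, the first two steps might look like the following. This is a sketch only: it assumes a datetime partitioning column with monthly RANGE RIGHT boundaries, and every name in it (PF_MonthlyData, PS_MonthlyData, dbo.OldTable, dbo.NewTable, dbo.MyTable, EventDate) is a made-up placeholder, not something from the question:

    -- Sketch only: assumes monthly RANGE RIGHT partitions on a datetime
    -- column; every name below is a made-up placeholder.
    CREATE PARTITION FUNCTION PF_MonthlyData (datetime)
    AS RANGE RIGHT FOR VALUES
        ('20240101', '20240201', '20240301');  -- extend well into the future

    CREATE PARTITION SCHEME PS_MonthlyData
    AS PARTITION PF_MonthlyData ALL TO ([PRIMARY]);  -- or dedicated filegroups

    -- the existing table has already been renamed out of the way:
    -- EXEC sys.sp_rename 'dbo.MyTable', 'OldTable';

    CREATE TABLE dbo.NewTable
    (
        ID        bigint   NOT NULL,
        EventDate datetime NOT NULL,
        /* ... the other ~48 columns ... */
        CONSTRAINT PK_NewTable PRIMARY KEY CLUSTERED (EventDate, ID)
            ON PS_MonthlyData (EventDate)
    );
    GO

    -- a view wearing the old name keeps existing readers working unchanged
    CREATE VIEW dbo.MyTable
    AS
        SELECT ID, EventDate /* , ... */ FROM dbo.NewTable
        UNION ALL
        SELECT ID, EventDate /* , ... */ FROM dbo.OldTable;
    GO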
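
If you can't funnel writes through code you control, an INSTEAD OF INSERT trigger on the view can redirect them, something like this sketch (same hypothetical names as above; if UPDATEs and DELETEs also come in through the view, they'd need their own INSTEAD OF triggers):

    -- Sketch only: redirects inserts against the view into the partitioned
    -- table, so callers keep writing to the old name.
    CREATE TRIGGER dbo.MyTable_Insert
    ON dbo.MyTable
    INSTEAD OF INSERT
    AS
    BEGIN
        SET NOCOUNT ON;

        INSERT dbo.NewTable (ID, EventDate /* , ... */)
            SELECT ID, EventDate /* , ... */ FROM inserted;
    END
    GO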
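
The backfill loop might look like the sketch below. One restriction worth knowing: the table named in OUTPUT ... INTO can't have enabled triggers or be part of a foreign key relationship; that's fine here because the trigger lives on the view, not on dbo.NewTable. The batch size, delay, and log-backup cadence are all placeholders to tune:

    -- Sketch only: batch size, delay, and backup cadence are placeholders.
    DECLARE @batchsize int = 10000;

    WHILE 1 = 1
    BEGIN
        DELETE TOP (@batchsize) dbo.OldTable
            OUTPUT deleted.ID, deleted.EventDate /* , ... */
            INTO dbo.NewTable (ID, EventDate /* , ... */);

        IF @@ROWCOUNT < @batchsize
            BREAK;  -- the old table is (as good as) empty

        -- keep the log in check between batches, if not already scheduled:
        -- BACKUP LOG YourDatabase TO DISK = N'...';

        WAITFOR DELAY '00:00:01';  -- small breather so blocked readers get through
    END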
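
And once the old table is drained, the final cleanup might be as simple as this (again with the hypothetical names; expect a brief moment where the old name is unavailable between the DROP and the rename):

    -- Sketch only: final cleanup once dbo.OldTable is fully drained
    DROP TABLE dbo.OldTable;
    GO

    -- simplest: drop the view (its INSTEAD OF trigger goes with it) and
    -- rename the partitioned table into the old name
    DROP VIEW dbo.MyTable;
    GO
    EXEC sys.sp_rename 'dbo.NewTable', 'MyTable';
    GO

    -- alternatively, keep the view, ALTER it to select only from
    -- dbo.NewTable, and drop the now-redundant trigger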

If it takes two weeks to backfill all the old data into the new partition scheme, so what? Users aren't waiting for two weeks; they're only ever waiting on any individual batch (and even then, that'll largely just be the folks querying the data that's moving over, not the new data).

Aaron Bertrand