
I am trying to learn how the transaction log works and have been studying it for a couple of days.

Some operations become faster when done as many small transactions, e.g. deleting many rows in small batches. See Methods of speeding up a huge DELETE FROM <table> with no clauses (https://dba.stackexchange.com/questions/1750/methods-of-speeding-up-a-huge-delete-from-table-with-no-clauses).
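For example, the batched delete I have in mind looks roughly like this (just a sketch; dbo.MyTable and the date filter are placeholders for my real table and predicate):

    -- Sketch of what I mean by "small batches": each DELETE is its own
    -- autocommitted transaction. Names and the predicate are placeholders.
    DECLARE @BatchSize int = 5000;

    WHILE 1 = 1
    BEGIN
        DELETE TOP (@BatchSize)
        FROM dbo.MyTable
        WHERE CreatedDate < '2020-01-01';

        IF @@ROWCOUNT < @BatchSize
            BREAK;   -- last (partial) batch deleted, stop
    END;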

Other operations, e.g. cursors, become faster when one big explicit transaction is wrapped around the cursor. This seems contradictory to me...
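And the cursor case I mean is roughly this, with one explicit transaction wrapped around the whole loop instead of a commit per statement (again a sketch with placeholder names):

    -- Sketch of the cursor case: one explicit transaction around the loop,
    -- so the per-statement autocommits go away. Names are placeholders.
    DECLARE @Id int;

    DECLARE cur CURSOR LOCAL FAST_FORWARD FOR
        SELECT Id FROM dbo.MyTable;

    BEGIN TRANSACTION;

    OPEN cur;
    FETCH NEXT FROM cur INTO @Id;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        UPDATE dbo.MyTable SET SomeCol = SomeCol + 1 WHERE Id = @Id;
        FETCH NEXT FROM cur INTO @Id;
    END;
    CLOSE cur;
    DEALLOCATE cur;

    COMMIT TRANSACTION;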

Can someone please explain?

Edit with extra info: my table is rowstore with a clustered index and a nonclustered (NC) index on the predicate column used in the delete.

xhr489

2 Answers


As with most things, it depends...

The longer a transaction is open against a set of Tables, the longer those Tables are locked. Locking on a Table causes contention with other queries trying to read from and write to that Table. So breaking up a transaction into multiple smaller transactions reduces the continuous time those Tables are locked. It may also help isolate the locking to only the Tables that absolutely need to be locked for that part of the code.

But a lot of relational problems are best solved in a relational manner (as opposed to iteratively or procedurally), and there is additional overhead in creating and committing (or rolling back) multiple transactions and taking out the appropriate locks for each of them - overhead that a single transaction is sometimes best at minimizing.

For example, if you're trying to update 100 rows in a Table, in most cases updating all 100 rows in a single transaction / single UPDATE statement will be a lot quicker than taking out 100 transactions and iteratively looping through the table 100 times, row by row.
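For illustration, something along these lines (dbo.MyTable and its columns are just placeholder names):

    -- Relational: one statement, one transaction, all 100 rows at once.
    UPDATE dbo.MyTable
    SET    SomeCol = SomeCol + 1
    WHERE  Id BETWEEN 1 AND 100;

    -- Iterative: 100 separate autocommitted transactions, row by row.
    DECLARE @Id int = 1;
    WHILE @Id <= 100
    BEGIN
        UPDATE dbo.MyTable SET SomeCol = SomeCol + 1 WHERE Id = @Id;
        SET @Id += 1;
    END;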

Generally the tradeoff is this: the query you're trying to run will itself run quickest with the fewest transactions, but it will potentially cause the most contention on the Tables involved for the rest of the server / other queries accessing those Tables. By breaking the query / logic up into multiple transactions you likely reduce the overall contention on those Tables for the rest of the server, but you may incur some performance reduction in the specific query that was broken up into multiple transactions.

So the decision comes down to how important it is for that specific query to run quickly vs how busy the rest of the server is with the same Tables that query uses, and how long those other operations can afford to sit waiting.

J.D.
  • Hi so how come deletes become faster in small batches? I don't think the part with locking is related to my question. I am asking about isolated performance of operations on tables. Locking and blocking are not part of my question. – xhr489 Feb 10 '21 at 00:33
  • E.g. delete with CTE with top becomes faster using a while loop and @@rowcount. So we have more transactions. But with cursors using one big transaction makes the cursor faster. – xhr489 Feb 10 '21 at 00:36
  • But I see some changes in the answer. So you are saying it is because of the locking overhead? – xhr489 Feb 10 '21 at 00:39
  • @xhr489 Sorry I got called away while writing my answer, so that's why you see updates. Doing a single DELETE on a Table using your CTE as your source will be faster than looping through and doing multiple DELETEs (relational logic as opposed to iterative). A cursor is inherently slower than a loop in SQL Server, so it will naturally usually be slower when running the same logic iteratively with a cursor vs a loop. Cursors are almost always recommended against because they're notoriously underperforming, so that is outside the context of your question and unrelated to... – J.D. Feb 10 '21 at 00:58
  • ...the performance of one transaction vs multiple transactions performance. If you take the same exact query and run it inside one transaction you'll generally see it executes faster than taking that same exact query and breaking it up by multiple transactions. – J.D. Feb 10 '21 at 01:00
  • Hi, doing deletes in small batches is faster than one single delete. I have a lot of experience with this. See also https://dba.stackexchange.com/questions/1750/methods-of-speeding-up-a-huge-delete-from-table-with-no-clauses – xhr489 Feb 10 '21 at 01:25
  • @xhr489 There are other factors at play there. Once you cross a certain threshold of a number of rows, to reduce contention and locking issues on the Table for the rest of the server, doing iterative deletes of batches (which are still large) can help. But the overall runtime of that DELETE statement will almost always be slower in total runtime for an appropriately equivalent execution plan, or negligibly similar. The point stands though, relational solutions are usually most efficient for relational problems, and realistically if you're manipulating anything shy of 100,000... – J.D. Feb 10 '21 at 01:43
  • ...rows at a time (on anything considered relevant hardware these days) then doing a single query for the entire batch will run faster for that individual query than iteratively deleting row by row, especially 1 row at a time (total runtime). (The link you posted, and your past experiences, are more complicated cases than you may have realized.) – J.D. Feb 10 '21 at 01:44
  • Also please note that StackExchange post is literally a decade old, for a very deprecated version of SQL Server. – J.D. Feb 10 '21 at 01:54
  • @xhr489 Here's a recent article written by Aaron Bertrand (one of the most knowledgeable in the SQL community) who ran the tests (eliminating as many variables as possible) on batch deleting vs doing a single delete, and concludes on average it's "always" faster to do a single delete instead of batching. It's a very good article worth reading. Again this is an isolated test and real world results can vary for a number of additional variables, as noted in sql_handle's answer. – J.D. Feb 10 '21 at 03:10
  • I think it depends on the type of disk the log file is on. My experience does not show that, but again maybe it is my log file disk... – xhr489 Feb 10 '21 at 06:45
  • @xhr489 If by type of disk you mean an SSD vs mechanical HDD vs other alternatives such as an NVMe, those are all constants, so they would affect either case the same. I think likely what you've experienced had other variables involved. For example, your original question asks about two completely different cases, a CTE vs a cursor. Those types of features are natively different and have different performance characteristics themselves, ignoring all other factors, such as transactions & different DML queries. In most cases a CTE is lighter & can be faster than a cursor. – J.D. Feb 10 '21 at 12:36
  • Hi again, the CTE was just a way to implement deletion in small batches using a while loop... one can also implement it without a CTE. My question is/was about the transaction log. E.g. deleting 100 million rows as a single transaction will fill up the transaction log and possibly auto-grow it, which makes this a bad option. But doing it in batches will not grow the transaction log and is thus faster... Anyway my question is about the transaction log... see you at the next question :-) – xhr489 Feb 10 '21 at 20:02

If you are using a recent version of SQL Server you can compare session waits per sys.dm_exec_session_wait_stats in the small batch vs one delete comparison. Enabling Query Store and comparing metrics such as logical reads, physical reads, and CPU time among the small batch deletes as well as to the single delete will also be instructive.
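For example, a snapshot along these lines after each run gives a per-session wait breakdown to compare (sys.dm_exec_session_wait_stats requires SQL Server 2016 or later):

    -- Snapshot of this session's waits; run it after each variant of the
    -- delete (from the same session) and compare the deltas, e.g. WRITELOG.
    SELECT wait_type,
           waiting_tasks_count,
           wait_time_ms,
           signal_wait_time_ms
    FROM   sys.dm_exec_session_wait_stats
    WHERE  session_id = @@SPID
    ORDER BY wait_time_ms DESC;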

Without additional detail, a more thorough answer to your question requires so much speculation it wouldn't be too useful. For example, the question as stated doesn't specify whether the table is rowstore or columnstore, whether it is partitioned, whether the table has a primary key or clustered index if it is a rowstore table, how the delete is broken into small batches, or how the batch size was chosen to be "small" (was it just a guess, or one of many candidates that was tested?).

Aaron Bertrand gives some careful consideration to many of these factors in the following blog post - in which even he seems surprised that storage subsystem characteristics can make a single delete faster than many small deletes in some cases.

Fastest way to Delete Large Number of Records in SQL Server (December 3, 2019): https://www.mssqltips.com/sqlservertip/6238/fastest-way-to-delete-large-number-of-records-in-sql-server/

Here's a possible scenario that could explain the difference in performance that you are seeing - but there are many other possible scenarios as well.

It is quite possible that the transaction log is already of sufficient size that it need not grow to log the many small, non-overlapping transactions of a single session performing small batched deletes. At the same time, a single large delete may very well require the transaction log to grow many times to accommodate logging its activity. The transaction log is not allowed to take advantage of Instant File Initialization - if the transaction log auto-grows, it must write to the entire new growth area before the write is completed. For that reason, many small transaction log auto-growths to support a single large transaction could exacerbate writelog waits and substantially increase the duration of a single large delete.
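A rough way to watch for that while testing, assuming a version of SQL Server recent enough to expose sys.dm_db_log_space_usage:

    -- Log size and usage for the current database; sample it before, during,
    -- and after each test to see whether the single large delete forces growth.
    SELECT total_log_size_in_bytes / 1048576.0 AS total_log_mb,
           used_log_space_in_bytes / 1048576.0 AS used_log_mb,
           used_log_space_in_percent
    FROM   sys.dm_db_log_space_usage;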

sqL_handLe
  • Right, this communicates a subset of all the complications I refer to in the comments back and forth with OP in my answer. You've touched on a lot of important things that complicate the experiences it sounds like he has had, and hopefully this helps convey to OP that performance comparisons are situational and detail-specific, more so that iterative deletions aren't necessarily faster than single deletes. – J.D. Feb 10 '21 at 02:12
  • Thanks, well then maybe it is my environment because I have seen great results with small batches. And it is not because of auto-growth of the log file, because I have been monitoring it with Redgate. Next time I will give more info about the conditions e.g. indexes, partitioning etc. Anyway thanks. I wait a bit to see other answers. – xhr489 Feb 10 '21 at 02:17