6

I'm looking for recommendations on how I should set up transaction logs for databases used by a development team. These databases are ephemeral in the sense that we never care about data recovery after a hardware/software failure: every time a developer starts a task, they create a new database and fill it with data from scratch anyway, so they would simply do the same after a hardware failure. Another use case is automated testing, where a new database is created for each test run.

Right now, due to the usage patterns of our developers (testing different types of queries, frequent bulk loading of data), the logs grow a lot and are more of a hindrance than a help. We've seen cases where the logs grew to half a terabyte after only an hour of a developer's work, forcing them to truncate the logs manually. Since we don't want to truncate logs manually during automated tests, we have to allocate larger machines for them. I also suspect the extra I/O slows everything down.

Any recommendations I could find, both in the SQL Server documentation and in other materials, are aimed at production servers and focus on data recovery, which is exactly the opposite of what I'm looking for.

What are some good practices regarding the configuration of SQL Server transaction logs when data recovery is irrelevant, and ease of operation, resource usage, and raw speed are a bigger concern?

liori
    "resource usage and raw speed are a bigger concern?" ALTER DATABASE ... SET DELAYED_DURABILITY = FORCED – Igor Feb 25 '21 at 12:49
  • @Igor, good idea! Can you add it as a proper answer? Comments are… ephemeral on this site ;-) – liori Feb 25 '21 at 12:57
  • "Second trouble: transaction log size." This is a complex question. Example answer (top reason 2/2: Long Running Transactions): https://dba.stackexchange.com/questions/29829/why-does-the-transaction-log-keep-growing-or-run-out-of-space – Igor Feb 25 '21 at 13:39
  • Another example of reducing transaction log growth (break large delete operations into chunks): https://sqlperformance.com/2013/03/io-subsystem/chunk-deletes

    P.S. This is a complex question :-) (Takeaway: When performing long-running modifications, I’m sure many of you enjoy using batches to increase concurrency. But I want to talk about a pitfall to be aware of. If you’re not careful, the method you use to implement batching can actually worsen concurrency.) https://michaeljswart.com/2014/09/take-care-when-scripting-batches/ – Igor Feb 25 '21 at 14:04
  • BTW, I can understand wanting to reduce resource usage and speed up development databases, but keep in mind that this encourages inefficient write patterns, so performance characterization should still be done on properly logged databases early in the development cycle. – eckes Feb 28 '21 at 10:30
  • @eckes, indeed, we make sure this doesn't happen. I was intending to apply the recommendations here to our development servers, where we work on getting things right, but not to the staging servers used for performance review. – liori Mar 01 '21 at 13:02

3 Answers

6

The simple recovery model doesn't mean that the transaction log isn't used.

It just means that SQL Server can empty the log (aka "truncate" the log) on its own, instead of you having to do it by scheduling regular transaction log backups. I.e., the log still needs space for the largest/oldest open transaction, since the log cannot be truncated past the oldest open transaction, regardless of recovery model.

Simple does mean, however, that some operations can be performed in minimally logged form. These include create/alter/drop index, SELECT INTO, bulk loading of data, and, if the stars are properly aligned, INSERT ... SELECT.
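
For illustration, here's a rough sketch of what those minimally logged patterns can look like in T-SQL (the table and file names are made up for this example, and the usual documented conditions for minimal logging still apply):

    -- SELECT INTO creates the target table and is minimally logged under
    -- the simple (and bulk-logged) recovery model.
    SELECT *
    INTO   dbo.OrdersStaging
    FROM   dbo.OrdersSource;

    -- INSERT ... SELECT can be minimally logged when the target is an empty
    -- heap (or meets the other documented conditions) and a table lock is taken.
    INSERT INTO dbo.OrdersTarget WITH (TABLOCK)
    SELECT *
    FROM   dbo.OrdersStaging;

    -- Bulk loading from a file, again requesting a table lock.
    BULK INSERT dbo.OrdersTarget
    FROM 'C:\load\orders.dat'
    WITH (TABLOCK);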

So, if simple doesn't cut it for you regarding space usage, then you'd have to look into how the operations are done, i.e., educate the developers about minimal logging, as outlined above.

As for speed, minimal logging will help there as well, since you have less I/O. Larger transactions (within reason) mean less synchronous I/O to the LDF file. From a pure configuration aspect, I guess some RAM disk option for the LDF file could help, but that might not be doable if you need large LDF files.
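
To illustrate the point about larger transactions, a minimal sketch (dbo.TestData and the row count are hypothetical): wrapping the whole batch in one explicit transaction means the log is flushed synchronously once at COMMIT, rather than once per autocommitted INSERT.

    BEGIN TRANSACTION;

    DECLARE @i int = 1;
    WHILE @i <= 10000
    BEGIN
        -- Each INSERT is still logged, but the synchronous log flush
        -- happens only once, at COMMIT.
        INSERT INTO dbo.TestData (Id, Payload)
        VALUES (@i, REPLICATE('x', 100));
        SET @i += 1;
    END;

    COMMIT TRANSACTION;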

Tibor Karaszi
5

Question: "What are some good practices regarding configuration of transaction logs of SQL Server when data recovery is irrelevant, instead ease of operation, resource usage and raw speed are a bigger concern?"

IMHO:

ALTER DATABASE ... SET DELAYED_DURABILITY = FORCED

Control Transaction Durability

How to control transaction durability

Database level control

You, the DBA, can control whether users can use delayed transaction durability on a database with the following statement. You must set the delayed durability setting with ALTER DATABASE.

ALTER DATABASE ... SET DELAYED_DURABILITY = { DISABLED | ALLOWED | FORCED }  

DISABLED [default] With this setting, all transactions that commit on the database are fully durable, regardless of the commit level setting (DELAYED_DURABILITY=[ON | OFF]). There is no need for stored procedure change and recompilation. This allows you to ensure that no data is ever put at risk by delayed durability.

ALLOWED With this setting, each transaction's durability is determined at the transaction level - DELAYED_DURABILITY = { OFF | ON }. See Atomic block level control - Natively Compiled Stored Procedures and COMMIT level control - Transact-SQL for more information.

FORCED With this setting, every transaction that commits on the database is delayed durable. Whether the transaction specifies fully durable (DELAYED_DURABILITY = OFF) or makes no specification, the transaction is delayed durable. This setting is useful when delayed transaction durability is useful for a database and you do not want to change any application code.
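
As a minimal sketch of applying this to a throwaway development database (DevSandbox is a placeholder name):

    -- Every commit in this database becomes delayed durable;
    -- no application code changes are needed.
    ALTER DATABASE DevSandbox SET DELAYED_DURABILITY = FORCED;

    -- With ALLOWED instead, individual transactions opt in at commit time:
    -- COMMIT TRANSACTION WITH (DELAYED_DURABILITY = ON);

    -- Delayed-durable commits sit in an in-memory log buffer until flushed;
    -- flush manually if you ever do care about the last few transactions
    -- (see the comments below about clean shutdowns).
    EXEC sys.sp_flush_log;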

Igor
  • SQL SERVER – Delayed Durability, or The Story of How to Speed Up Autotests From 11 to 2.5 Minutes https://blog.sqlauthority.com/2016/07/20/sql-server-delayed-durability-story-speed-autotests-11-2-5-minutes/ – Igor Feb 25 '21 at 14:37
  • OP is using Delayed Durability for ephemeral data. For those using Delayed Durability for regeneratable data, be aware that the data may need to be regenerated after a clean shutdown, not merely in response to crashes. – Brian Feb 25 '21 at 23:00
  • Docs reference for what @Brian points out, which is frankly amazing to me - you'd think they would make calling sp_flush_log part of a clean shutdown, right?! – AakashM Mar 05 '21 at 10:38
4

Sounds like a good candidate for the Simple Recovery Model.

Automatically reclaims log space to keep space requirements small, essentially eliminating the need to manage the transaction log space.
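
For reference, switching a database to the simple recovery model is a single statement (DevSandbox is a placeholder name here); since new databases inherit their recovery model from the model database, setting model to SIMPLE would cover databases created later as well:

    ALTER DATABASE DevSandbox SET RECOVERY SIMPLE;

    -- Optional: make future databases default to SIMPLE too.
    ALTER DATABASE model SET RECOVERY SIMPLE;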

Here's a good article that discusses the uses of the different recovery models and then covers shrink operations. Generally I don't recommend shrinking on a SQL Server instance, but in your scenario, where disk space is crucial and database usage is temporary and varied (since it sounds like you do a lot of testing), this may be a use case where running a shrink operation is acceptable.

Traditionally, shrinking is recommended against because it's a heavy operation that's generally a waste to run, since the Transaction Log is just going to re-grow to the size it needs anyway (and growth operations aren't light either). But in your case, where it sounds like you spin up new databases regularly and do varied testing, it may make sense to shrink during a maintenance window to reclaim some of that disk space.
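
A rough sketch of such a planned shrink, assuming the log file's logical name is DevSandbox_log (check sys.database_files for the actual name):

    USE DevSandbox;

    -- Look up the logical file names and current sizes (size is in 8 KB pages).
    SELECT name, type_desc, size FROM sys.database_files;

    -- Shrink the log file to roughly 1 GB (the target size is given in MB).
    DBCC SHRINKFILE (DevSandbox_log, 1024);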

Additionally, you can look into enabling auto-shrink on your databases, but again this is highly recommended against because it can affect performance throughout the day while your developers are working. In my opinion, if you're going to shrink, it should be a controlled and planned operation done during a maintenance window that's minimally disruptive to your developers. Here's another article that goes through how auto-shrink works. And last but not least, here's a good Brent Ozar article on the drawbacks of auto-shrinking.
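
For completeness, auto-shrink is just a database-level option (again, DevSandbox is a placeholder name), though as noted above it's generally advised against:

    -- Generally discouraged: SQL Server decides when to shrink, which can
    -- collide with developer workloads during the day.
    ALTER DATABASE DevSandbox SET AUTO_SHRINK ON;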

J.D.
  • Hmmm… I believe we're already using this setting, yet the transaction logs grow? – liori Feb 25 '21 at 12:14
  • @liori With any of the settings the Transaction Log will have to be used, even in a minimal capacity, but please see my updated answer with what else you can try. – J.D. Feb 25 '21 at 12:21
  • @liori, the log still needs to be sized to accommodate the largest transaction and not-yet-replicated data. It may grow as a result but will eventually be truncated. – Dan Guzman Feb 25 '21 at 12:22
  • Ok. So I still need to somehow decide on the transaction log size for each database? I hoped there is some setting that would shrink it automatically as soon as possible. I've had scenarios where the logs started taking half a terabyte after an hour, and the developer had to manually shrink it every hour or so. – liori Feb 25 '21 at 12:29
  • @liori If the database automatically ran a shrink command at will, how would it know when is the right time to shrink? If it ran the operation right when your log was at half a terabyte and then the next query a developer ran caused the log to re-grow to half a terabyte, then those two operations of shrinking and growing back to back will be very heavy and affect performance of your server / database during them. Since performance is a factor in your requirements, it sounds like it actually would be better to just let the log grow in that case and again... – J.D. Feb 25 '21 at 12:43
  • ... shrink off-hours during a maintenance window, perhaps every night or during a break during the day that's minimally disruptive. Instead of having the database constantly putting pressure on itself from repeatedly shrinking while your developers are trying to work. – J.D. Feb 25 '21 at 12:44
  • That being said, SQL Server does have an auto-shrink feature, it's just highly recommended against for the aforementioned reasons. I'll update my answer with some information. – J.D. Feb 25 '21 at 12:47
  • As far as I understand, logs growing to half a terabyte do not mean a single transaction required half a terabyte. It might have been a thousand transactions requiring half a gigabyte each. But if you say there's no automated means to keep the transaction logs at sane sizes without having to estimate the necessary amount manually per database, then I'll accept that. – liori Feb 25 '21 at 13:00
  • @liori It really doesn't matter what the size of the individual transaction was, that was more so an example. Even in your example though, if it was 1,000 transactions that were each 1 GB and then a shrink is ran and then within an hour another 1,000 transactions run that cause it to re-grow to half a terabyte again, the same idea applies that 1) there's not a great way for SQL Server to determine when to automatically run a shrink command to most optimally reduce disk space but minimize affecting your developers, 2) you still end up shrinking and re-growing to where you... – J.D. Feb 25 '21 at 13:12
  • ...were an hour ago, so effectively didn't accomplish much but still took the performance hits of those operations within the hour. Since Transaction Log space is recycled and re-used internally by SQL Server anyway, it's usually irrelevant that the space on disk is consumed; it'll be re-used internally by SQL Server for future transactions. The growth of the Transaction Log file isn't continuously cumulative. It is possible to enable auto-shrink on a database; it's just highly recommended against. Please see my re-updated answer with information / resources on how to do so. – J.D. Feb 25 '21 at 13:13
  • Thank you! It's clear now! – liori Feb 25 '21 at 13:25
  • @liori No problem! – J.D. Feb 25 '21 at 14:25