0

Preface this with the following: I am not a DBA. I see this is a fairly common issue, but everything points back to the transaction log backups. I've been tasked with something well outside my wheelhouse.

I set up our SQL DB (Roughly 20gb in size,) in FULL recovery mode with a nightly FULL backup that occurs every weekday at 4:30AM. This task checks integrity, reorganizes index, updates statistics, history cleanup, backup the full db, shrinks the db, and then maintenance clean up.

I then set another subplan for Transaction Logs. This backs up the Transaction Log every 3 hours during business hours (8am, 11am, 2pm, 5pm), also during weekdays. There are not that many transactions that occur really; in the years I've been here the actual DB has increased less than a gb.

The .LDF file for the server exploded over night on Saturday and with no monitoring or warning, the server is now at capacity. This had been running fine from the looks of the data usage report (and random eyeballing) for months. It looks like shortly after the backup started, at 4:36AM, the transaction log file kept growing to the point where it is sitting at 90gb, with a whopping 5MB free on the drive.

My initial thought was to move one of the older db backups off the server to alleviate space constraints; This worked for a few minutes before the .LDF file gobbled that up also.

Is the only real option I had to switch the DB to SIMPLE, shrink the transaction log, and then swap to FULL, take a backup, and resume where I left off? I attempted to do a compressed transaction log backup to the other drive on the machine and it still would not allow me to do a DBCC SHRINKFILE. It errored out saying the transaction log for the DB is full due to 'log_backup'. I attempted three transaction log backups and trying the DBCC SHRINKFILE each time.

In my testing environment the FULL->SIMPLE->FULL method seemed to work, but I am unsure why, or how this happened to begin with. Am I missing something? I realize that this method is not ideal since it can break the log chain..

Thanks!

Steve
  • 1
  • Every 3 hours isn't very frequent for a transaction log backup, if I am honest, and doing it only inside business hours isn't great either. You could lose up to 15 hours of data (if a failure were to occur at 07:59~ in the morning). Transaction log back ups tend to be far be frequent; perhaps every 10-15 minutes at most. As for why it got so big, I assume you have so kind of overnight processes that caused it, and as you don't backup for 15 hours, then it had plenty of time to grow. – Thom A Nov 23 '20 at 20:27
  • @Larnu The 'approved' schedule was 3 hours, as transactions are not that frequent, from my understanding. There is no work performed outside of business hours, and from my schedules I have nothing happening nightly; Simply the two backups at 4:30 and the 8/11/2/5 schedule.

    As I said, I'm no DBA, but this was the logical conclusion that made sense. Should transaction logs be non-stop? Given the hypothetical that something with the DB Backup at 4:30 caused the transaction log to flood, would earlier logs have helped?

    – Steve Nov 23 '20 at 20:32
  • Some of your maintenance tasks will fill the logs. One option is to switch the recovery model to simple before your maintenance window (just after your last log backup for the day), do your non-backup maintenance tasks followed by switching back to full recovery and doing a full backup. – Brian Nov 23 '20 at 20:57
  • @Brian Which 'maintenance tasks' are you referring to? I have everything running during the 4:30AM FULL backup, to my knowledge. Shouldn't this have been adding up for months then? Why did it suddenly grow this morning? – Steve Nov 23 '20 at 21:01
  • Are you the ONLY one managing or creating maintenance plans? Does anyone else write and execute queries against any of the databases? Do NOT assume - go find out if you don't know. – SMor Nov 23 '20 at 21:04
  • @SMor I am the only one that has set up maintenance plans. The software that uses the database uses the SA account to write/query against the main DB. – Steve Nov 23 '20 at 21:08
  • @Steve All of the "This task checks integrity, reorganizes index, updates statistics, history cleanup ..... shrinks the db, and then maintenance clean up" besides doing the full backup. Often scripts will trigger an index reorganize only when it is past a certain level of fragmentation. – Brian Nov 23 '20 at 21:08
  • @Brian All that maintenance is happening with the 4:30AM FULL backup. It begins at 4:30AM. I'm not sure why this would have suddenly blown up the database's size if it's been doing the same thing consistently for months. – Steve Nov 23 '20 at 21:09
  • @mustaccio I've seen that post, thanks for the attempt. As I stated, we have a transaction log schedule setup for weekdays, so I'm not sure why this would happen.

    I'm wondering if transactions were performed over the weekend and this could have caused this issue? I'd be hard pressed to imagine the amount of transactions needed to fill 90gb worth of storage, though. Unless I'm not understanding something?

    – Steve Nov 23 '20 at 21:21
  • @Steve I think your main question is what could have caused this massive influx of transactions (and thus log file growth) on your mostly quiet system, is that right? Unless you have some monitoring setup, and it sounds like you don't since this is outside your wheelhouse, there's no way to know retroactively. You could start logging sp_WhoIsActive to a table to try and identify what's beating up the database. Unfortunately the folks here won't be able to help until you can gather some more data. – Josh Darnell Nov 23 '20 at 21:36
  • @JoshDarnell Ultimately it was my hope that someone would have an idea; I realize what you're saying.

    I guess at this point the question is really what to do to recover the DB: Add space -> Backup Log -> Attempt Fileshrink, or Swap to SIMPLE -> Fileshrink -> Swap to FULL -> Full backup to get back to square one?

    – Steve Nov 23 '20 at 21:43

0 Answers0