11

Everytime I restart my server, the database is always in recovery mode, and it takes about 20 minutes for it to behave as normal. This always and only happens when I restart the server, so I have a few questions...

  1. I was told this could be caused by a large log file? Could that be correct? If not then what could be the other causes?
  2. I need to lower the space of the log file to prevent recoveries. What is better: shrinking or truncating?
  3. How can I shrink or truncate a log file / database to lower the size? What is the syntax?

I am currently using Microsoft SQL Server 2008.

  • Do you tend to have large transactions in flight when you shut down? What is the recovery interval set to? – Martin Smith Sep 03 '12 at 14:42
  • no actions are performed 20 minutes before server restart, other than select statements, the interval is set at 0. –  Sep 03 '12 at 14:55
  • How frequently do you re start? How frequently do you backup the database? I am wondering why you are re starting the server on a regular basis? To be complete, you can manually checkpoint a database (which clears the log) if need be. – Lynn Langit Sep 03 '12 at 16:13
  • "To be complete, you can manually checkpoint a database (which clears the log) if need be." how can that be done? and when you say clear a log, do you mean, not to use or simply wiping the log? –  Sep 03 '12 at 16:34
  • Not enough information. Recovery model? Are you using features like mirroring or replication? Size of the database and files involved? Does the database handle any large transactions? – Jon Seigel Sep 04 '12 at 01:41
  • Another question, how large is your log file and did it reach that size by auto growing? If so, with default settings? If not what settings are you using? – Edward Dortland Sep 04 '12 at 03:31
  • Additionally, from the sql error log, can you tell us how long the redo phase took and how long the undo phase took? – Edward Dortland Sep 04 '12 at 03:50
  • @Lynn Langit, Checkpointing doesn't clear the log by definition. Only in simple recovery model are inactive VLFs cleared. In any case, if there is an active transaction, you could checkpoint/truncate/clear all you want but luckaly nothing will happen ;-) .Furthermore, Checkpointing could help in any redo delays, but this problem could also be caused by an undo delay. (Most likely). – Edward Dortland Sep 04 '12 at 16:56

3 Answers3

6

I have the same issue and I believe I have resolved it but I have not been able to fully test it to confirm.

I believe the issues is related to the number of VLFs you have in your log file and not its size. If you have a large logfile it is likely that it grew organically through auto growth events and that it was not an intentional planned growth. If that is the case the you might have thousands of VLFs inside log files.

Here is a query to see how many VLFs you have that I used from here:

    Create Table #stage(
    FileID      int
  , FileSize    bigint
  , StartOffset bigint
  , FSeqNo      bigint
  , [Status]    bigint
  , Parity      bigint
  , CreateLSN   numeric(38));

Create Table #results(
    Database_Name   sysname
  , VLF_count       int 
);

Exec sp_msforeachdb N'Use ?; 
            Insert Into #stage 
            Exec sp_executeSQL N''DBCC LogInfo(?)''; 

            Insert Into #results 
            Select DB_Name(), Count(*) 
            From #stage; 

            Truncate Table #stage;'

Select * 
From #results
Order By VLF_count Desc;

Drop Table #stage;
Drop Table #results;

For a further explanation of what VLFs are see this link.

I believe the issue is that with so many VLFs it takes SQL server a long time to assess their state and then bring the database out of recovery. If you shrink your log file to the smallest size you can, often the size of the first VLF that was created in the log file, then you can immediately intentionally grow it again and thereby have it create the right number of VLFs (something less than 16).

Once this is complete I believe you will be able to see that your database comes out of recovery much faster.

I have not had a chance to test fail over of our production instances after I resolved our own VLF issues so I would be very curious if you can confirm this is the root cause of the issue. Experimentally I have seen the time it takes to come out of restoring in our staging environment dramatically reduced due to this so hopefully that is it.

Chris Magnuson
  • 363
  • 4
  • 11
2

From this MSDN article:

Long-running uncommitted transactions increase recovery time for all types of checkpoints.

It is generally not recommended to run any kind of DBCC shrinkfile on production databases. Also log truncation behavior HAS changed from earlier versions to 2008 (thanks @Edward) - per this blog:

Backup log with trucate_only is no longer supported in SQL 2008. If your database is in bulk-logged or full recovery model then schedule T-Log backup on regular interval and it will keep your t-log is shape.

Again, I will mention, how frequently do you back up the database? Typically, regular backups "manage" log size best.

Lynn Langit
  • 229
  • 1
  • 3
0

Reducing the size of the online transaction log can fix the problem, i.e. speed up the database getting online, but you should think about the disaster recovery before you do that. Note that if you're in Simple recovery model, you will not be able to restore to a point in time. On the other hand, if you're in FULL recovery model, the best way to keep the size of the online transaction log is to create transaction log backup on regular bases (schedule it).

Truncating the transaction log doesn’t free physical hard disk space, it only allows the SQL Server to reuse that space for transactions that have occurred since the last CHEKPOINT (since the last transaction log backup).

If you shrink the database, you'll reduce the size of the files. To shrink the MyDB database by 15 percent:

DBCC SHRINKDATABASE (MyDB, 15); GO

Carol Baker West
  • 1,616
  • 14
  • 13