2

I'm dealing with a database whose plan cache seems to be periodically clearing out by itself. As an example, in the span of about 2 minutes, the SQL plan cache went from 12.5k MB to 3.1k MB. At this time and shortly after, the number of compilations shot up dramatically as well. The rest of the cache shrinks as well, with everything seeming to get smaller by a large amount.

I've looked through the default trace and don't see anything obvious. At first, I thought it was related to a linked server due to a correlation of some SHOW_STATISTIC calls coming from it during the same minute as when the cache cleared out, but it seems that that's what happens when a linked server executes a query across the way as it needs statistics to use. I don't see any DBCC FREEPROCCACHE calls in the default trace logs for any of these events either.

I have the transaction log that covers the timeframe when one of these events happened, but I'm still unfamiliar with navigating it.

I haven't seen any evidence to suggest it's one of the ALTER DATABASE or sp_configure commands that will clear the cache either. There are no events in the SQL Server Log that happen at the time of this clearing out.

I've noticed in SQL Sentry that a number of times that this happened, another process on the SQL box was taking up ~10% of CPU, but it's not during all of the events so I'm not certain if that's worthwhile to look into for this.

At this point, I suspect that a lot of data was changed during each of these events, and the resulting statistic changes made many plans in the cache obsolete, so they were kicked out. I'm not certain how to best check this though, without bogging the system down with an expensive trace. Is there an easy way to determine this through the default trace and transaction log?

Another factor in this is that some of the queries have over a minute (!) compile time, and are requesting multiple GB of memory per execution, while using a small fraction of that. I suspect these are all related problems, but am trying to determine the cause of the cache clearing to minimize the minute compilation issues.

In summary: * Plan cache is dropping considerably in the span of a minute. * The log doesn't contain any mention of commands that will clear the cache. * I have the transaction log and default trace contents for the timeframe of a few of these events.

This is on SQL Server 2016 SP1 CU2, running in 2012 compatibility mode. Version: 13.0.4001.0. The server has 300 GB of Memory available. I haven't seen the cache get much larger than 14 GB, but most of the time it is about 12.5 GB.

Any ideas on where to go from here on how to determine what's causing this to drop?

Erik Darling
  • 40,781
  • 14
  • 130
  • 456
  • 1
    I've heard of many cases of this, but no real lockdown on the cause. One of the Microsoft devs said to check sys.dm_os_memory_cache_clock_hands and sys.dm_os_ring_buffers after this occurs to see if plans were evicted due to memory pressure. And this thread suggests that possibly Query Store is involved (do you have query store enabled?). What is the other process that is taking up CPU on the box? Can you move that to another machine? – Aaron Bertrand Feb 09 '17 at 22:32
  • Other weird cause: min memory being set to the same value as max memory – Erik Darling Feb 09 '17 at 22:39
  • @sp_BlitzErik, min server memory is set to 0 (16 MB running value) and 315000 MB max for both set and running. – The Great Laughter Feb 09 '17 at 22:54
  • @Aaron, Query Store has the operation mode of Off, and has 0.0 MB used up. I thought I had enabled it at one point, but apparently not. LIPM comes back as not enabled. I've queued those DMVs up to check the next time this happens. Thank you! – The Great Laughter Feb 09 '17 at 22:58
  • You may also want to try checking the plan cache to see what's in there after a clear-out. It may be large memory grants or something. Since you're on a newer version of SQL, try running sp_BlitzCache (available here: firstresponderkit.org) with @SortOrder = 'memory grant'. Full disclosure: I contribute code to this free script and occasionally get drunk with the guy who owns Brent Ozar Unlimited, and employs me. – Erik Darling Feb 09 '17 at 23:08
  • FWIW I don't think the theory is that query store takes memory or that its storage has some impact, it's that it triggers something in SQL Server that forces it to clear the plan cache. – Aaron Bertrand Feb 09 '17 at 23:23
  • @sp_BlitzErik, BlitzCache was actually how I noticed this, thanks to the "100% of plans were created in the past 24 hours, 96% in the past 4 hours" warning. I didn't know about the memory grants sort order, so I'll give that a shot. I have noticed that some queries think they'll need ~9 GB of memory, but only use ~150 MB, and wondered if that could be part of the problem. – The Great Laughter Feb 10 '17 at 14:22
  • @TheGreatLaughter glad you found it helpful! Let me know what turns up. BlitzCache should warn you about (large) unused memory grants. – Erik Darling Feb 10 '17 at 14:31
  • @sp_BlitzErik the tools in general have been a lifesaver! Sorting by Memory Grant shows one query with a memory grant of 57 GB (max used grant KB 10,600) and 4 others that are at 14-15 GB max memory grant. And then a handful of them at 6 GB memory grant and 213 MB used. So this seems like a worthy avenue to pursue! Thank you! – The Great Laughter Feb 10 '17 at 15:01
  • @AaronBertrand, as luck would have it, I got into work, ran those DMVs, then checked to see if any of the events happened and found out it happened a minute before I ran the DMVs. I'll now be researching what to look for, but I believe the answer will be in there. The Performance Counter History report in SentryOne points out that the object plan cache spiked at the same time the SQL Plans plummeted, which I think may be related to the weird grant issues I'm seeing. – The Great Laughter Feb 10 '17 at 15:03
  • Following up on this, I've been able to reproduce this on relative-demand by waiting until the cache is 15 GB large, and then running a query that requires a lot of memory. I actually can reproduce it by running sp_blitzcache, but I'm convinced it's not an issue with that stored procedure itself, but with the overall issue of SQL server seeming to cannibalize the plan cache first when it needs memory. – The Great Laughter Feb 13 '17 at 20:31

1 Answers1

1

on SQL2014 SP1 changing sp_configure parameters seemed to solve problem :

sp_configure 'show advanced options', 1;
GO
RECONFIGURE;
GO
exec sp_configure 'access check cache bucket count',512;
exec sp_configure 'access check cache quota',2048;
GO
RECONFIGURE;
GO
sp_configure 'show advanced options', 0;
GO
Kin Shah
  • 62,225
  • 6
  • 120
  • 236