We're experiencing periods where our tempdb log grows until it fills up the disk. It's normal operating size is about 2 GB, but every once in a while it will grow past 100 GB.
Here are the facts:
- Tempdb is by default set to simple recovery and is supposed to truncate when the log exceeds 70% full.
- In our case it will blow right thru that 70% and proceed to begin growing the log
- The problem is intermittent. We have some success in reproducing the problem by running fullscan stats update along with a less than ideal proc that uses a overly simplistic split function that creates a table variable for the output.
- The system is an VMWare 8 core virtual machine. There is currently only one tempdb file. However, we've resisted adding more cores as tempdb contention is not evident (used the query here: http://www.sqlsoldier.com/wp/sqlserver/breakingdowntempdbcontentionpart2
- Manually checkpointing tempdb does not seem to help the issue
To be clear the proc is a monstrosity - it splits a CSV varchar and returns a table variable. Outside of not using one of the recommended split implementations, it is called in a loop which ends up peppering tempdb with new table variables. However, sometimes the problem will occur and other times we can run three of these procs at a time and the tempdb log will truncate and recycle as it should, never growing out of it's 2GB.
The only reason I've seen where the tempdb log won't auto-truncate is when the IO is under such load that checkpoint won't run. But so far, no io metrics show such a load - disk queue stays low, IO utilization is only about 20%.
We're resolving the issue with this proc, but I'd like evidence of a reason on why the log isn't truncating
. Finding a particular wait, identifying that the system at least tried to checkpoint and/or seeing that it's delayed, etc. We've spent a lot of time finding this proc but it would be nice to finish the lesson as what we should be looking for and why.