TempDB log grows without bounds

We're experiencing periods where our tempdb log grows until it fills up the disk. Its normal operating size is about 2 GB, but every once in a while it will grow past 100 GB.

Here are the facts:

- Tempdb is by default set to simple recovery and is supposed to truncate when the log exceeds 70% full.
- In our case it will blow right through that 70% mark and proceed to grow the log.
- The problem is intermittent. We have had some success reproducing it by running a FULLSCAN stats update along with a less-than-ideal proc that uses an overly simplistic split function that creates a table variable for the output.
- The system is a VMware 8-core virtual machine. There is currently only one tempdb file. However, we've resisted adding more files as tempdb contention is not evident (we used the query here: http://www.sqlsoldier.com/wp/sqlserver/breakingdowntempdbcontentionpart2)
- Manually checkpointing tempdb does not seem to help the issue.

To be clear, the proc is a monstrosity - it splits a CSV varchar and returns a table variable. Besides not using one of the recommended split implementations, it is called in a loop, which ends up peppering tempdb with new table variables. However, sometimes the problem will occur, and other times we can run three of these procs at a time and the tempdb log will truncate and recycle as it should, never growing beyond its 2 GB.
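For illustration only, here is a hypothetical sketch of the anti-pattern described above - a multi-statement split function that materializes its output in a table variable. The names and exact shape are assumptions, not our actual proc; each call allocates a fresh table variable in tempdb, and calling it in a loop multiplies that churn:

```sql
-- Hypothetical sketch of the anti-pattern described above; names are invented.
-- Each invocation materializes a new table variable in tempdb.
CREATE FUNCTION dbo.SimpleSplit (@csv VARCHAR(8000), @delim CHAR(1))
RETURNS @out TABLE (Item VARCHAR(8000))
AS
BEGIN
    DECLARE @pos INT = CHARINDEX(@delim, @csv);
    WHILE @pos > 0
    BEGIN
        INSERT @out (Item) VALUES (LEFT(@csv, @pos - 1));
        SET @csv = SUBSTRING(@csv, @pos + 1, 8000);
        SET @pos = CHARINDEX(@delim, @csv);
    END;
    -- Last (or only) element
    INSERT @out (Item) VALUES (@csv);
    RETURN;
END;
```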

The only reason I've seen for the tempdb log not auto-truncating is when the I/O is under such load that a checkpoint won't run. But so far, no I/O metrics show such a load - disk queue stays low, and I/O utilization is only about 20%.
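One way to catch the condition as it happens is to poll log usage on a schedule (e.g. from a SQL Agent job) - a minimal sketch using the standard DBCC command:

```sql
-- Log size and percent-used for every database, including tempdb;
-- poll this periodically and persist the output to spot when tempdb
-- sails past the 70% threshold without truncating.
DBCC SQLPERF(LOGSPACE);
```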

We're resolving the issue with this proc, but I'd like evidence of why the log isn't truncating - finding a particular wait, identifying that the system at least tried to checkpoint, and/or seeing that the checkpoint was delayed, etc. We've spent a lot of time finding this proc, but it would be nice to finish the lesson on what we should be looking for and why.
b_levittAsked:

Anthony PerkinsCommented:
Tempdb is by default set to simple recovery and is supposed to truncate when the log exceeds 70% full.
That is not the way it works. Tempdb can only use the simple recovery model, and you cannot configure it to truncate at any percentage. Truncation happens when SQL Server is done with the specific process that needed the log records.
b_levittAuthor Commented:
@Anthony Perkins

I understand it can only use simple recovery. That's why I said "by default" - I was trying to head off any "make sure it's in simple recovery" comments.

I did not configure it to truncate. Databases in the simple recovery model auto-truncate when the log is 70% full. Tempdb is no exception.

I do have additional data, though, and for now it may be enough to satisfy me: sys.databases showed the log_reuse_wait reason was ACTIVE_TRANSACTION. While it's not clear which transaction was responsible, at least it's good to know that it was a transaction.
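The check in question is a one-liner against the catalog view (standard columns on sys.databases):

```sql
-- Why can't the tempdb log be truncated/reused right now?
-- ACTIVE_TRANSACTION means an open transaction is pinning the log.
SELECT name, log_reuse_wait, log_reuse_wait_desc
FROM sys.databases
WHERE name = 'tempdb';
```

Note that this reports only the most recent reason log reuse was blocked, so it needs to be sampled while (or shortly after) the problem occurs.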


Scott PletcherSenior DBACommented:
First, replace the bad splitter function.  Delimited8KSplit is readily available; I can give you the code if needed.

Next, FULLSCAN stats updates for very large tables should be run one at a time if possible, rather than several at once.

Also, look for snapshot isolation being set on for a large db, particularly during index rebuilds and other maintenance, as that can cause lots of tempdb use.  Replication could also do that, although it's usually less likely, since you'd probably be aware of that.
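A quick way to see which databases have either form of snapshot isolation enabled is a standard catalog query:

```sql
-- Databases with snapshot isolation or read-committed snapshot turned on;
-- both make row versioning use the tempdb version store.
SELECT name,
       snapshot_isolation_state_desc,
       is_read_committed_snapshot_on
FROM sys.databases
ORDER BY name;
```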

Finally remember that any large sort will use tempdb, so very large sorted queries, reports, etc., could also use a lot of tempdb.
Anthony PerkinsCommented:
I understand it can only use simple recovery.  That's why I said "by default",
I guess my definition of "by default" and yours are different, but at least we agree there is no other option.

Databases in simple recovery mode autotruncate when they are 70% full.  
Are you referring to this:
Also, under the simple recovery model, an automatic checkpoint is also queued if the log becomes 70 percent full.
The part you may be overlooking is this:
Under the simple recovery model, unless some factor is delaying log truncation, an automatic checkpoint truncates the unused section of the transaction log.

While it's not clear which transaction was doing it, at least it's good to know that it was a transaction.
Have you checked DBCC OPENTRAN() ?
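For reference, run it against tempdb by name - it reports the oldest active transaction, including its SPID and start time:

```sql
-- Oldest active transaction in tempdb, if any
DBCC OPENTRAN ('tempdb');
```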
b_levittAuthor Commented:
I guess I'm not sure what you're telling me. If you're trying to make sure everything is semantically correct, OK. But if you're trying to say that truncation is not guaranteed, I think I recognized that:

The only reason I've seen where the tempdb log won't auto-truncate is when the IO is under such load that checkpoint won't run.  But so far, no io metrics show such a load - disk queue stays low, IO utilization is only about 20%.

As for your other question:
Have you checked DBCC OPENTRAN() ?

Yes, but unfortunately log_reuse_wait only shows the reason the truncation failed the LAST time, and we never saw one very long transaction. If we catch it again, I'd try this query with the sys.dm_tran_* views:
http://www.sqlskills.com/blogs/paul/script-open-transactions-with-text-and-plans/
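A simplified sketch of that sys.dm_tran_* approach (not the linked script itself) - it lists open transactions with their owning session and how long they've been open:

```sql
-- Open transactions with session info and age; the oldest open transaction
-- is the one holding up tempdb log truncation.
SELECT  st.session_id,
        at.transaction_id,
        at.name,
        at.transaction_begin_time,
        DATEDIFF(SECOND, at.transaction_begin_time, GETDATE()) AS open_seconds
FROM    sys.dm_tran_active_transactions AS at
JOIN    sys.dm_tran_session_transactions AS st
        ON st.transaction_id = at.transaction_id
ORDER BY at.transaction_begin_time;
```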

The fact that it was intermittent also made it difficult. What we believe is the same scenario would produce this condition sometimes but not others. To be honest, it felt like once it missed the automatic truncation at 70%, it didn't try again until the system was nearly idle. In most cases DBCC SQLPERF(LOGSPACE) would show the log truncating right at 70%; in the problem case, it would pass 70% and never come down.
Anthony PerkinsCommented:
The fact that it was intermittent also made it difficult.
What you need is something like sp_whoisactive, logging the results to a table. That way you can look back and find the exact query that was the root cause.
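A sketch of that logging pattern, using sp_WhoIsActive's schema-generation and destination-table parameters (the table name here is an assumption - use whatever fits your monitoring database):

```sql
-- One-time setup: let sp_WhoIsActive emit the DDL for its own logging table.
DECLARE @schema VARCHAR(MAX);
EXEC sp_WhoIsActive @return_schema = 1, @schema = @schema OUTPUT;
SET @schema = REPLACE(@schema, '<table_name>', 'dbo.WhoIsActive_Log');  -- assumed name
EXEC (@schema);

-- Then on a schedule (e.g. every 30-60 seconds via a SQL Agent job):
EXEC sp_WhoIsActive @destination_table = 'dbo.WhoIsActive_Log';
```

When the log blows up again, query the table for what was running at that timestamp.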
b_levittAuthor Commented:
Found the log_reuse_wait indicator.