Solved

Very High Current Disk Queue Spikes

Posted on 2007-11-26
13
1,731 Views
Last Modified: 2013-11-14
We have a Managed SQL server 2000 cluster running against a SAN disk set. At occasional peroids we seem to suffer a very high number of SQL Deadlocks and performance issues. I decided to monitor the SQL server and during this peroid I was monitiring Avg Disk Queue Length, Current disk queue length., Disk bytes/sec, SQL transactions/sec and a few others. Now normally the Disk Q sits around .05 to.5 but at certain times I noticed the the current queue length would climb to around 850+ over 25 seconds with no significant increase in bytes/sec or trans/sec. Could this be a SAN config issue? What areas can I investigate considering this is a Managed server and only have restrcited remote access to the server. I can provide a screenshot of the spike. The support company say the stats are irrelvant and the issue is with the application and they can't see anything with the system.
0
Comment
Question by:dusty_nz
  • 6
  • 3
  • 2
  • +2
13 Comments
 
LVL 32

Assisted Solution

by:bhess1
bhess1 earned 40 total points
Comment Utility
Can't say that this applies in this case, but I ran into a very similar symptom recently where some SQL DB files were in a compressed directory.  At least check to ensure that this is not the case for you.
0
 

Author Comment

by:dusty_nz
Comment Utility
Heres a couple of pictures showing the spikes

http://kiwiland.net.nz/download/diskqSpike1.jpg
http://kiwiland.net.nz/download/diskqSpike2.jpg

The second clarely shows that the server is not really under any increased load.
0
 
LVL 12

Assisted Solution

by:kselvia
kselvia earned 20 total points
Comment Utility
Assuming you have enough RAM most of your reads should come from cache.

I would focus on tempdb.  How many tempdb files do you have? Are they on different logical disks?

Do you do a lot of SELECT * into #temptable in your code?

That will put a schema lock on the current tempdb file. If you have only one tempdb file, it will block all other SELECT INTOs until it completes.  At that time, you may see the spikes.

It is best to have one tempdb file for each CPU.

It's also not a good idea to SELECT  INTO for large result sets.

To prevent schema locks on tempdb, either pre-create the temp table and INSERT into it, or populate the temp table in 2 steps.  
1) SELECT * into #temptable from Mytable where 1 = 2 (this will create an empty structure thus avoiding extended schema locks)  
2)  INSERT #temptable SELECT * from Mytable

Or run SQL profiler and capture the events that are occuring when you see these spikes.
0
 

Author Comment

by:dusty_nz
Comment Utility
The database does do some big selects however it has 8 gig ram and max CPU gets to around 10%. all the other perf stats like Cache hits etc are good. Its these occasionaly spikes where it looks like the disk is unable top be written to. This seems to be primarily on the Data drive and the log drive is not impacted.
At other times we can put the system under a lot more load and its happy. My guess is its a SAN config issue??
0
 
LVL 32

Assisted Solution

by:bhess1
bhess1 earned 40 total points
Comment Utility
Raid-5 data drive?  Raid-0, 1, or 10 log drive?

Is there anything that would do a mass Update of a large number of records?

I would not expect the CPU to spike on whatever is causing this, since the disk is orders of magnitude slower than CPU or RAM.

When I see this sort of pattern, I first suspect massive data changes that cause multitudes of records to be rewritten.  What might be causing that, I cannot say, but large updates of clustered indexes, causing mass splitting, might cause this on a large table on a RAID-5 drive set.
0
 

Author Comment

by:dusty_nz
Comment Utility
However we are not seeing an increase in bytes/sec or in transactions/sec. Also we have had way bigger days and nothing like this.
0
Get up to 2TB FREE CLOUD per backup license!

An exclusive Black Friday offer just for Expert Exchange audience! Buy any of our top-rated backup solutions & get up to 2TB free cloud per system! Perform local & cloud backup in the same step, and restore instantly—anytime, anywhere. Grab this deal now before it disappears!

 
LVL 55

Accepted Solution

by:
andyalder earned 45 total points
Comment Utility
Is this SAN dedicated to you or shared with other servers? It looks that the SAN may be slowing down during these spiles but that doesn't mean there is anything wrong with it, just that other servers are hitting it heavily at that time. Is there any guaranteed performance on your LUNs from the hosting centre?
0
 
LVL 35

Assisted Solution

by:David Todd
David Todd earned 20 total points
Comment Utility
Hi,

kselvia:
Another expert has suggested that your comment about select into #someTempTable no longer applies

I've openned a question to discuss this:
http://www.experts-exchange.com/Microsoft/Development/MS-SQL-Server/Q_22988582.html

Cheers
  David
0
 

Author Comment

by:dusty_nz
Comment Utility
Thanks for everyones contribution. The issue was actually that we had SAN issues but wanted a 3rd point of view. The manahging company (Big WA one) had actually build the san that had a multi partitioned Lun. Two partitions containing the db and logs (V Bad) and the third partition containing the exchange server. Also they were running a Hitachi (HS9000?) which actually has issues with resource contention na dis known to lockup when pushed. We were actually seeing points for almost 25 seconds where the SAN was processing no I/Os at all. Causing transactions to stall incompleted and then raise the issue of deadlocks. Putting loggins we actually saw over 20 cases per day when the Disk Qhit over 6000. The managing company resfused to accept they may have a SAN issue.
0
 

Author Comment

by:dusty_nz
Comment Utility
reposted for bad spelling. Sorry
Thanks for everyones contribution. The issue was actually that we had SAN issues but wanted a 3rd persons point of view. The managing company (Big WA one) had actually configured the SQL using a SAN partition sharing both Datana dlogs on the same raidset as well as a very big corporate exchange server. On the same raid set. Two partitions containing the db and logs (V Bad) and the third partition containing the exchange server. Also they were running a Hitachi (HS9000?) SAN which actually has issues with resource contention and is known to lockup when pushed. We were actually seeing periods of almost 25 seconds where the SAN had stopped processing any disk requests at all. This was causing transactions to stall incompleted and then raise the issue of deadlocks. Putting logging in we actually saw over 20 cases per day when the Disk Q hit over 6000. The managing company refused to accept they may have a SAN issue and claimed that table locks were causing disk locks.
0
 
LVL 55

Expert Comment

by:andyalder
Comment Utility
So I was right that you had a disk I/O problem (it's obvious from the 4th spike on your second pic) but you gave me a B grade. I presume the first red spike on that one is you using the mouse to start perfmon. You can see disk writes/sec=0 even though the queue goes up. If it was a BMP I'd draw a circle around the evidence with mspaint and post it on ee upload site.
0
 

Author Comment

by:dusty_nz
Comment Utility
Yeah. Sorry Andy. I felt it was a bit of a trick question and wasn't actually looking for an answer but more getting a broad point of view.

I do appreciate your answer and it was indeed correct however I gave points to everyone. (You got the most by far) Unfortunitly you can only give one grade for everyone.

0
 
LVL 55

Expert Comment

by:andyalder
Comment Utility
It was a really good question and I don't mind about grades anyway, MS could add it to their list of how to interpret perfmon exams. Not really about SQL though.

https://filedb.experts-exchange.com/incoming/ee-stuff/5869-flatline.jpg
0

Featured Post

Google Storage: Standard vs. Nearline vs. Coldline

Google Cloud Storage has a number of classes to choose from. Although there are a lot in common, they vary in price and usage terms. This post explains Google Cloud Storage classes and helps to understand which  one to choose.

Join & Write a Comment

Suggested Solutions

JSON is being used more and more, besides XML, and you surely wanted to parse the data out into SQL instead of doing it in some Javascript. The below function in SQL Server can do the job for you, returning a quick table with the parsed data.
Ever needed a SQL 2008 Database replicated/mirrored/log shipped on another server but you can't take the downtime inflicted by initial snapshot or disconnect while T-logs are restored or mirror applied? You can use SQL Server Initialize from Backup…
This videos aims to give the viewer a basic demonstration of how a user can query current session information by using the SYS_CONTEXT function
This Micro Tutorial will teach you how to reformat your flash drive. Sometimes your flash drive may have issues carrying files so this will completely restore it to manufacturing settings. Make sure to backup all files before reformatting. This w…

762 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

11 Experts available now in Live!

Get 1:1 Help Now