Solved

Very High Current Disk Queue Spikes

Posted on 2007-11-26
13
1,737 Views
Last Modified: 2013-11-14
We have a Managed SQL server 2000 cluster running against a SAN disk set. At occasional peroids we seem to suffer a very high number of SQL Deadlocks and performance issues. I decided to monitor the SQL server and during this peroid I was monitiring Avg Disk Queue Length, Current disk queue length., Disk bytes/sec, SQL transactions/sec and a few others. Now normally the Disk Q sits around .05 to.5 but at certain times I noticed the the current queue length would climb to around 850+ over 25 seconds with no significant increase in bytes/sec or trans/sec. Could this be a SAN config issue? What areas can I investigate considering this is a Managed server and only have restrcited remote access to the server. I can provide a screenshot of the spike. The support company say the stats are irrelvant and the issue is with the application and they can't see anything with the system.
0
Comment
Question by:dusty_nz
  • 6
  • 3
  • 2
  • +2
13 Comments
 
LVL 32

Assisted Solution

by:bhess1
bhess1 earned 40 total points
ID: 20354879
Can't say that this applies in this case, but I ran into a very similar symptom recently where some SQL DB files were in a compressed directory.  At least check to ensure that this is not the case for you.
0
 

Author Comment

by:dusty_nz
ID: 20354901
Heres a couple of pictures showing the spikes

http://kiwiland.net.nz/download/diskqSpike1.jpg
http://kiwiland.net.nz/download/diskqSpike2.jpg

The second clarely shows that the server is not really under any increased load.
0
 
LVL 12

Assisted Solution

by:kselvia
kselvia earned 20 total points
ID: 20355256
Assuming you have enough RAM most of your reads should come from cache.

I would focus on tempdb.  How many tempdb files do you have? Are they on different logical disks?

Do you do a lot of SELECT * into #temptable in your code?

That will put a schema lock on the current tempdb file. If you have only one tempdb file, it will block all other SELECT INTOs until it completes.  At that time, you may see the spikes.

It is best to have one tempdb file for each CPU.

It's also not a good idea to SELECT  INTO for large result sets.

To prevent schema locks on tempdb, either pre-create the temp table and INSERT into it, or populate the temp table in 2 steps.  
1) SELECT * into #temptable from Mytable where 1 = 2 (this will create an empty structure thus avoiding extended schema locks)  
2)  INSERT #temptable SELECT * from Mytable

Or run SQL profiler and capture the events that are occuring when you see these spikes.
0
Comprehensive Backup Solutions for Microsoft

Acronis protects the complete Microsoft technology stack: Windows Server, Windows PC, laptop and Surface data; Microsoft business applications; Microsoft Hyper-V; Azure VMs; Microsoft Windows Server 2016; Microsoft Exchange 2016 and SQL Server 2016.

 

Author Comment

by:dusty_nz
ID: 20355275
The database does do some big selects however it has 8 gig ram and max CPU gets to around 10%. all the other perf stats like Cache hits etc are good. Its these occasionaly spikes where it looks like the disk is unable top be written to. This seems to be primarily on the Data drive and the log drive is not impacted.
At other times we can put the system under a lot more load and its happy. My guess is its a SAN config issue??
0
 
LVL 32

Assisted Solution

by:bhess1
bhess1 earned 40 total points
ID: 20355321
Raid-5 data drive?  Raid-0, 1, or 10 log drive?

Is there anything that would do a mass Update of a large number of records?

I would not expect the CPU to spike on whatever is causing this, since the disk is orders of magnitude slower than CPU or RAM.

When I see this sort of pattern, I first suspect massive data changes that cause multitudes of records to be rewritten.  What might be causing that, I cannot say, but large updates of clustered indexes, causing mass splitting, might cause this on a large table on a RAID-5 drive set.
0
 

Author Comment

by:dusty_nz
ID: 20355326
However we are not seeing an increase in bytes/sec or in transactions/sec. Also we have had way bigger days and nothing like this.
0
 
LVL 55

Accepted Solution

by:
andyalder earned 45 total points
ID: 20357210
Is this SAN dedicated to you or shared with other servers? It looks that the SAN may be slowing down during these spiles but that doesn't mean there is anything wrong with it, just that other servers are hitting it heavily at that time. Is there any guaranteed performance on your LUNs from the hosting centre?
0
 
LVL 35

Assisted Solution

by:David Todd
David Todd earned 20 total points
ID: 20367913
Hi,

kselvia:
Another expert has suggested that your comment about select into #someTempTable no longer applies

I've openned a question to discuss this:
http://www.experts-exchange.com/Microsoft/Development/MS-SQL-Server/Q_22988582.html

Cheers
  David
0
 

Author Comment

by:dusty_nz
ID: 20368862
Thanks for everyones contribution. The issue was actually that we had SAN issues but wanted a 3rd point of view. The manahging company (Big WA one) had actually build the san that had a multi partitioned Lun. Two partitions containing the db and logs (V Bad) and the third partition containing the exchange server. Also they were running a Hitachi (HS9000?) which actually has issues with resource contention na dis known to lockup when pushed. We were actually seeing points for almost 25 seconds where the SAN was processing no I/Os at all. Causing transactions to stall incompleted and then raise the issue of deadlocks. Putting loggins we actually saw over 20 cases per day when the Disk Qhit over 6000. The managing company resfused to accept they may have a SAN issue.
0
 

Author Comment

by:dusty_nz
ID: 20368923
reposted for bad spelling. Sorry
Thanks for everyones contribution. The issue was actually that we had SAN issues but wanted a 3rd persons point of view. The managing company (Big WA one) had actually configured the SQL using a SAN partition sharing both Datana dlogs on the same raidset as well as a very big corporate exchange server. On the same raid set. Two partitions containing the db and logs (V Bad) and the third partition containing the exchange server. Also they were running a Hitachi (HS9000?) SAN which actually has issues with resource contention and is known to lockup when pushed. We were actually seeing periods of almost 25 seconds where the SAN had stopped processing any disk requests at all. This was causing transactions to stall incompleted and then raise the issue of deadlocks. Putting logging in we actually saw over 20 cases per day when the Disk Q hit over 6000. The managing company refused to accept they may have a SAN issue and claimed that table locks were causing disk locks.
0
 
LVL 55

Expert Comment

by:andyalder
ID: 20369237
So I was right that you had a disk I/O problem (it's obvious from the 4th spike on your second pic) but you gave me a B grade. I presume the first red spike on that one is you using the mouse to start perfmon. You can see disk writes/sec=0 even though the queue goes up. If it was a BMP I'd draw a circle around the evidence with mspaint and post it on ee upload site.
0
 

Author Comment

by:dusty_nz
ID: 20369304
Yeah. Sorry Andy. I felt it was a bit of a trick question and wasn't actually looking for an answer but more getting a broad point of view.

I do appreciate your answer and it was indeed correct however I gave points to everyone. (You got the most by far) Unfortunitly you can only give one grade for everyone.

0
 
LVL 55

Expert Comment

by:andyalder
ID: 20369822
It was a really good question and I don't mind about grades anyway, MS could add it to their list of how to interpret perfmon exams. Not really about SQL though.

https://filedb.experts-exchange.com/incoming/ee-stuff/5869-flatline.jpg
0

Featured Post

Efficient way to get backups off site to Azure

This user guide provides instructions on how to deploy and configure both a StoneFly Scale Out NAS Enterprise Cloud Drive virtual machine and Veeam Cloud Connect in the Microsoft Azure Cloud.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
SAS 12gb/s Drives on SAS 6gb/s Controller? 3 33
Help creating a spatial object in SQL Server 4 22
What's wrong with this T-SQL Foreign Key? 7 44
VB.net and sql server 4 35
This article explains how to reset the password of the sa account on a Microsoft SQL Server.  The steps in this article work in SQL 2005, 2008, 2008 R2, 2012, 2014 and 2016.
This article aims to explain the working of CircularLogArchiver. This tool was designed to solve the buildup of log file in cases where systems do not support circular logging or where circular logging is not enabled
This tutorial will walk an individual through the process of installing the necessary services and then configuring a Windows Server 2012 system as an iSCSI target. To install the necessary roles, go to Server Manager, and select Add Roles and Featu…
This Micro Tutorial will teach you how to reformat your flash drive. Sometimes your flash drive may have issues carrying files so this will completely restore it to manufacturing settings. Make sure to backup all files before reformatting. This w…

770 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question