• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 1778
  • Last Modified:

Very High Current Disk Queue Spikes

We have a Managed SQL server 2000 cluster running against a SAN disk set. At occasional peroids we seem to suffer a very high number of SQL Deadlocks and performance issues. I decided to monitor the SQL server and during this peroid I was monitiring Avg Disk Queue Length, Current disk queue length., Disk bytes/sec, SQL transactions/sec and a few others. Now normally the Disk Q sits around .05 to.5 but at certain times I noticed the the current queue length would climb to around 850+ over 25 seconds with no significant increase in bytes/sec or trans/sec. Could this be a SAN config issue? What areas can I investigate considering this is a Managed server and only have restrcited remote access to the server. I can provide a screenshot of the spike. The support company say the stats are irrelvant and the issue is with the application and they can't see anything with the system.
0
dusty_nz
Asked:
dusty_nz
  • 6
  • 3
  • 2
  • +2
5 Solutions
 
Brendt HessSenior DBACommented:
Can't say that this applies in this case, but I ran into a very similar symptom recently where some SQL DB files were in a compressed directory.  At least check to ensure that this is not the case for you.
0
 
dusty_nzAuthor Commented:
Heres a couple of pictures showing the spikes

http://kiwiland.net.nz/download/diskqSpike1.jpg
http://kiwiland.net.nz/download/diskqSpike2.jpg

The second clarely shows that the server is not really under any increased load.
0
 
Ken SelviaRetiredCommented:
Assuming you have enough RAM most of your reads should come from cache.

I would focus on tempdb.  How many tempdb files do you have? Are they on different logical disks?

Do you do a lot of SELECT * into #temptable in your code?

That will put a schema lock on the current tempdb file. If you have only one tempdb file, it will block all other SELECT INTOs until it completes.  At that time, you may see the spikes.

It is best to have one tempdb file for each CPU.

It's also not a good idea to SELECT  INTO for large result sets.

To prevent schema locks on tempdb, either pre-create the temp table and INSERT into it, or populate the temp table in 2 steps.  
1) SELECT * into #temptable from Mytable where 1 = 2 (this will create an empty structure thus avoiding extended schema locks)  
2)  INSERT #temptable SELECT * from Mytable

Or run SQL profiler and capture the events that are occuring when you see these spikes.
0
Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
dusty_nzAuthor Commented:
The database does do some big selects however it has 8 gig ram and max CPU gets to around 10%. all the other perf stats like Cache hits etc are good. Its these occasionaly spikes where it looks like the disk is unable top be written to. This seems to be primarily on the Data drive and the log drive is not impacted.
At other times we can put the system under a lot more load and its happy. My guess is its a SAN config issue??
0
 
Brendt HessSenior DBACommented:
Raid-5 data drive?  Raid-0, 1, or 10 log drive?

Is there anything that would do a mass Update of a large number of records?

I would not expect the CPU to spike on whatever is causing this, since the disk is orders of magnitude slower than CPU or RAM.

When I see this sort of pattern, I first suspect massive data changes that cause multitudes of records to be rewritten.  What might be causing that, I cannot say, but large updates of clustered indexes, causing mass splitting, might cause this on a large table on a RAID-5 drive set.
0
 
dusty_nzAuthor Commented:
However we are not seeing an increase in bytes/sec or in transactions/sec. Also we have had way bigger days and nothing like this.
0
 
andyalderSaggar makers bottom knockerCommented:
Is this SAN dedicated to you or shared with other servers? It looks that the SAN may be slowing down during these spiles but that doesn't mean there is anything wrong with it, just that other servers are hitting it heavily at that time. Is there any guaranteed performance on your LUNs from the hosting centre?
0
 
David ToddSenior DBACommented:
Hi,

kselvia:
Another expert has suggested that your comment about select into #someTempTable no longer applies

I've openned a question to discuss this:
http://www.experts-exchange.com/Microsoft/Development/MS-SQL-Server/Q_22988582.html

Cheers
  David
0
 
dusty_nzAuthor Commented:
Thanks for everyones contribution. The issue was actually that we had SAN issues but wanted a 3rd point of view. The manahging company (Big WA one) had actually build the san that had a multi partitioned Lun. Two partitions containing the db and logs (V Bad) and the third partition containing the exchange server. Also they were running a Hitachi (HS9000?) which actually has issues with resource contention na dis known to lockup when pushed. We were actually seeing points for almost 25 seconds where the SAN was processing no I/Os at all. Causing transactions to stall incompleted and then raise the issue of deadlocks. Putting loggins we actually saw over 20 cases per day when the Disk Qhit over 6000. The managing company resfused to accept they may have a SAN issue.
0
 
dusty_nzAuthor Commented:
reposted for bad spelling. Sorry
Thanks for everyones contribution. The issue was actually that we had SAN issues but wanted a 3rd persons point of view. The managing company (Big WA one) had actually configured the SQL using a SAN partition sharing both Datana dlogs on the same raidset as well as a very big corporate exchange server. On the same raid set. Two partitions containing the db and logs (V Bad) and the third partition containing the exchange server. Also they were running a Hitachi (HS9000?) SAN which actually has issues with resource contention and is known to lockup when pushed. We were actually seeing periods of almost 25 seconds where the SAN had stopped processing any disk requests at all. This was causing transactions to stall incompleted and then raise the issue of deadlocks. Putting logging in we actually saw over 20 cases per day when the Disk Q hit over 6000. The managing company refused to accept they may have a SAN issue and claimed that table locks were causing disk locks.
0
 
andyalderSaggar makers bottom knockerCommented:
So I was right that you had a disk I/O problem (it's obvious from the 4th spike on your second pic) but you gave me a B grade. I presume the first red spike on that one is you using the mouse to start perfmon. You can see disk writes/sec=0 even though the queue goes up. If it was a BMP I'd draw a circle around the evidence with mspaint and post it on ee upload site.
0
 
dusty_nzAuthor Commented:
Yeah. Sorry Andy. I felt it was a bit of a trick question and wasn't actually looking for an answer but more getting a broad point of view.

I do appreciate your answer and it was indeed correct however I gave points to everyone. (You got the most by far) Unfortunitly you can only give one grade for everyone.

0
 
andyalderSaggar makers bottom knockerCommented:
It was a really good question and I don't mind about grades anyway, MS could add it to their list of how to interpret perfmon exams. Not really about SQL though.

https://filedb.experts-exchange.com/incoming/ee-stuff/5869-flatline.jpg
0

Featured Post

Free Tool: ZipGrep

ZipGrep is a utility that can list and search zip (.war, .ear, .jar, etc) archives for text patterns, without the need to extract the archive's contents.

One of a set of tools we're offering as a way to say thank you for being a part of the community.

  • 6
  • 3
  • 2
  • +2
Tackle projects and never again get stuck behind a technical roadblock.
Join Now