Very High Current Disk Queue Spikes

We have a Managed SQL server 2000 cluster running against a SAN disk set. At occasional peroids we seem to suffer a very high number of SQL Deadlocks and performance issues. I decided to monitor the SQL server and during this peroid I was monitiring Avg Disk Queue Length, Current disk queue length., Disk bytes/sec, SQL transactions/sec and a few others. Now normally the Disk Q sits around .05 to.5 but at certain times I noticed the the current queue length would climb to around 850+ over 25 seconds with no significant increase in bytes/sec or trans/sec. Could this be a SAN config issue? What areas can I investigate considering this is a Managed server and only have restrcited remote access to the server. I can provide a screenshot of the spike. The support company say the stats are irrelvant and the issue is with the application and they can't see anything with the system.
dusty_nzAsked:
Who is Participating?

Improve company productivity with a Business Account.Sign Up

x
 
andyalderConnect With a Mentor Commented:
Is this SAN dedicated to you or shared with other servers? It looks that the SAN may be slowing down during these spiles but that doesn't mean there is anything wrong with it, just that other servers are hitting it heavily at that time. Is there any guaranteed performance on your LUNs from the hosting centre?
0
 
Brendt HessConnect With a Mentor Senior DBACommented:
Can't say that this applies in this case, but I ran into a very similar symptom recently where some SQL DB files were in a compressed directory.  At least check to ensure that this is not the case for you.
0
 
dusty_nzAuthor Commented:
Heres a couple of pictures showing the spikes

http://kiwiland.net.nz/download/diskqSpike1.jpg
http://kiwiland.net.nz/download/diskqSpike2.jpg

The second clarely shows that the server is not really under any increased load.
0
What Kind of Coding Program is Right for You?

There are many ways to learn to code these days. From coding bootcamps like Flatiron School to online courses to totally free beginner resources. The best way to learn to code depends on many factors, but the most important one is you. See what course is best for you.

 
Ken SelviaConnect With a Mentor RetiredCommented:
Assuming you have enough RAM most of your reads should come from cache.

I would focus on tempdb.  How many tempdb files do you have? Are they on different logical disks?

Do you do a lot of SELECT * into #temptable in your code?

That will put a schema lock on the current tempdb file. If you have only one tempdb file, it will block all other SELECT INTOs until it completes.  At that time, you may see the spikes.

It is best to have one tempdb file for each CPU.

It's also not a good idea to SELECT  INTO for large result sets.

To prevent schema locks on tempdb, either pre-create the temp table and INSERT into it, or populate the temp table in 2 steps.  
1) SELECT * into #temptable from Mytable where 1 = 2 (this will create an empty structure thus avoiding extended schema locks)  
2)  INSERT #temptable SELECT * from Mytable

Or run SQL profiler and capture the events that are occuring when you see these spikes.
0
 
dusty_nzAuthor Commented:
The database does do some big selects however it has 8 gig ram and max CPU gets to around 10%. all the other perf stats like Cache hits etc are good. Its these occasionaly spikes where it looks like the disk is unable top be written to. This seems to be primarily on the Data drive and the log drive is not impacted.
At other times we can put the system under a lot more load and its happy. My guess is its a SAN config issue??
0
 
Brendt HessConnect With a Mentor Senior DBACommented:
Raid-5 data drive?  Raid-0, 1, or 10 log drive?

Is there anything that would do a mass Update of a large number of records?

I would not expect the CPU to spike on whatever is causing this, since the disk is orders of magnitude slower than CPU or RAM.

When I see this sort of pattern, I first suspect massive data changes that cause multitudes of records to be rewritten.  What might be causing that, I cannot say, but large updates of clustered indexes, causing mass splitting, might cause this on a large table on a RAID-5 drive set.
0
 
dusty_nzAuthor Commented:
However we are not seeing an increase in bytes/sec or in transactions/sec. Also we have had way bigger days and nothing like this.
0
 
David ToddConnect With a Mentor Senior DBACommented:
Hi,

kselvia:
Another expert has suggested that your comment about select into #someTempTable no longer applies

I've openned a question to discuss this:
http://www.experts-exchange.com/Microsoft/Development/MS-SQL-Server/Q_22988582.html

Cheers
  David
0
 
dusty_nzAuthor Commented:
Thanks for everyones contribution. The issue was actually that we had SAN issues but wanted a 3rd point of view. The manahging company (Big WA one) had actually build the san that had a multi partitioned Lun. Two partitions containing the db and logs (V Bad) and the third partition containing the exchange server. Also they were running a Hitachi (HS9000?) which actually has issues with resource contention na dis known to lockup when pushed. We were actually seeing points for almost 25 seconds where the SAN was processing no I/Os at all. Causing transactions to stall incompleted and then raise the issue of deadlocks. Putting loggins we actually saw over 20 cases per day when the Disk Qhit over 6000. The managing company resfused to accept they may have a SAN issue.
0
 
dusty_nzAuthor Commented:
reposted for bad spelling. Sorry
Thanks for everyones contribution. The issue was actually that we had SAN issues but wanted a 3rd persons point of view. The managing company (Big WA one) had actually configured the SQL using a SAN partition sharing both Datana dlogs on the same raidset as well as a very big corporate exchange server. On the same raid set. Two partitions containing the db and logs (V Bad) and the third partition containing the exchange server. Also they were running a Hitachi (HS9000?) SAN which actually has issues with resource contention and is known to lockup when pushed. We were actually seeing periods of almost 25 seconds where the SAN had stopped processing any disk requests at all. This was causing transactions to stall incompleted and then raise the issue of deadlocks. Putting logging in we actually saw over 20 cases per day when the Disk Q hit over 6000. The managing company refused to accept they may have a SAN issue and claimed that table locks were causing disk locks.
0
 
andyalderCommented:
So I was right that you had a disk I/O problem (it's obvious from the 4th spike on your second pic) but you gave me a B grade. I presume the first red spike on that one is you using the mouse to start perfmon. You can see disk writes/sec=0 even though the queue goes up. If it was a BMP I'd draw a circle around the evidence with mspaint and post it on ee upload site.
0
 
dusty_nzAuthor Commented:
Yeah. Sorry Andy. I felt it was a bit of a trick question and wasn't actually looking for an answer but more getting a broad point of view.

I do appreciate your answer and it was indeed correct however I gave points to everyone. (You got the most by far) Unfortunitly you can only give one grade for everyone.

0
 
andyalderCommented:
It was a really good question and I don't mind about grades anyway, MS could add it to their list of how to interpret perfmon exams. Not really about SQL though.

https://filedb.experts-exchange.com/incoming/ee-stuff/5869-flatline.jpg
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.