CPU drop / Latency Spike on Windows servers causes disconnect

Posted on 2014-08-20
Last Modified: 2014-12-01
Hi Experts,

I have a long-standing problem that I cannot solve, even after opening tickets with both VMware and Microsoft.

We first saw this issue with DFS on a Windows 2003 server, but now we are seeing it on a SQL server, where it causes a disconnect.

The servers boot from SAN, and all of our disks are SAN-based. The file server has 4 GB RAM and 1 processor. The SQL server has 8 GB RAM and 4 CPUs.

As a test, we moved the disks to flash storage, but this hasn't solved the problem.

We also have a stretched cluster providing a mirrored environment through an IBM SVC implementation.

The logs give no indication of what is causing the latency spike and the CPU drop. We have McAfee and Altiris installed, and our backup agent is Avamar. To isolate the problem, we have tried removing all of these from the servers.

Any help in identifying this issue will be greatly appreciated.
Question by:svillardi

Accepted Solution

by: Brad Groux
ID: 40273799
Check the disk latency for the cluster. Response times of around 20 ms or higher can cause major issues, while 10-12 ms is the desired range for Windows clusters.
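To make those thresholds concrete, here is a minimal sketch that labels exported disk response-time samples (in milliseconds). It assumes you already have the samples as a list of numbers, for example from a performance monitor export; the "elevated" band between 12 and 20 ms and the function names are hypothetical labels, not something stated in the thread.

```python
from __future__ import annotations

from statistics import mean

# Thresholds from the comment: 10-12 ms is desired, ~20 ms or more causes issues.
DESIRED_MAX_MS = 12.0
PROBLEM_MS = 20.0


def classify_latency(ms: float) -> str:
    """Label one disk response-time sample (milliseconds).

    The 'elevated' band between the two thresholds is a hypothetical label.
    """
    if ms >= PROBLEM_MS:
        return "problem"
    if ms > DESIRED_MAX_MS:
        return "elevated"
    return "ok"


def summarize(samples_ms: list[float]) -> dict:
    """Count samples per label and report the average and worst latency."""
    counts = {"ok": 0, "elevated": 0, "problem": 0}
    for ms in samples_ms:
        counts[classify_latency(ms)] += 1
    return {
        "counts": counts,
        "avg_ms": round(mean(samples_ms), 1),
        "max_ms": max(samples_ms),
    }


if __name__ == "__main__":
    # Hypothetical samples, e.g. exported from a performance monitor.
    samples = [8.0, 11.5, 14.0, 22.0, 9.0]
    print(summarize(samples))
```

Looking at the maximum as well as the average matters here: a short daily spike can disappear into a healthy-looking average.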

You can use Storport tracing (Storport.sys) to examine I/O usage. This TechNet blog post explains the process:

http://blogs.technet.com/b/askcore/archive/2013/04/25/tracing-with-storport-in-windows-2012-and-windows-8-without-kb2819476-hotfix.aspx

Note that the vast majority of CRITSITs involving clusters revolve around network throughput or disk I/O. Many people outgrow their storage and network without realizing it.

Expert Comment

by:Korbus
ID: 40273809
Since you are using SANs for storage, have you confirmed that your network isn't simply getting congested? If you connect just the SANs and servers together, eliminating the rest of the network, do the latency issues persist?

Author Comment

by:svillardi
ID: 40273837
I cannot isolate my environment. In a stretched cluster, all hosts see all storage, so the IBM SAN Volume Controller (SVC) sits in the middle. The zoning is set so that the hosts see only the SVC, and the storage sees only the SVC; hosts and storage never see each other directly. There is no way to keep redundancy between our two sites across the stretched cluster without dismantling everything.


Brad - I don't know what CRITSITS is.

There has to be another way.

Author Comment

by:svillardi
ID: 40274131
VMware shows the disk latency spike. The question is: how do I find out what's causing it?

The Windows 2008 box was on SAS storage, and I moved it to flash. I still get such a huge latency spike and CPU drop that SQL loses its connection and has to be reset. This happens almost every day at about 8:03 in the morning.
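A spike that recurs at nearly the same clock time every day may point to a scheduled job, so one way to narrow it down is to collect spike timestamps and see which minute keeps recurring, then compare that minute against the schedules of the McAfee, Altiris, and Avamar agents mentioned in the question. This is a minimal sketch assuming you can export spike timestamps (for example from VMware performance charts) as `datetime` values; the function name and the `min_days` cutoff are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from datetime import date, datetime


def recurring_spike_minutes(
    spikes: list[datetime], min_days: int = 3
) -> list[tuple[str, int]]:
    """Return (HH:MM, distinct_days) pairs for clock minutes at which spikes
    occurred on at least `min_days` different days, most frequent first."""
    days_per_minute: dict[str, set[date]] = {}
    for ts in spikes:
        key = ts.strftime("%H:%M")
        # Count each calendar day once per minute, so bursts don't inflate it.
        days_per_minute.setdefault(key, set()).add(ts.date())
    counts = Counter({minute: len(days) for minute, days in days_per_minute.items()})
    return [(minute, n) for minute, n in counts.most_common() if n >= min_days]


if __name__ == "__main__":
    # Hypothetical spike timestamps.
    spikes = [
        datetime(2014, 8, 18, 8, 3, 12),
        datetime(2014, 8, 19, 8, 3, 40),
        datetime(2014, 8, 20, 8, 3, 5),
        datetime(2014, 8, 20, 14, 10, 0),
    ]
    print(recurring_spike_minutes(spikes))  # [('08:03', 3)]
```

Counting distinct days rather than raw samples keeps one noisy morning from looking like a daily pattern.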

The Windows 2003 box is the DFS server. When I moved it from NL-SAS to SAS, the random outages no longer lasted 2-25 minutes; each outage became much shorter.

Author Comment

by:svillardi
ID: 40274242
CRITSIT -- Crisis Situation?

Author Comment

by:svillardi
ID: 40274458
Brad,

On my Windows 2008 R2 box, I tried to install KB978000, and it said "This update is not applicable to your computer."

I double-checked, and it is definitely Windows 2008 R2 SP1.

Thanks,

S.....

Expert Comment

by:compdigit44
ID: 40280993
We had a similar issue where I work: high latency on some servers behind our IBM SVC. Long story short, the cause was excess pause frames, and the resolution was a firmware upgrade on the switch.
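To check for excess pause frames, you can take two readings of the switch's cumulative per-port pause-frame counters and compare the rate. This is a minimal sketch under stated assumptions: how you read the counters depends on your switch vendor, and the port names, the snapshot format, and the alert threshold are all hypothetical and would need tuning for your site.

```python
from __future__ import annotations


def pause_frame_rates(
    before: dict[str, int], after: dict[str, int], interval_s: float
) -> dict[str, float]:
    """Pause frames per second per port between two cumulative counter snapshots."""
    rates: dict[str, float] = {}
    for port, start in before.items():
        end = after.get(port)
        if end is None or end < start:
            # Port missing from the second snapshot, or counter reset/wrapped:
            # skip it rather than report a misleading rate.
            continue
        rates[port] = (end - start) / interval_s
    return rates


def flag_ports(rates: dict[str, float], threshold_per_s: float) -> list[str]:
    """Ports whose pause-frame rate exceeds a site-specific (hypothetical) threshold."""
    return sorted(port for port, rate in rates.items() if rate > threshold_per_s)


if __name__ == "__main__":
    # Hypothetical snapshots taken 60 seconds apart.
    before = {"port1": 100, "port2": 5000}
    after = {"port1": 160, "port2": 65000}
    rates = pause_frame_rates(before, after, 60.0)
    print(rates)                     # {'port1': 1.0, 'port2': 1000.0}
    print(flag_ports(rates, 100.0))  # ['port2']
```

Sampling across the time window when the latency spike occurs gives the most useful comparison.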