svillardi

CPU drop / Latency Spike on Windows servers causes disconnect

Hi Experts,

I have a long-standing problem that I cannot find a solution for, even after opening tickets with both VMware and Microsoft.

We first found this issue with DFS on a Win 2003 server, but now we are also seeing it on a SQL server, where it causes a disconnect.

The server environment includes boot from SAN, and all of our disks are SAN-based. The file server has 4GB RAM and 1 processor. The SQL server has 8GB RAM and 4 CPUs.

Disk speed has been upgraded to flash as a test, but this hasn't solved the problem.

We also have a stretched cluster providing a mirrored environment through an IBM SVC implementation.

The logs give no indication of what is causing the spike in latency and the drop in CPU. We have McAfee and Altiris installed, and our backup agent is Avamar. To isolate the problem, we have tried removing everything from these boxes.

Any thoughts to help identify this issue would be greatly appreciated.
ASKER CERTIFIED SOLUTION
Brad Groux
This solution is only available to members.
Korbus

Since you are using SANs for storage, have you confirmed your network isn't simply getting jammed up?  If you connect JUST the SANs and servers together, eliminating the rest of the network, do you continue to have latency issues?
svillardi

ASKER

I cannot isolate my environment. In a stretched cluster, all hosts see all storage, so the IBM SAN Volume Controller is the middleman. The zoning is set so that the hosts see only the SVC and the storage sees only the SVC; hosts do not see storage directly, and vice versa. There is no way to isolate them without dismantling everything and losing redundancy between our two sites across the stretched cluster.


Brad - I don't know what CRITSITS is.

There has to be another way.
VMware is showing the spike in disk latency. The question is, how do I find what's causing it?

The Win 2008 box was at SAS speed and I moved it to flash. I still get a spike in latency / drop in CPU so severe that SQL loses its connection and has to be reset. This happens almost every day at about 8:03 in the morning.

The Win 2003 box is the DFS server. When I moved it to SAS (from NL-SAS), the random outages, which had been lasting about 2-25 minutes, became much shorter.
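
Since the event recurs at roughly the same time each day, one way to narrow down the cause might be to log latency and CPU samples and check which minutes of the day the spikes cluster on, then compare that against scheduled jobs. The following is only a minimal sketch: the sample format, the thresholds, and the data values are all made up for illustration and do not come from VMware or Windows tooling.

```python
from datetime import datetime

def find_spikes(samples, latency_ms_threshold=100.0, cpu_pct_threshold=5.0):
    """Return timestamps of samples with high latency AND low CPU.

    samples: iterable of (timestamp, latency_ms, cpu_pct) tuples.
    The thresholds are assumptions; tune them to the environment.
    """
    return [
        ts
        for ts, latency_ms, cpu_pct in samples
        if latency_ms >= latency_ms_threshold and cpu_pct <= cpu_pct_threshold
    ]

def group_by_minute_of_day(timestamps):
    """Count how many spikes fall on each HH:MM, ignoring the date."""
    counts = {}
    for ts in timestamps:
        key = ts.strftime("%H:%M")
        counts[key] = counts.get(key, 0) + 1
    return counts

# Hypothetical samples: (timestamp, disk latency in ms, CPU usage in %)
samples = [
    (datetime(2024, 1, 8, 8, 2), 4.0, 35.0),
    (datetime(2024, 1, 8, 8, 3), 850.0, 1.0),
    (datetime(2024, 1, 9, 8, 3), 910.0, 2.0),
    (datetime(2024, 1, 9, 12, 0), 5.0, 40.0),
]

spikes = find_spikes(samples)
print(group_by_minute_of_day(spikes))  # {'08:03': 2}
```

If the spikes pile up on a single minute, that points toward something scheduled (a backup, scan, or replication task) rather than random load.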
CRITSIT -- Crisis Situation?
Brad,

On my Win 2008 R2 box I tried to install KB978000 and it said "This update is not applicable to your computer."

I double checked and it's Windows 2008 R2, SP1.  For sure.

Thanks,

S.....
We had a similar issue where I work with our IBM SVC and high latency on some servers. Long story short, the cause was excess pause frames, and the resolution was a firmware upgrade to the switch.
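
To check whether pause frames line up with the latency spikes, one approach might be to record the cumulative pause-frame counter for a switch port at regular intervals and look for intervals where it jumps. How the counter is read depends on the switch vendor and is not shown here; the sketch below assumes the readings have already been collected, and the labels, values, and threshold are hypothetical.

```python
def counter_deltas(readings):
    """Turn cumulative counter readings into per-interval deltas.

    readings: list of (label, cumulative_count) in time order.
    Returns a list of (label, delta), one per reading after the first.
    """
    deltas = []
    for (_, prev), (label, cur) in zip(readings, readings[1:]):
        # If the counter went backwards, assume it was reset and
        # count only what has accumulated since the reset.
        delta = cur - prev if cur >= prev else cur
        deltas.append((label, delta))
    return deltas

def bursts(readings, threshold):
    """Return labels of intervals whose delta reaches the threshold."""
    return [label for label, delta in counter_deltas(readings) if delta >= threshold]

# Hypothetical pause-frame counter readings taken once a minute
readings = [
    ("08:00", 1000),
    ("08:01", 1010),
    ("08:02", 1015),
    ("08:03", 9015),
    ("08:04", 9020),
]

print(bursts(readings, threshold=1000))  # ['08:03']
```

A burst in the same interval as the latency spike would support the pause-frame theory; no burst would suggest looking elsewhere.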