CPU drop / Latency Spike on Windows servers causes disconnect

Posted on 2014-08-20
Last Modified: 2014-12-01
Hi Experts,

I have a long-standing problem and cannot find a solution, even after opening tickets with both VMware and Microsoft.

We first found this issue with DFS on a Win 2003 server, but now we are also seeing it on a SQL server, where it causes a disconnect.

The server environment uses boot from SAN, and all of our disks are SAN-based.  The file server has 4GB RAM and 1 processor.  The SQL server has 8GB RAM and 4 CPUs.

Disk speed has been upgraded to flash as a test, but this hasn't solved the problem.

We also have a stretched cluster providing a mirrored environment through an IBM SVC implementation.

The logs give no indication of what is causing the spike in latency and the drop in CPU.  We have McAfee and Altiris installed, and our backup agent is Avamar.  To isolate the cause, we have tried removing everything from these boxes.

Any thoughts to help identify this issue will be greatly appreciated.
Question by: svillardi
    LVL 14

    Accepted Solution

    Check the disk latency for the cluster. Response times of around 20ms or higher can cause major issues, while 10-12ms is the desired range for Windows clusters.
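    As a rough illustration of those thresholds, here is a minimal Python sketch that labels response-time samples. The function names, the "watch" band between the two figures, and the sample values are all made up for the example; only the 10-12ms and 20ms numbers come from the post above.

```python
from statistics import mean

# Thresholds from the answer above, in milliseconds.
DESIRED_MAX_MS = 12.0   # upper end of the "desired" 10-12ms range
PROBLEM_MS = 20.0       # around this or higher can cause major issues


def classify_latency(ms):
    """Label one disk response-time sample (hypothetical helper)."""
    if ms >= PROBLEM_MS:
        return "problem"
    if ms <= DESIRED_MAX_MS:
        return "ok"
    return "watch"  # assumed middle band, not defined in the thread


def summarize(samples_ms):
    """Count samples per label and report the mean and max latency."""
    counts = {"ok": 0, "watch": 0, "problem": 0}
    for ms in samples_ms:
        counts[classify_latency(ms)] += 1
    return {"mean_ms": mean(samples_ms), "max_ms": max(samples_ms), "counts": counts}


if __name__ == "__main__":
    # Made-up samples for illustration only, not measurements from this environment.
    samples = [4.0, 9.5, 11.0, 15.0, 22.0, 180.0]
    print(summarize(samples))
```

    The max matters more than the mean here: one very slow response can be enough to drop a connection even when the average looks healthy.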

    You can use Storport.sys tracing to measure disk I/O response times. Here's a TechNet blog post explaining the process.

    Note that the vast majority of CRITSITs regarding clusters revolve around network throughput or disk I/O. Many people outgrow their storage and network without realizing it.
    LVL 10

    Expert Comment

    Since you are using SANs for storage, have you confirmed your network isn't simply getting jammed up?  If you connect JUST the SANs and servers together, eliminating the rest of the network, do you continue to have latency issues?

    Author Comment

    I cannot isolate my environment.  In a stretched cluster, all hosts see all storage, so the IBM SAN Volume Controller is the middle man.  The zoning is set so that the hosts see only the SVC and the storage sees only the SVC.  Hosts do not see the storage directly, and vice versa.  There is no way to isolate anything and still keep redundancy between our two sites across the stretched cluster without dismantling everything.

    Brad - I don't know what CRITSITS is.

    There has to be another way.

    Author Comment

    VMware is showing the spike in disk latency.  The question is: how do I find what's causing it?

    The Win 2008 box was at SAS speed and I moved it to flash.  I still get this huge spike in latency / drop in CPU, severe enough that SQL loses its connection and has to be reset.  This happens almost every day at about 8:03 in the morning.
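    One way to confirm a time-of-day pattern like this is to bucket latency spikes by clock time and see which buckets repeat across days. The following is only a sketch: the function name, the 5-minute bucket size, the 20ms spike threshold (borrowed from the accepted answer), and the sample data are assumptions, and the (timestamp, latency) pairs would have to come from whatever performance export is available.

```python
from collections import defaultdict
from datetime import datetime


def recurring_spike_times(samples, threshold_ms=20.0, bucket_minutes=5):
    """Return {"HH:MM" bucket: sorted dates} for buckets with spikes on 2+ distinct days.

    samples is an iterable of (datetime, latency_ms) pairs (hypothetical input format).
    """
    days_by_bucket = defaultdict(set)
    for ts, latency_ms in samples:
        if latency_ms < threshold_ms:
            continue  # not a spike
        # Round the minute down to the start of its bucket, e.g. 08:03 -> 08:00.
        minute = (ts.minute // bucket_minutes) * bucket_minutes
        bucket = f"{ts.hour:02d}:{minute:02d}"
        days_by_bucket[bucket].add(ts.date())
    return {b: sorted(d) for b, d in days_by_bucket.items() if len(d) >= 2}


if __name__ == "__main__":
    # Made-up samples for illustration only.
    samples = [
        (datetime(2014, 8, 18, 8, 3), 250.0),
        (datetime(2014, 8, 19, 8, 4), 310.0),
        (datetime(2014, 8, 19, 14, 27), 45.0),   # one-off spike, not recurring
        (datetime(2014, 8, 20, 8, 3), 280.0),
        (datetime(2014, 8, 20, 11, 0), 6.0),     # normal latency
    ]
    for bucket, days in recurring_spike_times(samples).items():
        print(bucket, [d.isoformat() for d in days])
```

    A bucket that repeats on most days can then be lined up against anything scheduled around that time, such as backup, antivirus, or management agent jobs.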

    The Win 2003 box is the DFS server.  When I moved it from NL-SAS to SAS, the random outages stopped lasting 2-25 minutes, and each outage became much shorter.

    Author Comment

    CRITSIT -- Crisis Situation?

    Author Comment


    On my Win 2008 R2 box I tried to install KB978000 and it said "This update is not applicable to your computer."

    I double checked and it's Windows 2008 R2, SP1.  For sure.


    LVL 19

    Expert Comment

    We had a similar issue where I work, with our IBM SVC and high latency on some servers. Long story short: the cause was excess pause frames, and the resolution was a firmware upgrade to the switch.
