Solved

Windows Failover Cluster witness disk failing repeatedly

Posted on 2010-08-30
Last Modified: 2012-08-13
I have a Windows failover cluster running two Windows Server 2008 R2 nodes on VMware, with back-end storage on a Compellent SAN.  Starting a few days ago, the shared witness disk has been stuck in a continual failure loop: it goes from online to offline to pending in the space of 1 to 2 minutes, and it does so continuously.  I have run the validation tests with the disks offline and got no errors.  The VMware hosts are also running other servers, and none of those are having problems.  The SAN shows no errors, the network shows no errors, and the only events I see indicate that the witness disk failed but don't give any other information.

The only recent change is that the Windows updates released on 8/24/10 were installed.  I've done some searches but have not found any relevant information.  I have not changed anything yet because I don't know what to try.

The VMware hosts are HP DL380s with a fiber connection via QLogic to the SAN, which is a Compellent 20 Series running version 4.5.3.  Again, no other systems are showing errors.  Is it possible that the witness disk is corrupt?
Question by:AANKyle
6 Comments
 
LVL 2

Accepted Solution

by:jesse_7271
jesse_7271 earned 500 total points
ID: 33559972
So there is nothing in the Application, System, or Cluster logs besides the failure?
 
LVL 2

Assisted Solution

by:jesse_7271
jesse_7271 earned 500 total points
ID: 33559990
%systemroot%\Cluster\cluster.log
 

Author Comment

by:AANKyle
ID: 33560059
I do not have a cluster.log file.  The only log files in there are .log1 and .log2 and both are unreadable.

In the system log, I do have disk errors that I did not see before - Event ID is 15.  Description is "The device, \Device\Harddisk1\DR1, is not ready for access yet."  I also see another one that says I need to run chkdsk on the Q volume, which is the witness disk.  Gonna try that now.
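
For context on the missing file: on Windows Server 2008 R2 the text cluster log is not kept as a live file. It is generated on demand, for example with `cluster log /g` or the `Get-ClusterLog` PowerShell cmdlet, and written to %windir%\Cluster\Reports\Cluster.log. Once it has been generated, something like the following could narrow it down to warnings and errors for the witness disk. This is only a rough sketch: the line layout in the comment is an assumption about the usual cluster log format, and `filter_cluster_log` and the resource name "Cluster Disk 1" are hypothetical names, not anything taken from this cluster.

```python
import re

# Assumed line shape (not confirmed by this thread):
#   <pid>.<tid>::YYYY/MM/DD-HH:MM:SS.mmm LEVEL  [COMPONENT] message
LEVEL_RE = re.compile(
    r"::\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3}\s+(INFO|WARN|ERR|DBG)\b"
)


def filter_cluster_log(lines, keyword, levels=("WARN", "ERR")):
    """Return log lines at the given levels that mention keyword.

    The keyword match is case-insensitive. Lines that do not look like
    cluster log entries (no timestamp and level) are skipped.
    """
    keyword = keyword.lower()
    hits = []
    for line in lines:
        match = LEVEL_RE.search(line)
        if match and match.group(1) in levels and keyword in line.lower():
            hits.append(line.rstrip("\n"))
    return hits
```

To use it, open the generated Cluster.log, pass the file object as `lines`, and pass the witness disk's resource name as `keyword` (for example "Cluster Disk 1", if that is what the resource is called). Add "INFO" to `levels` to also see the online/offline transitions around each failure.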

 

Author Comment

by:AANKyle
ID: 33560146
Check disk ran, but did not find any errors.  I did get this event in the log though:

Driver Management concluded the process to install driver FileRepository\volsnap.inf_amd64_neutral_7499a4fac85b39fc\volsnap.inf for Device Instance ID STORAGE\VOLUMESNAPSHOT\HARDDISKVOLUMESNAPSHOT1 with the following status: 0x0.
 
LVL 2

Assisted Solution

by:jesse_7271
jesse_7271 earned 500 total points
ID: 33560346
Have you tried pausing one of the nodes?

There are too many factors to troubleshoot without narrowing things down further.  I would make sure you are getting a high level of logging:

http://blogs.msdn.com/b/clustering/archive/2008/09/24/8962934.aspx

 

Author Comment

by:AANKyle
ID: 33560918
After running chkdsk on Q everything seems to have stabilized.  No more errors or disk messages.  Node 2 was paused so I am going to bring it back online and see what happens.

Thanks for the suggestion to recheck the event logs.  That helped me find the problem.
