Solved

Windows Failover Cluster witness disk failing repeatedly

Posted on 2010-08-30
Last Modified: 2012-08-13
I have a Windows failover cluster running two Windows Server 2008 R2 nodes on VMware, with backend storage on a Compellent SAN.  Starting a few days ago, the shared witness disk has been stuck in a continual failure loop: it goes from online to offline to pending within 1 to 2 minutes, and it does this continuously.  I have run the validation tests with the disks offline and got no errors.  The VMware hosts are running other servers, and none of those are having problems.  The SAN shows no errors, the network shows no errors, and the only events I see indicate that the witness disk failed, without giving any other information.

The only thing that changed recently is that the Windows updates released on 8/24/10 were installed.  I've done some searches but have not found any relevant information.  I have not changed anything yet because I don't know what to try.

The VMware hosts are HP DL380s with a fiber connection via QLogic to the SAN, which is a Compellent 20 Series running version 4.5.3.  Again, no other systems are showing errors.  Is it possible that the witness disk could be corrupt?
Question by:AANKyle
6 Comments
 
LVL 2

Accepted Solution

by:jesse_7271
jesse_7271 earned 500 total points
ID: 33559972
So there is nothing in Application, System, or Cluster Logs besides the fail?
 
LVL 2

Assisted Solution

by:jesse_7271
jesse_7271 earned 500 total points
ID: 33559990
%systemroot%\Cluster\cluster.log
 

Author Comment

by:AANKyle
ID: 33560059
I do not have a cluster.log file.  The only log files in there are .log1 and .log2 and both are unreadable.
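
A note on the missing file: on Server 2008 and 2008 R2 the cluster service traces to ETL files rather than keeping a text cluster.log up to date, so the text log has to be generated on demand, for example with the Get-ClusterLog cmdlet from the FailoverClusters module (or "cluster log /g"). By default the result is written to %windir%\Cluster\Reports\Cluster.log. What follows is only a sketch of wrapping that call in Python, assuming Python 3 is installed on a node and the script runs from an elevated prompt; the function names are made up for illustration and are not part of any Microsoft tooling.

```python
import os
import subprocess


def build_cluster_log_command(destination=None):
    """Build the argv list that asks PowerShell to generate Cluster.log.

    destination is an optional folder passed to Get-ClusterLog -Destination.
    """
    ps = "Import-Module FailoverClusters; Get-ClusterLog"
    if destination:
        # PowerShell escapes a single quote inside a literal string by doubling it.
        ps += " -Destination '{}'".format(destination.replace("'", "''"))
    return ["powershell.exe", "-NoProfile", "-Command", ps]


def default_log_path(windir=None):
    """Return where Get-ClusterLog writes the log when no destination is given."""
    windir = windir or os.environ.get("windir", r"C:\Windows")
    return windir + r"\Cluster\Reports\Cluster.log"


def generate_cluster_log(destination=None):
    """Run the command; only meaningful on a cluster node (assumption)."""
    return subprocess.run(build_cluster_log_command(destination), check=True)
```

Calling generate_cluster_log() on one node and then opening the path from default_log_path() should give a readable text log to search for the witness disk resource.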

In the system log, I do have disk errors that I did not see before - Event ID is 15.  Description is "The device, \Device\Harddisk1\DR1, is not ready for access yet."  I also see another one that says I need to run chkdsk on the Q volume, which is the witness disk.  Gonna try that now.

Author Comment

by:AANKyle
ID: 33560146
Check disk ran, but did not find any errors.  I did get this event in the log though:

Driver Management concluded the process to install driver FileRepository\volsnap.inf_amd64_neutral_7499a4fac85b39fc\volsnap.inf for Device Instance ID STORAGE\VOLUMESNAPSHOT\HARDDISKVOLUMESNAPSHOT1 with the following status: 0x0.
 
LVL 2

Assisted Solution

by:jesse_7271
jesse_7271 earned 500 total points
ID: 33560346
Have you tried pausing one of the nodes?

There are too many factors to troubleshoot without narrowing things down further.  I would make sure you are getting a high level of logging:

http://blogs.msdn.com/b/clustering/archive/2008/09/24/8962934.aspx

 

Author Comment

by:AANKyle
ID: 33560918
After running chkdsk on Q everything seems to have stabilized.  No more errors or disk messages.  Node 2 was paused so I am going to bring it back online and see what happens.

Thanks for the suggestion to recheck the event logs.  That helped me find the problem.
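
For anyone repeating the event log recheck on a busy server, the filtering can be scripted. This is a minimal sketch, assuming the System log has been exported to CSV with columns named EventID, Source and Message; those column names, the function name and the sample rows are illustrative assumptions, so adjust them to match the real export. It keeps disk event ID 15 (the "not ready for access" error seen in this thread) and any message that mentions chkdsk.

```python
import csv
import io


def find_disk_warnings(csv_text, disk_event_ids=("15",), keywords=("chkdsk",)):
    """Return (EventID, Source, Message) tuples worth a closer look.

    A row matches if it is a Disk-source event with one of disk_event_ids,
    or if its message contains any of the keywords (case-insensitive).
    """
    hits = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        message = row.get("Message") or ""
        source = row.get("Source") or ""
        by_id = source.lower() == "disk" and row.get("EventID") in disk_event_ids
        by_keyword = any(k in message.lower() for k in keywords)
        if by_id or by_keyword:
            hits.append((row.get("EventID"), source, message))
    return hits


# Illustrative rows only: the first message is quoted from this thread,
# the other two are made up to show a keyword match and a non-match.
SAMPLE = "\n".join([
    "EventID,Source,Message",
    r'15,Disk,"The device, \Device\Harddisk1\DR1, is not ready for access yet."',
    "9999,Example,Illustrative text: please run chkdsk on volume Q:",
    "7036,Service Control Manager,An example service entered the running state.",
])

if __name__ == "__main__":
    for event_id, source, message in find_disk_warnings(SAMPLE):
        print(event_id, source, message)
```

Matching on the word chkdsk rather than on a specific event ID is deliberate, since the thread does not say which event carried that recommendation.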

Question has a verified solution.