Solved

Windows Failover Cluster witness disk failing repeatedly

Posted on 2010-08-30
Last Modified: 2012-08-13
I have a Windows failover cluster running two Windows Server 2008 R2 nodes on VMware, with backend storage on a Compellent SAN.  Starting a few days ago, the shared witness disk has been in a continual failure loop: it goes from online to offline to pending within 1 to 2 minutes, and it does this continuously.  I have run the validation tests with the disks offline and got no errors.  The VMware hosts are running other servers, and none of those are having problems.  The SAN shows no errors, the network shows no errors, and the only events I see indicate that the witness disk failed but give no other information.

The only thing that changed recently is that the Windows updates released on 8/24/10 were installed.  I've done some searches but have not found any information.  I haven't changed anything yet because I don't know what to try.

The VMware hosts are HP DL 380s with a fiber connection via QLogic to the SAN, which is a Compellent 20 Series running version 4.5.3.  Again, no other systems are showing errors.  Is it possible that the witness disk could be corrupt?
Question by:AANKyle
LVL 2

Accepted Solution

by:jesse_7271
jesse_7271 earned 500 total points
ID: 33559972
So there is nothing in Application, System, or Cluster Logs besides the fail?
 
LVL 2

Assisted Solution

by:jesse_7271
jesse_7271 earned 500 total points
ID: 33559990
%systemroot%\Cluster\cluster.log
 

Author Comment

by:AANKyle
ID: 33560059
I do not have a cluster.log file.  The only log files in there are .log1 and .log2 and both are unreadable.
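
A missing text cluster.log is expected on Windows Server 2008 and later: the cluster service writes to an event trace, and the readable log has to be generated on demand, for example with the `Get-ClusterLog` cmdlet from the FailoverClusters PowerShell module. As a rough sketch, the Python below only builds such a command line and runs it when executed on Windows. The function name `cluster_log_command`, the destination folder `C:\Temp`, and the 30-minute window are assumptions for illustration, not values from this thread.

```python
import subprocess
import sys


def cluster_log_command(destination, minutes=None):
    """Build a PowerShell command line that calls Get-ClusterLog.

    destination: folder the generated log files are copied to (hypothetical here).
    minutes: optional window, passed to -TimeSpan, limiting how far back the log goes.
    """
    script = "Import-Module FailoverClusters; Get-ClusterLog -Destination '{}'".format(destination)
    if minutes is not None:
        script += " -TimeSpan {}".format(int(minutes))
    return ["powershell.exe", "-NoProfile", "-Command", script]


if __name__ == "__main__" and sys.platform == "win32":
    # Only attempt the call on a Windows cluster node.
    subprocess.check_call(cluster_log_command(r"C:\Temp", minutes=30))
```

Limiting the time span keeps the generated file small enough to read around the moment the witness disk last failed.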

In the system log, I do have disk errors that I did not see before - Event ID is 15.  Description is "The device, \Device\Harddisk1\DR1, is not ready for access yet."  I also see another one that says I need to run chkdsk on the Q volume, which is the witness disk.  Gonna try that now.
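
Disk warnings like this are easy to miss among other System log entries, so filtering by event ID can help. The following is a minimal sketch, assuming the System log has been exported to CSV with `Id` and `Message` columns; the export format, the column names, and the second sample row are assumptions, and only the Event ID 15 message text is taken from this thread.

```python
import csv
import io

# Illustrative export: only the Event ID 15 message comes from the thread;
# the column names and the second row are placeholders.
SAMPLE_EXPORT = r'''Id,Message
15,"The device, \Device\Harddisk1\DR1, is not ready for access yet."
1,"Unrelated placeholder record."
'''


def events_with_id(csv_text, event_id):
    """Return the rows of a CSV event export whose Id column equals event_id."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row["Id"] == str(event_id)]


if __name__ == "__main__":
    for row in events_with_id(SAMPLE_EXPORT, 15):
        print(row["Id"], row["Message"])
```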
 

Author Comment

by:AANKyle
ID: 33560146
Check disk ran, but did not find any errors.  I did get this event in the log though:

Driver Management concluded the process to install driver FileRepository\volsnap.inf_amd64_neutral_7499a4fac85b39fc\volsnap.inf for Device Instance ID STORAGE\VOLUMESNAPSHOT\HARDDISKVOLUMESNAPSHOT1 with the following status: 0x0.
 
LVL 2

Assisted Solution

by:jesse_7271
jesse_7271 earned 500 total points
ID: 33560346
Have you tried pausing one of the nodes?

There are too many factors to troubleshoot without narrowing things down more.  I would make sure you are getting a high level of logging:

http://blogs.msdn.com/b/clustering/archive/2008/09/24/8962934.aspx
 

Author Comment

by:AANKyle
ID: 33560918
After running chkdsk on Q everything seems to have stabilized.  No more errors or disk messages.  Node 2 was paused so I am going to bring it back online and see what happens.

Thanks for the suggestion to recheck the event logs.  That helped me find the problem.