Solved

Windows Failover Cluster witness disk failing repeatedly

Posted on 2010-08-30
Last Modified: 2012-08-13
I have a Windows failover cluster running two Windows Server 2008 R2 nodes on VMware, with back-end storage on a Compellent SAN.  Starting a few days ago, the shared witness disk has been stuck in a continual failure loop.  It goes from online to offline to pending within 1 to 2 minutes, and it does so continuously.  I have run the validation tests with the disks offline and got no errors.  The VMware hosts are also running other servers, and none of those are having problems.  The SAN shows no errors, the network shows no errors, and the only events I see indicate that the witness disk failed, but they give no other information.

The only recent change is that the Windows updates released on 8/24/10 were installed.  I've done some searches but have not found any relevant information.  I have not tried anything yet because I don't know what to try.

The VMware hosts are HP DL380s with a fiber connection via QLogic to the SAN, which is a Compellent 20 Series running version 4.5.3.  Again, no other systems are showing errors.  Is it possible that the witness disk is corrupt?
Question by:AANKyle
6 Comments
 
LVL 2

Accepted Solution

by:jesse_7271
jesse_7271 earned 500 total points
ID: 33559972
So there is nothing in the Application, System, or cluster logs besides the failure?
 
LVL 2

Assisted Solution

by:jesse_7271
jesse_7271 earned 500 total points
ID: 33559990
%systemroot%\Cluster\cluster.log
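
A note for readers on Windows Server 2008 or 2008 R2: as far as I know, the text cluster log is not written continuously to this path on those versions. It is generated on demand from the cluster's trace files and normally ends up under %systemroot%\Cluster\Reports. The Python sketch below only wraps that step. The helper names `cluster_log_command` and `default_report_path` are made up for illustration, the report path is an assumed default, and the command will only do anything useful on a cluster node from an elevated prompt.

```python
import os
import subprocess
import sys


def cluster_log_command():
    """Return the cluster.exe invocation that generates Cluster.log on demand."""
    # Hypothetical helper; "cluster log /g" is the cluster.exe form used on
    # Windows Server 2008 / 2008 R2 to dump the trace data to a text log.
    return ["cluster", "log", "/g"]


def default_report_path(systemroot=None):
    """Assumed default location of the generated log (check your own system)."""
    root = systemroot or os.environ.get("SystemRoot", r"C:\Windows")
    return root + r"\Cluster\Reports\Cluster.log"


if __name__ == "__main__" and sys.platform == "win32":
    try:
        # Only meaningful on a cluster node, from an elevated prompt.
        subprocess.run(cluster_log_command(), check=False)
        print("Look for the log at:", default_report_path())
    except FileNotFoundError:
        print("cluster.exe was not found on this machine.")
```

If the Failover Clustering PowerShell module is loaded, the `Get-ClusterLog` cmdlet should do the same job.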
 

Author Comment

by:AANKyle
ID: 33560059
I do not have a cluster.log file.  The only log files in there are .log1 and .log2 and both are unreadable.

In the system log, I do have disk errors that I did not see before - Event ID is 15.  Description is "The device, \Device\Harddisk1\DR1, is not ready for access yet."  I also see another one that says I need to run chkdsk on the Q volume, which is the witness disk.  Gonna try that now.
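
For anyone who wants to pull the same events without clicking through Event Viewer, here is a minimal sketch that builds a `wevtutil` query for a given Event ID in the System log. The function name `build_event_query` and the default of 20 events are illustrative choices, not anything from this thread. The filter matches on Event ID only, so events with ID 15 from sources other than the disk driver may show up as well.

```python
import subprocess
import sys


def build_event_query(event_id, log_name="System", count=20):
    """Build a wevtutil command listing the newest events with a given Event ID."""
    xpath = "*[System[(EventID={})]]".format(int(event_id))
    return [
        "wevtutil", "qe", log_name,
        "/q:" + xpath,        # XPath filter on the event ID
        "/c:" + str(count),   # cap the number of events returned
        "/rd:true",           # read newest events first
        "/f:text",            # plain-text output instead of XML
    ]


if __name__ == "__main__" and sys.platform == "win32":
    # Event ID 15 is the "not ready for access yet" disk event quoted above.
    result = subprocess.run(build_event_query(15), capture_output=True, text=True)
    print(result.stdout or result.stderr)
```

Running it on each node in turn makes it easier to see whether both nodes log the disk errors or only the current owner of the witness disk.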

 

Author Comment

by:AANKyle
ID: 33560146
Check disk ran, but did not find any errors.  I did get this event in the log though:

Driver Management concluded the process to install driver FileRepository\volsnap.inf_amd64_neutral_7499a4fac85b39fc\volsnap.inf for Device Instance ID STORAGE\VOLUMESNAPSHOT\HARDDISKVOLUMESNAPSHOT1 with the following status: 0x0.
 
LVL 2

Assisted Solution

by:jesse_7271
jesse_7271 earned 500 total points
ID: 33560346
Have you tried pausing one of the nodes?

There are too many factors to troubleshoot without narrowing things down further.  I would make sure you are getting a high level of logging:

http://blogs.msdn.com/b/clustering/archive/2008/09/24/8962934.aspx

 

Author Comment

by:AANKyle
ID: 33560918
After running chkdsk on Q everything seems to have stabilized.  No more errors or disk messages.  Node 2 was paused so I am going to bring it back online and see what happens.

Thanks for the suggestion to recheck the event logs.  That helped me find the problem.


Question has a verified solution.
