Solved

2 Node Windows 2012 Cluster issues with Quorum

Posted on 2014-04-08
Last Modified: 2014-05-09
Hello,

We have a 2-node cluster (N1 and N2) running Hyper-V VMs on CSVs, using the "Node and Disk Majority" quorum configuration.

In order to perform maintenance on N1 we live migrated all VMs across to N2, and all was working fine.  After this, shutting down N1 took down the cluster on N2: we had no running VMs and could not connect to the cluster.  The only way to get things working again was to quickly restart N1.

After connecting to the cluster the following error message was observed on N2,

"The Cluster service is shutting down because quorum was lost. This could be due to the loss of network connectivity between some or all nodes in the cluster, or a failover of the witness disk."

N1 was the owner of the CSV disk resources and the quorum disk, which could explain the issue.

Questions from here

1) If a shutdown is initiated on N1, why does it automatically migrate only the VMs and not the cluster resources that are also required?  From further reading, I think we should "pause" the node so that it enters maintenance mode.

However, the bigger (and more worrying) question is the following:

2) In a 2-node cluster, if the node that owns the disk witness fails, then the entire cluster fails and there is no resilience whatsoever.  Is this correct?

If this is true my 2-node cluster only has a 50/50 chance of remaining working after a failure of any one of the nodes!

Thanks
Question by:nmxsupport
LVL 37

Expert Comment

by:Mahesh
ID: 39985826
A cluster needs a majority of votes (more than 50%) in order to stay online.

In a two-node cluster with a single quorum disk, both nodes and the quorum disk contribute one vote each.
So if you lose the quorum disk, the cluster stays online because it still has two votes from the two cluster nodes,
OR
if a single node goes down, the cluster still has one vote from the quorum disk and one vote from the other node.
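
The vote counting above can be sketched in a few lines of Python. This is only an illustration of the majority rule under static voting, not how the Cluster service is implemented; the function name `has_quorum` and the scenario labels are made up for this example.

```python
def has_quorum(votes_online: int, votes_total: int) -> bool:
    """Return True when strictly more than half of all votes are online."""
    return 2 * votes_online > votes_total


# Node and Disk Majority on a 2-node cluster: 2 node votes + 1 disk vote.
TOTAL_VOTES = 3

# Hypothetical scenarios: label -> number of votes still reachable.
scenarios = {
    "both nodes and the disk online": 3,
    "both nodes online, disk lost": 2,
    "one node online and it owns the disk": 2,
    "one node online, disk not reachable": 1,
}

for label, online in scenarios.items():
    state = "quorum kept" if has_quorum(online, TOTAL_VOTES) else "quorum lost"
    print(f"{label}: {online}/{TOTAL_VOTES} votes, {state}")
```

The last scenario is the one that matters here: a surviving node that cannot bring the witness disk online holds only 1 of 3 votes.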

Now check the properties of all cluster resources and confirm that both servers are selected as possible owners; otherwise, when the server hosting a resource fails, the resource will not come up on the other node.
Also check the resource dependencies and make sure the quorum disk has no dependencies.

During cluster node maintenance, when you live migrate VMs from one node to the other, also move the quorum disk resource manually to the other server (make sure the other server owns all resources in advance to avoid any unfortunate problems).

Lastly, I hope you have two network cards per server, one for heartbeat and one for the cluster.
If you have only one NIC per cluster node, a failure is more likely when you reboot a cluster node.

Mahesh.

Author Comment

by:nmxsupport
ID: 39985869
Thank you Mahesh.

I checked and the Quorum disk has no dependencies and the Advanced Policies tab shows both N1 and N2 as possible owners.

In the Policies tab I have the following set,
* If resource fails, attempt restart on current node
Period for restarts = 15:00
Maximum restarts in the specified period = 1
Delay between restarts = 0.5
* If restart is unsuccessful, fail over all resources in this role
* If all the restart attempts fail, begin restarting again after the specified period = 01:00

There is no way to access any properties of the CSV volumes.

The critical error "The Cluster service is shutting down because quorum was lost. This could be due to the loss of network connectivity between some or all nodes in the cluster, or a failover of the witness disk." seems to indicate that if the witness disk fails over, quorum is automatically lost and the cluster shuts down.
LVL 37

Expert Comment

by:Mahesh
ID: 39985902
Cluster Shared Volumes (CSV) allow nodes to share access to storage, which means that the applications on that storage can run on any node, or on different nodes, at any time.  CSV breaks the dependency between application resources (the VMs) and disk resources (the CSV disks), so in a CSV environment it does not matter where the disk is mounted: it appears local to all nodes in the cluster.  CSV manages storage access differently from regular clustered disks.

You cannot add or change the possible owners of a CSV from the GUI. You need to use the cluster.exe command line for that.
http://virtuallyaware.wordpress.com/2011/11/28/blog-highlight-add-possible-owner-to-a-cluster-shared-volume/

How many network cards do you have per server / cluster node?

I hope you have two network cards per server, one for heartbeat and one for the cluster.
If you have only one NIC per cluster node, a failure is more likely when you reboot a cluster node.

Mahesh.

Author Comment

by:nmxsupport
ID: 39986010
Hello, each server has 4 NICs: 2 teamed for LAN and 2 teamed for cluster/management.  The switches are cross-linked and provide additional resilience.

All NICs are connected to switches, so a node shutdown would not result in a "disconnected media" state for the NIC.

I am struggling to see how the number of NICs is important. The point is that I have 2 nodes, and if an entire node fails then all NIC connections between the 2 nodes would be down whether I had 2 or 20, wouldn't they?

It appears to me to be a side effect of the clustering split brain.  If the node that owns the quorum disk fails, should ownership then transfer to the remaining node?
LVL 37

Expert Comment

by:Mahesh
ID: 39986022
In your case the quorum disk is not being transferred to the other node when the quorum owner reboots; that is the problem.

Can you check that the quorum disk is set up correctly on the storage side (I mean that cluster support is enabled on the storage) and that you have installed the MPIO feature on both cluster nodes?

Mahesh.

Author Comment

by:nmxsupport
ID: 39986300
Storage is a Fibre Channel SAN and MPIO is enabled.
All CSVs and the quorum disk are visible from both nodes.  I expect (but have not confirmed) that if one node fails the other node will still be able to see all the resources.

Tell me Mahesh,
In your view, in a 2 node "2012" cluster, given a failure of any node (bearing in mind only one will be the owner of the quorum at any time) should the cluster continue to operate correctly?

Author Comment

by:nmxsupport
ID: 39986325
Interestingly, on re-running the quorum wizard I get the following. I will investigate further.

Quorum Configuration:  Node Majority
Cluster Managed Voting:  Enabled

The recommended setting for your number of voting nodes is Node and Disk Majority, however Node Majority was selected because an appropriate disk could not be found.
Your cluster quorum configuration will be changed to the configuration shown above.
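
For comparison, here is a rough Python sketch of why the wizard's fallback matters on a 2-node cluster. It only does static vote arithmetic and deliberately ignores any dynamic vote adjustment that "Cluster Managed Voting" may perform; the helper names are hypothetical and chosen for this example.

```python
def votes_needed(total_votes: int) -> int:
    """Smallest number of votes that is a strict majority of total_votes."""
    return total_votes // 2 + 1


def survives_single_node_loss(node_votes: int, witness_votes: int) -> bool:
    """True if quorum holds after one node vote disappears.

    Assumes the witness (if any) stays reachable by the surviving node.
    """
    total = node_votes + witness_votes
    remaining = total - 1
    return remaining >= votes_needed(total)


# Node Majority: 2 node votes, no witness -> 1 of 2 votes is not a majority.
print("Node Majority:", survives_single_node_loss(node_votes=2, witness_votes=0))

# Node and Disk Majority: 2 node votes + 1 disk vote -> 2 of 3 is a majority.
print("Node and Disk Majority:", survives_single_node_loss(node_votes=2, witness_votes=1))
```

Under these assumptions, Node Majority with two voting nodes cannot tolerate the loss of either node, while adding a reachable witness vote allows one node to be lost.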
LVL 37

Accepted Solution

by:
Mahesh earned 500 total points
ID: 39987182
I think the wizard is not recognizing the quorum disk, which is causing this problem.
Was your quorum disk added to the available cluster disks during configuration?

Node and Disk Majority (recommended for clusters with an even number of nodes)
You need to adopt the above configuration for the cluster and select the quorum disk manually; then it should work.
Then also check that the cluster mode has changed to Node and Disk Majority.
Also check that both servers are automatically selected as possible owners of the selected quorum disk.

Or else you can configure a file share witness as the quorum witness.

Check below article
http://technet.microsoft.com/en-us/library/jj612870.aspx

Mahesh.