2 Node Windows 2012 Cluster issues with Quorum

Posted on 2014-04-08
Medium Priority
Last Modified: 2014-05-09

We have a 2-node cluster (N1 & N2) running Hyper-V with VMs on Cluster Shared Volumes (CSVs), configured with the "Node and Disk Majority" quorum model.

In order to perform maintenance on N1, we live migrated all VMs across to N2 and all was working fine. After this, shutting down N1 destroyed the cluster on N2: we had no running VMs and the cluster was unavailable to attach to. The only way to get things working again was to quickly restart N1.

After connecting to the cluster, the following error message was observed on N2:

"The Cluster service is shutting down because quorum was lost. This could be due to the loss of network connectivity between some or all nodes in the cluster, or a failover of the witness disk."

The owner of the CSV disk resources and the quorum was N1, which could explain the issue.

Questions from here

1) If a shutdown is initiated on N1, why does it automatically migrate only the VMs and not the cluster resources that are required? Reading further, I think we should have "paused" the node so that it entered maintenance mode.

However, the bigger (and more worrying) question is the following:

2) In a 2-node cluster, if the node that owns the disk witness fails, then the entire cluster fails and provides no resilience whatsoever. Is this correct?

If this is true, my 2-node cluster only has a 50/50 chance of staying up after a failure of either one of the nodes!

Question by: nmxsupport
LVL 37

Expert Comment

ID: 39985826
A cluster requires a majority (more than 50%) of the votes in order to stay alive.

In a two-node cluster with a single quorum disk, both nodes and the quorum disk contribute one vote each, for three votes in total. So if you lose the quorum disk, your cluster still stays alive because it has the two votes from the two cluster nodes. If a single node goes down, the cluster still has one vote from the quorum disk and one vote from the other node.
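The vote arithmetic above can be sketched in a few lines (a hypothetical model of the strict-majority rule, not an actual cluster API):

```python
def has_quorum(votes_present: int, total_votes: int) -> bool:
    """Quorum requires a strict majority of the total configured votes."""
    return votes_present > total_votes // 2

# Node and Disk Majority: N1 + N2 + disk witness = 3 total votes.
TOTAL = 3

# Lose the witness disk: both nodes still vote -> 2 of 3, quorum holds.
print(has_quorum(2, TOTAL))  # True

# Lose one node: surviving node + witness disk -> 2 of 3, quorum holds.
print(has_quorum(2, TOTAL))  # True

# Lose one node AND the witness: only 1 of 3 votes, quorum is lost.
print(has_quorum(1, TOTAL))  # False
```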

Now check the properties of all cluster resources and confirm that both servers are selected as possible owners; otherwise, once the server hosting a resource fails, the resource will not come up on the other node.
Also check the dependencies of the resources and ensure that nothing depends on the quorum disk.

During cluster node maintenance, when you live migrate VMs from one node to another, also move the quorum disk resource manually to the other server (confirm in advance that the other server is a possible owner of every resource, to avoid any unfortunate problems).

Lastly, I hope you have two network cards per server: one for the heartbeat and one for cluster traffic.
If you have only one NIC per cluster node, the chances of failure are higher when you reboot any cluster node.


Author Comment

ID: 39985869
Thank you Mahesh.

I checked and the Quorum disk has no dependencies and the Advanced Policies tab shows both N1 and N2 as possible owners.

In the Policies tab I have the following set,
* If resource fails, attempt restart on current node
Period for restarts = 15:00
Maximum restarts in the specified period = 1
Delay between restarts = 0.5
* If restart is unsuccessful, fail over all resources in this role
* If all the restart attempts fail, begin restarting again after the specified period = 01:00

There is no way to access any properties of the CSV volumes.

The critical error "The Cluster service is shutting down because quorum was lost. This could be due to the loss of network connectivity between some or all nodes in the cluster, or a failover of the witness disk." seems to indicate that if the witness disk fails over, then quorum is automatically lost and the cluster is shut down.

Expert Comment

ID: 39985902
Cluster Shared Volumes (CSV) allow nodes to share access to storage, which means that the applications on that piece of storage can run on any node, or on different nodes, at any time. CSV breaks the dependency between application resources (the VMs) and disk resources (the CSV disks), so in a CSV environment it does not matter where the disk is mounted: it appears local to all nodes in the cluster. CSV manages storage access differently than regular clustered disks.

You cannot add or change the possible owners of a CSV from the GUI. You need to use the cluster.exe command line for that.

How many network cards do you have per server / cluster node?




Author Comment

ID: 39986010
Hello, each server has 4 NICs: 2 teamed for LAN and 2 teamed for cluster/management. Switches are cross-linked and provide additional resilience.

All NICs are connected to switches and therefore any node shutdown would not result in a "disconnected media" state for the NIC.

I'm struggling to see how the number of NICs is important: the point is that I have 2 nodes, and if an entire node fails then all NIC connections between the 2 nodes would be down, whether I had 2 or 20, wouldn't they?

It appears to me to be a side effect of cluster split-brain. If the node that owns the quorum fails, shouldn't ownership transfer to the remaining node?

Expert Comment

ID: 39986022
In your case the quorum is not being transferred to the other node when the quorum owner reboots; that is the problem.

Can you check that the quorum disk is presented correctly from the storage end (I mean that cluster support is enabled on the storage side), and that the MPIO feature is installed on both cluster nodes?


Author Comment

ID: 39986300
Storage is a fibre channel SAN and MPIO is enabled.
All CSVs and the quorum disk are visible from both nodes. I expect (but have not confirmed) that if one node fails, the other node will still be able to see all the resources.

Tell me Mahesh,
In your view, in a 2-node 2012 cluster, given a failure of either node (bearing in mind that only one will own the quorum at any time), should the cluster continue to operate correctly?

Author Comment

ID: 39986325
Interestingly, re-running the Quorum wizard gives the following; I will investigate further.

Quorum Configuration:  Node Majority
Cluster Managed Voting:  Enabled

The recommended setting for your number of voting nodes is Node and Disk Majority, however Node Majority was selected because an appropriate disk could not be found.
Your cluster quorum configuration will be changed to the configuration shown above.
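The significance of that fallback: under Node Majority with only two nodes there is no tie-breaking witness vote, so losing either node drops the cluster below a strict majority. A quick sketch of the same majority rule (the function is illustrative, not a cluster API) shows this:

```python
def has_quorum(votes_present: int, total_votes: int) -> bool:
    """Quorum requires a strict majority of the total configured votes."""
    return votes_present > total_votes // 2

# Node Majority on a 2-node cluster: N1 + N2 = 2 total votes, no witness.
print(has_quorum(2, 2))  # True  - both nodes up
print(has_quorum(1, 2))  # False - shutting down either node loses quorum,
                         #         exactly what happened when N1 went down
```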

Accepted Solution

Mahesh earned 2000 total points
ID: 39987182
I think the wizard is not recognizing the quorum disk, and that is causing this problem.
Was your quorum disk added to the available cluster disks during configuration?

Node and Disk Majority (recommended for clusters with an even number of nodes):
you need to adopt the above configuration for the cluster and select the quorum disk manually; then it should work.
Afterwards, check that the cluster mode has changed to Node and Disk Majority, and that both servers were automatically selected as possible owners of the selected quorum disk.

Alternatively, you can configure a File Share Witness as the quorum.
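Either fix (disk witness or file share witness) adds a third vote, which is what restores single-failure resilience. A small comparison sketch (hypothetical helper, not a cluster API):

```python
def survives_one_failure(total_votes: int) -> bool:
    """After losing exactly one vote, does a strict majority remain?"""
    return (total_votes - 1) > total_votes // 2

# Node Majority, 2 nodes: 2 votes -> losing either node kills quorum.
print(survives_one_failure(2))  # False

# Node and Disk Majority or File Share Witness: 3 votes -> any single
# failure (one node, or the witness itself) leaves a 2-of-3 majority.
print(survives_one_failure(3))  # True
```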

Check the article below.



Question has a verified solution.
