Solved

2 Node Windows 2012 Cluster issues with Quorum

Posted on 2014-04-08
5,812 Views
Last Modified: 2014-05-09
Hello,

We have a 2 Node cluster (N1 & N2) running Hyper-V and VMs along with CSVs and running as "Node and Disk Majority (Quorum)" Quorum configuration.  

In order to perform maintenance on N1 we live migrated all VMs across to N2, and everything was working fine. After this, shutting down N1 destroyed the cluster on N2: we had no running VMs and could not connect to the cluster. The only way to get things working again was to quickly restart N1.

After connecting to the cluster the following error message was observed on N2,

"The Cluster service is shutting down because quorum was lost. This could be due to the loss of network connectivity between some or all nodes in the cluster, or a failover of the witness disk."

The owner of the CSV disk resources and the quorum disk was N1, which could explain the issue.

Questions from here

1) If a shutdown is initiated on N1, why does it only migrate the VMs automatically and not the cluster resources that are required? Reading further, I think we should "pause" (drain) the node first so that it enters maintenance mode.

However, the bigger (and more worrying) question is the following:

2) In a 2-node cluster, if the node that owns the disk witness fails, then the entire cluster fails and there is no resilience whatsoever. Is this correct?

If this is true, my 2-node cluster only has a 50/50 chance of remaining up after a failure of either node!

Thanks
Question by:nmxsupport
8 Comments
 
LVL 35

Expert Comment

by:Mahesh
ID: 39985826
A cluster needs a majority (more than 50%) of votes in order to stay up.

With a two-node cluster and a single quorum disk, each node and the quorum disk contribute one vote to the cluster (three votes in total).
So if you lose the quorum disk, your cluster stays up because it still has TWO votes from the two cluster nodes.
Likewise, if a single node goes down, the cluster still has one vote from the quorum disk and one vote from the remaining node.
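If it helps, here is a rough PowerShell sketch (assuming the FailoverClusters module is installed on the nodes) to confirm the quorum model and each node's vote:

    # Rough sketch; run on either node
    Import-Module FailoverClusters

    # Current quorum model and witness resource
    Get-ClusterQuorum

    # Each node's vote (NodeWeight = 1 means the node has a quorum vote)
    Get-ClusterNode | Format-Table Name, State, NodeWeight -AutoSize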

Now check the properties of all cluster resources and confirm that both servers are listed as a possible owner; otherwise, once the server hosting a resource fails, that resource will not come up on the other node.
Also check the dependencies of the resources and ensure that nothing depends on the quorum disk.
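A quick sketch of how to review both from PowerShell (output formats may vary):

    # Possible owners for each resource (both nodes should be listed)
    Get-ClusterResource | Get-ClusterOwnerNode

    # Dependency expression for each resource (the witness disk should not appear in any of them)
    Get-ClusterResource | Get-ClusterResourceDependency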

During cluster node maintenance, when you live migrate the VMs from one node to the other, also move the quorum disk resource manually to the other server (make sure in advance that the other server owns all resources, to avoid any unfortunate problems).
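As a rough sketch of that maintenance sequence in PowerShell (the node and group names here are only examples):

    # Drain roles off N1 (pauses the node and moves its VMs away)
    Suspend-ClusterNode -Name N1 -Drain

    # Move the core cluster group (which holds the witness disk) to N2
    Move-ClusterGroup -Name "Cluster Group" -Node N2

    # After maintenance, resume N1
    Resume-ClusterNode -Name N1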

Lastly, I hope you have at least TWO network cards per server: one dedicated to the cluster heartbeat and one for client traffic.
If you have only one NIC per cluster node, the chances of failure are much higher when you reboot either node.
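A quick, rough way to see how the cluster networks are configured (just a sketch; the role numbers are the cluster's own encoding):

    # Role 1 = cluster/heartbeat only, 3 = cluster and client, 0 = none
    Get-ClusterNetwork | Format-Table Name, Role, Address, AddressMask -AutoSize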

Mahesh.
 

Author Comment

by:nmxsupport
ID: 39985869
Thank you Mahesh.

I checked and the Quorum disk has no dependencies and the Advanced Policies tab shows both N1 and N2 as possible owners.

In the Policies tab I have the following set,
* If resource fails, attempt restart on current node
Period for restarts = 15:00
Maximum restarts in the specified period = 1
Delay between restarts = 0.5
* If restart is unsuccessful, fail over all resources in this role
* If all the restart attempts fail, begin restarting again after the specified period = 01:00

There is no way to access any properties of the CSV volumes.

The critical error "The Cluster service is shutting down because quorum was lost. This could be due to the loss of network connectivity between some or all nodes in the cluster, or a failover of the witness disk." seems to indicate that if the witness disk fails over, quorum is automatically lost and the cluster shuts down.
 
LVL 35

Expert Comment

by:Mahesh
ID: 39985902
Cluster Shared Volumes (CSV) allow the nodes to share access to storage, which means that the applications on that piece of storage can run on any node, or on different nodes, at any time. CSV breaks the dependency between application resources (the VMs) and disk resources (the CSV disks), so in a CSV environment it does not matter where the disk is mounted: it appears local to all nodes in the cluster. CSV manages storage access differently than regular clustered disks.
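For reference, a quick sketch of how to list the CSVs and see which node currently coordinates each one:

    Import-Module FailoverClusters
    Get-ClusterSharedVolume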

You cannot add or change the possible owners of a CSV from the GUI. You need to use the cluster.exe command line for that.
http://virtuallyaware.wordpress.com/2011/11/28/blog-highlight-add-possible-owner-to-a-cluster-shared-volume/
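Roughly, the cluster.exe approach from that post looks like the sketch below (the resource and node names are only examples; note that on 2012 cluster.exe is a legacy tool and may require the Failover Cluster Command Interface feature to be installed):

    # Show the current possible owners of the CSV resource
    cluster.exe resource "Cluster Disk 2" /listowners

    # Add a node as a possible owner of the CSV resource
    cluster.exe resource "Cluster Disk 2" /addowner:N2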

How many network cards do you have per server / cluster node?

As I mentioned, I hope you have at least TWO network cards per server: one dedicated to the cluster heartbeat and one for client traffic.
If you have only one NIC per cluster node, the chances of failure are much higher when you reboot either node.

Mahesh.
 

Author Comment

by:nmxsupport
ID: 39986010
Hello, each server has 4 NICs: 2 teamed for the LAN and 2 teamed for cluster/management. The switches are cross-linked and provide additional resilience.

All NICs are connected to switches, so a node shutdown would not leave the remaining node's NICs in a "disconnected media" state.

I'm struggling to see how the number of NICs is important: the point is that I have 2 nodes, and if an entire node fails then all NIC connections between the 2 nodes are down, whether I had 2 or 20, wouldn't they?

It appears to me to be a side effect of cluster split-brain protection. If the node that owns the quorum disk fails, shouldn't ownership transfer to the remaining node?

 
LVL 35

Expert Comment

by:Mahesh
ID: 39986022
In your case the quorum disk is not being transferred to the other node when its owner reboots; that is the problem.

Can you check that the quorum LUN is presented correctly from the storage end (I mean that cluster support is enabled on the storage side), and also that the MPIO feature is installed on both cluster nodes?
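A quick sketch of the MPIO check (run on both nodes):

    # Confirm the MPIO feature is installed
    Get-WindowsFeature -Name Multipath-IO

    # If it is missing, it can be added with:
    # Install-WindowsFeature -Name Multipath-IO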

Mahesh.
 

Author Comment

by:nmxsupport
ID: 39986300
The storage is a fibre channel SAN and MPIO is enabled.
All CSVs and the quorum disk are visible from both nodes. I expect (but have not confirmed) that if one node fails, the other node will still be able to see all the storage.

Tell me Mahesh,
In your view, in a 2-node "2012" cluster, given a failure of either node (bearing in mind only one will own the quorum at any time), should the cluster continue to operate correctly?
 

Author Comment

by:nmxsupport
ID: 39986325
Interestingly, re-running the Quorum wizard I get the following; I will investigate further.

Quorum Configuration:  Node Majority
Cluster Managed Voting:  Enabled

The recommended setting for your number of voting nodes is Node and Disk Majority, however Node Majority was selected because an appropriate disk could not be found.
Your cluster quorum configuration will be changed to the configuration shown above.
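If the wizard cannot find an appropriate disk, a rough way to check what the cluster still sees (resource names will differ in my environment):

    Import-Module FailoverClusters

    # Physical Disk resources known to the cluster, with state and owner
    Get-ClusterResource | Where-Object { $_.ResourceType -like "Physical Disk" } |
        Format-Table Name, State, OwnerNode -AutoSize

    # Disks visible to the cluster but not yet added as cluster resources
    Get-ClusterAvailableDisk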
 
LVL 35

Accepted Solution

by:
Mahesh earned 500 total points
ID: 39987182
I think the wizard is not recognizing the quorum disk, and that is causing this problem.
Was your quorum disk added as an available cluster disk during configuration?

Node and Disk Majority (recommended for clusters with an even number of nodes)
You need to adopt the above configuration for the cluster and select the quorum disk manually; then it should work.
Afterwards, verify that the quorum mode has actually changed to Node and Disk Majority.
Also check that both servers are automatically listed as possible owners of the selected quorum disk.

Alternatively, you can configure a file share witness as the quorum witness.

Check below article
http://technet.microsoft.com/en-us/library/jj612870.aspx
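As a rough sketch of both options in PowerShell (the witness disk name and share path below are placeholders):

    Import-Module FailoverClusters

    # Node and Disk Majority with an explicitly chosen witness disk
    Set-ClusterQuorum -NodeAndDiskMajority "Cluster Disk 3"

    # Or, alternatively, a file share witness
    # Set-ClusterQuorum -NodeAndFileShareMajority "\\fileserver\ClusterWitness"

    # Confirm the result
    Get-ClusterQuorum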

Mahesh.