Quorum Disk

I have a two-node SQL cluster.
When the quorum disk (aka witness disk) is on node 1 along with all the other services and I kill node 1, it doesn't fail over. I might not be waiting long enough here.

When node 1 has the quorum disk and node 2 has the other services, and I fail node 2, it works just fine, as the quorum disk never went offline.

It seems like the quorum disk doesn't move to the live node.

Is this true? Am I not waiting enough time? What is the best option for a two-node cluster? I have Node and Disk Majority configured.

Silly question to kick things off: did you set up your quorum disk to have your two nodes as possible owners?
  1. Any indications of failure on the event log?
  2. After you kill node one, which resources successfully transition to node 2?
  3. Can you show me the advanced settings of Q? i.e. see screenshot

Also, if this is not in production yet...what happens if you literally bounce your first node?
How a cluster deals with a quorum under Windows 2008 is very different from how it worked under Windows 2003....

See here for further clarification...

Consider which 'Mode' your quorum is operating as.....
This is indeed a Windows 2008 cluster, and my mode is set to:

Node and Disk Majority: Each node plus a designated disk in the cluster storage (the “disk witness”) can vote, whenever they are available and in communication. The cluster functions only with a majority of the votes, that is, more than half.

From what I read, and I may be wrong, it is n-1 if the disk witness is offline.
Node and Disk Majority
Can sustain failures of 1 node(s) with the witness disk online
Can sustain failures of 0 node(s) if the witness disk goes offline or fails.
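
To make the vote arithmetic above concrete, here is a minimal Python sketch. It is only an illustration of the "more than half" rule as quoted, assuming one vote per node plus one for the disk witness. The function name `has_quorum` and its parameters are made up for this example and are not part of any Windows clustering API.

```python
def has_quorum(nodes_up: int, total_nodes: int, witness_online: bool) -> bool:
    """Node and Disk Majority rule as quoted above (illustrative only).

    Assumption: every node and the disk witness carry exactly one vote,
    and the cluster runs only with strictly more than half of all votes.
    """
    total_votes = total_nodes + 1                      # all nodes + disk witness
    votes_present = nodes_up + (1 if witness_online else 0)
    return votes_present > total_votes / 2


if __name__ == "__main__":
    # Two-node cluster: 3 votes in total, so at least 2 are needed.
    for nodes_up, witness_online in [(2, True), (1, True), (2, False), (1, False)]:
        state = "quorum" if has_quorum(nodes_up, 2, witness_online) else "no quorum"
        print(f"nodes up={nodes_up}, witness online={witness_online}: {state}")
```

With two nodes, the only losing combination is one node up and the witness unavailable, which matches what I am seeing when the node holding the quorum disk dies and the disk does not come online on the survivor.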

I am confused about why the quorum disk didn't fail over, unless it didn't meet the timeout period. The whole cluster failed when the node with the quorum disk failed.

It works and fails over when the quorum disk stays online.
Neither of those actually explains why the quorum disk didn't fail over, unless it's really down to luck. If I read this right, it means that if you're lucky, the node that fails isn't the one that also happens to be hosting the quorum disk. If you're not, and the failed node is hosting the quorum disk, then both the node and the disk are down and you're explaining to your soon-to-be ex-boss that high availability really means cross your fingers!
Neither of them did explain it. Here is what you need to do: the quorum disk's time to fail over is set (as standard) to 15 minutes, so set it down to a more useful number (such as 30 secs). In Failover Cluster Manager, choose Storage and right-click your quorum disk. Choose Properties and then the Policies tab. Set the time to restart the resource to 30 secs and the attempts to 1. Make sure that 'If restart is unsuccessful then fail over all resources or services in this application' is ticked.

Now, when the node with the quorum disk dies, the quorum disk fails over to the other node after 30 secs, and that node starts bringing up any services that were on the failed node.