We have a 2-node cluster (N1 and N2) running Hyper-V VMs on CSVs, using the "Node and Disk Majority" quorum configuration.
In order to perform maintenance on N1 we live migrated all VMs across to N2, and all was working fine. Shutting down N1 after this brought down the cluster on N2: we had no running VMs and could not connect to the cluster. The only way to get things working again was to quickly restart N1.
After connecting to the cluster, the following error message was observed on N2:
"The Cluster service is shutting down because quorum was lost. This could be due to the loss of network connectivity between some or all nodes in the cluster, or a failover of the witness disk."
N1 was the owner of the CSV disk resources and the quorum (witness) disk, which could explain the issue.
Questions from here:
1) If a shutdown is initiated on N1, why does it only migrate VMs automatically and not the cluster resources that are required? From further reading, I think we should "pause" the node first so that it enters maintenance mode.
However, the bigger (and more worrying) question is the following:
2) In a 2-node cluster, if the node that owns the disk witness fails, then the entire cluster fails and the cluster provides no resilience whatsoever. Is this correct?
If this is true my 2-node cluster only has a 50/50 chance of remaining working after a failure of any one of the nodes!
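To make the arithmetic behind question 2 concrete, here is a rough sketch of the vote counting as I understand it for "Node and Disk Majority". It assumes each node and the disk witness hold one vote and that the cluster needs a strict majority of all votes to stay up. The function `has_quorum` and its parameters are my own illustration, not a cluster API.

```python
def has_quorum(nodes_up: int, total_nodes: int, witness_online: bool) -> bool:
    """Simplified Node and Disk Majority check (illustrative model only).

    Assumption: every node holds one vote and the disk witness holds one vote.
    """
    total_votes = total_nodes + 1  # all nodes plus the single disk witness
    votes_present = nodes_up + (1 if witness_online else 0)
    # Quorum requires strictly more than half of all configured votes.
    return votes_present > total_votes // 2


# N1 is down and N2 managed to bring the witness disk online: 2 of 3 votes.
print(has_quorum(nodes_up=1, total_nodes=2, witness_online=True))

# N1 is down and the witness disk did not come online on N2: 1 of 3 votes.
print(has_quorum(nodes_up=1, total_nodes=2, witness_online=False))
```

If this model is right, losing one node should only be fatal when the surviving node cannot also take ownership of the witness disk, which looks like what happened to us when N1 shut down.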