Hyper-V Failover Cluster doesn't fail over when NIC unplugged

Posted on 2014-02-25
Medium Priority
Last Modified: 2014-11-12
I have a Hyper-V failover cluster running with the following config:

NAS - Windows Server 2012 with a bunch of drives acting as a NAS
VMHOST1 - Windows Server 2012 with a few VMs
VMHOST2 - Windows Server 2012 with a few VMs

10.0.0.X Corp network
192.168.1.X storage network
192.168.150.X heartbeat network

Using iSCSI targets on the NAS for VM storage.  Failover validation passes.  The VMs are all up and running fine. If I simulate failover by doing a live migration, it works perfectly.  If I simulate a failover by stopping the cluster service on one of the VM hosts, it works perfectly.  I had an issue today where another tech was in the server room and dislodged the corp network cable to VMHOST2.  I would have expected the failover to kick in and the VMs to move to VMHOST1.  This did not happen.  Instead my phone blew up with alerts that all the VMs on VMHOST2 were down.

I attempted to access VMHOST2 through the corp network and saw it was down.  I was able to RDP in through the storage network and saw that the corp NIC showed as unplugged.  I had the tech reseat the cable and all connectivity was restored.

My question is: why didn't it fail over, treating the NIC as failed, when the cable was unplugged?  How can I test this further, and what am I missing in my setup that lets it pass validation and simulated tests but not a real-world failure?

Thanks in advance.
Question by:compcreate
LVL 42

Expert Comment

by:Adam Brown
ID: 39886970
Are you using a file share witness that both servers can access as the third vote in the cluster? If one of the servers can't access the file share witness, its vote will always count as a shutdown for the cluster.
LVL 37

Expert Comment

ID: 39886971
In the properties of all storage, VM, name, and IP resources, you must define the possible owners so that if connectivity is lost at any time, the resources can, hopefully, migrate to another host.

When you do a live migration or stop the cluster service, you are manually forcing the VMs to switch over, which is why those tests work.


Author Comment

ID: 39887590
Not sure how to reply to your questions.  I have a quorum disk set up.  I set things up based on an MS doc on how to set up a failover cluster with two hosts and one shared iSCSI storage.

The directions included creating a quorum disk that would be the deciding factor, since there are only 2 hosts.

I think I may have found the problem.  According to a doc I found online about a NEW feature in 2012 R2 (I am only running 2012, not R2), the VM NIC works independently of the physical NIC even though you tell it to piggyback on the physical one.  So in versions PRIOR to 2012 R2 you had to use NIC teaming to mitigate this issue.  In 2012 R2 they added a feature in the virtual NIC config called "Protected Network" which essentially binds the state of the virtual NIC to the state of the physical one, allowing failover to be initiated when the physical NIC goes down.

Can anyone confirm this thought process?
LVL 42

Accepted Solution

Adam Brown earned 2000 total points
ID: 39887707
Well, if you have 2 hosts, you have to configure what is called a File Share Witness. In any situation where you have an even number of hosts in a cluster, you have to have something that is able to provide a third vote. All systems in the cluster communicate with one another periodically (this is the Heartbeat). All the members of the cluster are referred to as a quorum, and each system has a "vote" on whether the cluster remains operating. In order for the cluster to remain operating, more than half the votes in the cluster must be "yes" (any system that has a vote assigned and is able to send a heartbeat is considered as voting yes). This means that all clusters have to have an odd number of votes. With two servers, you have to be able to add another vote. That's what the file share witness is for.
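
To make the majority rule above concrete, here is a minimal Python sketch of the vote arithmetic. The function name `has_quorum` and the simple counting model are illustrative assumptions only; this is not how the cluster service is actually implemented.

```python
def has_quorum(total_votes: int, votes_present: int) -> bool:
    """Return True when strictly more than half of all assigned votes are present.

    Hypothetical helper: models only the majority rule described in the post.
    """
    if total_votes <= 0:
        raise ValueError("total_votes must be positive")
    if not 0 <= votes_present <= total_votes:
        raise ValueError("votes_present must be between 0 and total_votes")
    # Integer comparison avoids floating-point division.
    return 2 * votes_present > total_votes


# Two nodes and no witness: losing one node leaves 1 of 2 votes (not a majority).
two_node_no_witness = has_quorum(total_votes=2, votes_present=1)

# Two nodes plus a witness: one node and the witness give 2 of 3 votes.
two_node_with_witness = has_quorum(total_votes=3, votes_present=2)
```

Under this model an even vote count can split exactly in half, which is why a two-node cluster needs a third vote from a witness.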

The file share witness is basically just a network shared folder on a computer that is not one of the members of the cluster. If you have the file share witness pointing to a file share that is on one of the hosts in the cluster, as soon as that system goes down it counts as two no votes. So in a situation where the cluster has two systems and a file share witness, if the witness share is pointing to one of the nodes, the cluster will fail if that node shuts down entirely. I suspect that you may have configured your file share witness to point to a share on one of the cluster nodes, rather than a computer somewhere else. I say that because you can shut down the cluster services and a lot of other stuff on a cluster node that has the FSW on it and it will still be an operable cluster, because the FSW is still accessible from the other node in the cluster. But as soon as that server loses network connectivity or powers off, the cluster fails because it and the FSW are no longer accessible and their votes switch to no.
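
The witness-placement scenario described above can be sketched as a small Python model. The `Vote` class and the `cluster_survives` function are hypothetical names introduced for illustration; the host names come from the question, and the model assumes a vote is lost whenever the machine hosting it becomes unreachable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    name: str       # label for the vote (a node or the witness)
    hosted_on: str  # machine that must be reachable for the vote to count


def cluster_survives(votes, failed_machine):
    """Return True if more than half of all votes remain after one machine fails."""
    remaining = [v for v in votes if v.hosted_on != failed_machine]
    return 2 * len(remaining) > len(votes)


# Witness share on a third machine outside the cluster (the NAS).
witness_external = [
    Vote("VMHOST1", "VMHOST1"),
    Vote("VMHOST2", "VMHOST2"),
    Vote("witness", "NAS"),
]

# Witness share mistakenly hosted on one of the cluster nodes.
witness_on_node = [
    Vote("VMHOST1", "VMHOST1"),
    Vote("VMHOST2", "VMHOST2"),
    Vote("witness", "VMHOST2"),
]

external_ok = cluster_survives(witness_external, "VMHOST2")  # 2 of 3 votes remain
on_node_ok = cluster_survives(witness_on_node, "VMHOST2")    # only 1 of 3 remains
```

In this sketch, losing VMHOST2 is survivable only when the witness lives on a separate machine; with the witness on VMHOST2, that one failure removes two votes at once.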

Author Comment

ID: 39887834
Very interesting... but I would have to say I should be good then.  I have the two hosts in the cluster, and then I set up another iSCSI target called quorum (following the docs) that resides on the NAS (a third system not part of the cluster).  When I look in Failover Cluster Manager under Storage, I see my 1GB quorum disk, and next to it, it states "assigned to" Disk Witness in Quorum.

So that sounds exactly as you are describing.  So I feel better about that part.

Back to the issue of the NIC not causing the failover.  Can anyone confirm this is by design (or lack thereof) in 2012 and earlier, and that the only solutions are NIC teaming or upgrading to R2 and using "Protected Network"?

