Solved

Hyper-V Failover Cluster doesn't failover when nic unplugged

Posted on 2014-02-25
Last Modified: 2014-11-12
Hi,
  I have a Hyper-V failover cluster running with the following config:

SERVERS
NAS - Windows Server 2012 with a bunch of drives acting as a NAS
VMHOST1 - Windows Server 2012 with a few VMs
VMHOST2 - Windows Server 2012 with a few VMs

NETWORK
10.0.0.X Corp network
192.168.1.X storage network
192.168.150.X heartbeat network

I am using iSCSI targets on the NAS for VM storage. Failover cluster validation passes, and the VMs are all up and running fine. If I simulate a failover by doing a live migration, it works perfectly. If I simulate a failover by stopping the cluster service on one of the VM hosts, it also works perfectly. Today another tech was in the server room and dislodged the corp network cable on VMHOST2. I would have expected failover to kick in and the VMs to move to VMHOST1. This did not happen. Instead my phone blew up with alerts that all the VMs on VMHOST2 were down.

I tried to access VMHOST2 through the corp network and saw it was down, but I was able to RDP in through the storage network. The corp NIC showed as unplugged. I had the tech reseat the cable and all connectivity was restored.

My question is: why didn't the cluster fail over, treating the NIC as failed, when the cable was unplugged? How can I test this further, and what am I missing in my setup such that it passes validation and simulated tests but not a real-world failure?

Thanks in advance.
Question by:compcreate
LVL 43

Expert Comment

by:Adam Brown
ID: 39886970
Are you using a file share witness that both servers can access as the third vote in the cluster? If one of the servers can't access the file share witness, its vote will always count as a shutdown for the cluster.
LVL 38

Expert Comment

by:Mahesh
ID: 39886971
In the properties of all storage, VM, name, and IP resources, you must define the possible owners so that if connectivity is lost at any time, the resources can migrate to another host.

When you do a live migration or stop the cluster service, you are manually forcing the VMs to switch over, which is why that works.

Mahesh

Author Comment

by:compcreate
ID: 39887590
Not sure how to reply to your questions. I have a quorum disk set up. I set things up based on an MS doc on how to set up a failover cluster with two hosts and one shared iSCSI storage.

The directions included creating a quorum disk that would be the deciding factor, since there are only 2 hosts.

I think I may have found the problem. According to a doc I found online, this is addressed by a NEW feature in 2012 R2 (I am only running 2012, not R2). The doc states that the VM NIC works independently of the physical NIC even though you tell it to piggyback on the physical one. So in versions PRIOR to 2012 R2 you had to use NIC teaming to mitigate this issue. In 2012 R2 they added a setting in the virtual NIC config called "Protected Network", which essentially binds the state of the virtual NIC to the state of the physical one, allowing failover to be initiated when the physical NIC goes down.
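The reasoning above can be sketched as a toy decision model. This is only an illustration of the behavior the doc describes, not of how the cluster service is implemented; the `HostState` class and its field names are hypothetical, and `protected_network` simply stands in for the 2012 R2 "Protected Network" setting.

```python
from dataclasses import dataclass


@dataclass
class HostState:
    """Hypothetical snapshot of one cluster node (all names are illustrative)."""
    heartbeat_reachable: bool  # node still exchanges cluster heartbeats on some network
    vm_network_link_up: bool   # physical NIC behind the VMs' virtual switch has link
    protected_network: bool    # stand-in for the 2012 R2 "Protected Network" setting


def vms_should_move(host: HostState) -> bool:
    """Return True if, under this simplified model, the VMs would leave the host."""
    # A node that stops sending heartbeats is treated as failed: its VMs fail over.
    if not host.heartbeat_reachable:
        return True
    # The node is healthy from the cluster's point of view. Only when the VM
    # network is protected does a lost link on that network move the VMs.
    return host.protected_network and not host.vm_network_link_up


# The situation in this thread: corp cable unplugged, heartbeat network still up.
server_2012 = HostState(heartbeat_reachable=True, vm_network_link_up=False,
                        protected_network=False)
server_2012_r2 = HostState(heartbeat_reachable=True, vm_network_link_up=False,
                           protected_network=True)

if __name__ == "__main__":
    print(vms_should_move(server_2012))     # VMs stay put and lose connectivity
    print(vms_should_move(server_2012_r2))  # VMs are moved to another node
```

In this model, stopping the cluster service corresponds to `heartbeat_reachable=False`, which is why that test triggers a failover while pulling only the corp cable does not.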

Can anyone confirm this thought process?
LVL 43

Accepted Solution

by:
Adam Brown earned 2000 total points
ID: 39887707
Well, if you have 2 hosts, you have to configure a witness, such as what is called a File Share Witness. In any situation where you have an even number of hosts in a cluster, you have to have something that is able to provide a third, tie-breaking vote. All systems in the cluster communicate with one another periodically (this is the heartbeat). Each voting member has a "vote" on whether the cluster remains operating, and the cluster has quorum while enough of those votes are present. In order for the cluster to remain operating, more than half of the votes must be "yes" (any system that has a vote assigned and is able to send a heartbeat is considered as voting yes). This means that a cluster should have an odd number of votes. With two servers, you have to be able to add another vote. That's what the file share witness is for.

The file share witness is basically just a network shared folder on a computer that is not one of the members of the cluster. If you have the file share witness pointing to a file share that is on one of the hosts in the cluster, as soon as that system goes down it counts as two no votes. So in a situation where the cluster has two systems and a file share witness, if the witness share is pointing to one of the nodes, the cluster will fail if that node shuts down entirely. I suspect that you may have configured your file share witness to point to a share on one of the cluster nodes, rather than on a computer somewhere else. I say that because you can shut down the cluster services and a lot of other stuff on a cluster node that has the FSW on it and it will still be an operable cluster, because the FSW is still accessible from the other node in the cluster. But as soon as that server loses network connectivity or powers off, the cluster fails because it and the FSW are no longer accessible and their votes switch to no.
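The vote counting described above can be worked through with a small sketch. It assumes a plain majority rule with one vote per node plus one for the witness; the function names and the dictionaries are hypothetical and only reuse the machine names from this thread.

```python
from typing import Dict, List


def has_quorum(votes_online: int, total_votes: int) -> bool:
    """Majority rule: strictly more than half of all configured votes must be present."""
    return votes_online > total_votes // 2


def cluster_survives(voter_location: Dict[str, str], failed_machine: str) -> bool:
    """voter_location maps each voter to the machine that hosts it.

    Every voter hosted on the failed machine is lost along with it.
    """
    online: List[str] = [
        voter for voter, machine in voter_location.items()
        if machine != failed_machine
    ]
    return has_quorum(len(online), len(voter_location))


# Witness hosted on a third machine that is not a cluster node.
witness_on_nas = {"VMHOST1": "VMHOST1", "VMHOST2": "VMHOST2", "witness": "NAS"}

# Witness hosted on one of the cluster nodes (the suspected misconfiguration).
witness_on_node = {"VMHOST1": "VMHOST1", "VMHOST2": "VMHOST2", "witness": "VMHOST2"}

if __name__ == "__main__":
    print(cluster_survives(witness_on_nas, "VMHOST2"))   # 2 of 3 votes remain
    print(cluster_survives(witness_on_node, "VMHOST2"))  # only 1 of 3 votes remains
    print(cluster_survives(witness_on_node, "VMHOST1"))  # 2 of 3 votes remain
```

With the witness on a third machine, losing either host leaves two of three votes. With the witness on VMHOST2, losing VMHOST2 takes two votes with it and the remaining node cannot reach a majority.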

Author Comment

by:compcreate
ID: 39887834
Very interesting... but I would have to say I should be good then. I have the two hosts in the cluster, and I set up another iSCSI target called quorum (by following the docs) that resides on the NAS (a third system that is not part of the cluster). When I look in Failover Cluster Manager under Storage, I see my 1 GB quorum disk, and next to it, under "Assigned To", it states Disk Witness in Quorum.

So that sounds exactly as you are describing.  So I feel better about that part.

Back to the issue of the NIC not causing the failover. Can anyone confirm that this is by design (or lack thereof) in 2012 and earlier, and that the only solutions are NIC teaming or upgrading to R2 and using "Protected Network"?

Thanks