Hodor asked:

Server 2012 R2 Failover Cluster not working, logon failure

I have two identical 2012 R2 servers in a cluster. The cluster has a shared resource (SAN) that holds all my VMs on a CSV. After I clustered the servers I could create a VM and live migrate it between the two nodes. I then decided to test what happens when one node goes down (I pulled its network cables). The VMs then moved from node A to node B; however, instead of a saved state or live migration, the VM "turned off", powered up, and came online. When I restart node A, live migration works perfectly.

What configuration am I missing? I want to set it up so that if one node fails, the VM is saved, migrated to the second node, and resumed.
Windows Server 2012, Hyper-V, Microsoft Virtual Server, Virtualization

ASKER CERTIFIED SOLUTION
Cliff Galiher

Hodor

ASKER

How can a lost network connection cause an unexpected shutdown of a VM when the node is fully powered and functional? The cluster should detect that there is nothing wrong with the node and simply move the VM to a node that has a network connection. This seems to be a huge flaw in a Windows 2012 cluster environment.
Cliff Galiher

You are pulling the network cable! You are effectively killing the VM, so the cluster isn't shutting it down so much as assuming the VM's status is no longer reliable (did the network drop because the VM froze? Or because the host froze? Or because a network cable got tripped on? Or because ...?), so it cannot assume it can reliably migrate the VM.

Assuming you have set up all of your nodes with separate NICs for heartbeat, live migration, and guest services, have each of those running through a separate switch, and have enabled guest network monitoring protection: if *only* the guest network dies, and the VM live migrates, and it is *still* unresponsive because the problem was that the guest network stack froze, then people would say it is a "major flaw" that Windows live-migrated an unhealthy VM! There is no winning... someone will always say "Microsoft did it wrong."

At least this way the result of an unplanned failure, regardless of the kind of failure, is predictable. That is by design. Not a flaw.
Ross Alas

If you pulled the network cable over which cluster communication happens, quorum has to be established by one of the two nodes. The node that doesn't establish quorum shuts down its guests while the other node brings them up.

Otherwise, you'll have two independent nodes trying to access the same VM. Without the network connection, neither node can tell whether the other is still alive.
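To make the quorum point concrete, here is a minimal sketch of a majority-vote check. It is only an illustration, not how the cluster service is implemented: the `Partition` class, `has_quorum` function, and the vote counts are hypothetical, and it assumes one vote per node plus one witness vote that the surviving node (node B) manages to claim.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    """One side of a split cluster (hypothetical model, not a real API)."""
    name: str
    node_votes: int              # nodes this side can still talk to, itself included
    holds_witness: bool = False  # assumption: at most one side can claim the witness


def has_quorum(partition: Partition, total_votes: int) -> bool:
    """A side keeps quorum only with a strict majority of all configured votes."""
    votes = partition.node_votes + (1 if partition.holds_witness else 0)
    return votes > total_votes // 2


# Assumed layout: two nodes plus one witness = 3 votes in total.
TOTAL_VOTES = 3
node_a = Partition("node A (cables pulled)", node_votes=1)
node_b = Partition("node B", node_votes=1, holds_witness=True)

for side in (node_a, node_b):
    action = "keeps/starts the guests" if has_quorum(side, TOTAL_VOTES) else "stops its guests"
    print(f"{side.name}: {action}")
```

With these assumed votes, node A holds 1 of 3 votes and loses quorum, while node B holds 2 of 3 and keeps it, so only one side ever runs the VM.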
Hodor

ASKER

Loss of network connection shouldn't result in a cold power-off of a VM, especially when the node is alive and the shared resource is online.

Yes, a heartbeat connection is set up, but it was disconnected during testing just to see what happens.

The main takeaway is that "Live Migration" is what requires a network connection.
Cliff Galiher

"Loss of network connection shouldn't result in a cold turn off of a VM especially when the node is alive and the shared resource is online."

The shared resource is *NOT* online! That's the point we've been trying to make. You simulated a type of failure which *explicitly* takes that particular node *OFFLINE.*  A cold reboot of the VM is expected in that scenario.
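As a rough illustration of why the outcome differs, the decision can be sketched like this. The names (`Recovery`, `recovery_path`) are hypothetical and the logic is a simplification under one assumption: a live migration has to copy the running VM's memory from the source node over the network, so it is only possible while the source node is still reachable.

```python
from enum import Enum


class Recovery(Enum):
    LIVE_MIGRATION = "live migration (running state preserved)"
    COLD_RESTART = "restart from shared storage (running state lost)"


def recovery_path(source_node_reachable: bool) -> Recovery:
    """Pick how a VM ends up on the other node (simplified, hypothetical model)."""
    if source_node_reachable:
        # Planned move: the source can hand over the VM's memory state.
        return Recovery.LIVE_MIGRATION
    # Unplanned failure: nothing can be copied from the source, so the
    # surviving node can only boot the VM again from its disks.
    return Recovery.COLD_RESTART


print("Node A restarted, cables in:", recovery_path(True).value)
print("Node A cables pulled:       ", recovery_path(False).value)
```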
SOLUTION
Ross Alas

Hodor

ASKER

By "shared resource" I meant the storage, not the network (I know the network is shared as well). I understand the need to fail over if a network issue has been detected, but the node should be smart enough to know it can do a graceful shutdown and fail over to the next node. Graceful failover (with shared storage) shouldn't depend on network availability. A cold shutdown and failover doesn't make sense to me, but I will have to accept it.

- Quorum is set with a witness disk on the SAN shared between the nodes.
- I have heartbeat set up with redundant network connections. (This question was asked as a test scenario.)