compdigit44

vCenter 5.1 HA

Here is our setup. In one cluster we have 4 IBM Flex blades and 4 new Cisco UCS blades. EVC is enabled. The hosts running on the UCS blades boot from SAN. As a test today we dropped both sides of the network fabric to simulate a host failure. vCenter tried to move the running VM to another random host but failed. Now I have noticed that I can vMotion between Flex blades but not from Flex to UCS. I can only do it if the VM is off. I believe this has to do with differences in the CPUs even though UCS is enabled. Also, the VMs in question have extremely high reservations, and the UCS hosts are the only ones that do not throw resource errors when I try to move the VMs.

Question.
1) With boot from SAN, if both sides of the network fail, would the VM keep running? I believe the default isolation response is to leave it powered on.

2) When HA selects a host to fail over to, how does it select the host? Least utilized?
Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)

vCenter tried to move the running VM to another random host but failed.

Did it?

Because VMware HA does not move running VMs via vMotion.

VMware HA restarts VMs on other available hosts after a host failure. No vMotion is involved with VMware HA. (This is a cold start, like a cold migration, so EVC is not applicable!)

I also think you mean EVC is enabled, BUT have you made sure that the EVC baseline is set to the lowest CPU generation present in the cluster? If you can vMotion from one host to another, but not the reverse, that would suggest a baseline problem.
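To illustrate the baseline point, here is a minimal sketch in Python. It is a simplified model, not how vCenter actually evaluates EVC: the list of generation names is only an illustrative ordering of Intel EVC modes, and the host names and their generations are hypothetical.

```python
# Illustrative ordering of Intel EVC generations, oldest first.
# Simplified model only; not the logic vCenter itself uses.
EVC_ORDER = ["Merom", "Penryn", "Nehalem", "Westmere", "Sandy Bridge"]


def lowest_common_baseline(host_generations):
    """Return the oldest CPU generation found among the cluster's hosts."""
    if not host_generations:
        raise ValueError("no hosts given")
    return min(host_generations.values(), key=EVC_ORDER.index)


def can_live_migrate(vm_baseline, target_generation):
    """A running VM can only move to a host at least as new as its baseline."""
    return EVC_ORDER.index(target_generation) >= EVC_ORDER.index(vm_baseline)


# Hypothetical hosts; the real generations of the blades are not known here.
hosts = {"host-a": "Sandy Bridge", "host-b": "Westmere"}
baseline = lowest_common_baseline(hosts)
```

In this model, a VM powered on at the cluster baseline can move in both directions, while a VM still running with a newer feature set can only move one way, which matches the one-directional vMotion symptom described above.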

So I think your opening post is a bit confused as to what VMware HA, vMotion, and DRS do!

1, Correct - default if not changed is - Leave powered on.

2. VMware HA's priority is to restart VMs on other available hosts, fast! So it's not least utilized; you can end up with heavily loaded hosts after an HA event!

That is why, if you have HA and DRS (licensed), it's important that you enable DRS, because it will kick in. CPU and memory reservations are checked to see if a host has the resources for the VM to be started on it!
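As a rough illustration of the reservation check, here is a minimal sketch in Python. It is a simplified model under my own assumptions, not the actual HA placement algorithm: the `Host` and `VM` classes, their field names, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    cpu_mhz_unreserved: int  # CPU capacity not yet claimed by reservations
    mem_mb_unreserved: int   # memory not yet claimed by reservations


@dataclass
class VM:
    name: str
    cpu_mhz_reserved: int
    mem_mb_reserved: int


def hosts_that_can_restart(vm, hosts):
    """Return the names of hosts whose unreserved capacity covers the VM's reservations."""
    return [
        h.name
        for h in hosts
        if h.cpu_mhz_unreserved >= vm.cpu_mhz_reserved
        and h.mem_mb_unreserved >= vm.mem_mb_reserved
    ]


# Hypothetical cluster: only the second host has room for a large reservation.
cluster = [Host("host-a", 4000, 16384), Host("host-b", 12000, 98304)]
big_vm = VM("db-vm", 8000, 65536)
candidates = hosts_that_can_restart(big_vm, cluster)
```

With very high reservations the candidate list can shrink to one or two hosts, or to none, in which case the restart fails no matter how lightly loaded the other hosts look.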

For testing HA, we much prefer a real, live test: just power off, reset, or pull the power cable out of a host!
compdigit44

ASKER

Hancock, good to hear from you. You are correct. The VMs that vCenter did try to move were powered off, but the powered-on VMs have the following message listed under Events.

"vSphere HA unsuccessfully failed over this virtual machine. VSphere HA will retry if the max number of attempts has not been reached..."

Also, regarding HA: HA does not care about host resource usage, which is where DRS comes in. But what happens when DRS is in Partial or Manual mode?
You dropped both sides of what fabric? The FIs on UCS?

It sounds to me like you pulled too much at once and HA couldn't move something because it wasn't there anymore.

Also, just as a side note, vCenter is not involved in the failover itself. HA runs on the ESXi hosts (the FDM agent), so even if vCenter goes down, HA still occurs.

If you want to simulate a host failure, you are better off just hard powering down the blade; this is much more realistic. Also, if you want to simulate a network failure, only pull one side at a time. Pulling two is unrealistic in 99% of cases. At that point you should just move your stuff to a better datacenter or get newer hardware. :)
If your DRS is in Partial or Manual mode, your hosts will stay heavily loaded until such time as you do something about it!

The answer for the HA failover will be contained in the logs (fdm.log).
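If the log is large, a small filter can help narrow it down. Below is a minimal sketch in Python, assuming you have copied the host's fdm.log to a workstation first. The keyword list is my own guess at useful search terms, not an official list of FDM message strings, and the function names are made up for this example.

```python
def matching_lines(lines, keywords=("failover", "isolat", "restart")):
    """Yield (line_number, text) for lines that mention any keyword, ignoring case."""
    lowered = [k.lower() for k in keywords]
    for number, text in enumerate(lines, start=1):
        if any(k in text.lower() for k in lowered):
            yield number, text.rstrip("\n")


def scan_file(path):
    """Return all matching lines from a log file copied off the host."""
    with open(path, errors="replace") as handle:
        return list(matching_lines(handle))


# Example use (the path is a placeholder for wherever you saved the copy):
# for number, text in scan_file("fdm.log"):
#     print(f"{number}: {text}")
```

Adjust the keywords to whatever you see around the time of the failed restart; the timestamps near the event are usually the quickest way in.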
As an aside, if you do not have the following volumes on your bookshelf, I would *HIGHLY RECOMMEND* them. These are the best source of information on VMware vSphere HA and DRS in the world!

VMware vSphere HA and DRS Technical deepdive

By Duncan Epping and Frank Denneman

(http://www.yellow-bricks.com/vmware-high-availability-deepdiv/)

Written by Duncan Epping and Frank Denneman, both of whom are Consulting Architects at VMware and are perceived by the industry as Subject Matter experts on VMware High Availability and VMware Distributed Resource Scheduler.

This book zooms in on two key components of every VMware based infrastructure. It covers the basic steps needed to create a VMware HA and DRS cluster, and goes on to explain the concepts and mechanisms behind HA and DRS which will enable you to make well educated decisions. You will get the tools to understand and implement e.g. HA admission control policies, DRS resource pools and resource allocation settings and more.

VMware vSphere 5.1 Clustering Deepdive on Amazon
All, thank you so much for the great advice. My colleague was only supposed to pull one side of the Cisco FI, but I think both were pulled at once... It makes sense that HA would fail, because all paths to connect to the other hosts were down.

On a side note, since I am still new to UCS: what would cause an event where both sides of the fabric go down? Also, if one side goes down, how quickly does VMware pick up the fact that the path is active again?
Power failures are common; it depends on whether you have a UPS or generator.

The FDM agents, Master and Slaves, and heartbeats between them act very quickly. (heartbeats are every second!)
Hancock, I went back and read the links you posted earlier and they were very good.

So, in the situation I posted to begin with, a boot-from-SAN UCS blade with both sides of the fabric down would be considered isolated / failed. Since it is boot from SAN, what would happen to the VMs? I would assume they would lose their network connection, since the CNA card on the UCS uses QoS to split the network traffic between management and VM traffic.
ASKER CERTIFIED SOLUTION
Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)