Have you configured your guests to start up on another host?
Do you have enough RAM to support all your guests running on one host?
What is the constraint setting for your HA cluster?
I have two ESX 4.0 servers (identical HP DL380 G5) connected to a shared SAS datastore, with three VMs configured. I created a cluster and turned on HA. Everything reports as fine (able to ping hosts, DNS verified, no errors). VMotion works fine and I can migrate a VM from one machine to the other. If I test HA (unplug the NICs), the VMs do not migrate and restart as expected. I have walked through every HA guide I can find (created the HA-enabled cluster first and then added hosts to it). The only thing I see is that at the point the server goes offline, vCenter records "HA agent has an error: HA agent has failed"; this is the point at which I would expect the VMs to migrate. Any ideas?
This question has been solved and verified by the asker.
By the way, this is not a new issue; it has happened since 3.x.
You can try removing all hosts from the cluster and recreating it. All ESX/VC servers must then have their hosts files updated to include the entries below:
- Loopback, always 127.0.0.1 localhost.localdomain localhost
- Local Server IP, FQDN, shortname
- Local Server console IP and <hostname>-cons
- Local Server VMotion IP Address, <hostname>-vmotion
- VirtualCenter Server IP address, FQDN, shortname
- IP Address and DNS for all hosts in the same HA/DRS configuration
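A quick way to check the list above on each host is to look for every expected name in the hosts file. The following is only a minimal sketch, assuming a POSIX shell with awk available; the function name missing_hosts and all hostnames in the usage comment are hypothetical placeholders, not taken from this thread.

```shell
#!/bin/sh
# Sketch: report which expected names are missing from a hosts file.
# The function name and all example hostnames are hypothetical.

# missing_hosts FILE NAME...
# Prints each NAME that does not appear as a hostname or alias on any
# non-comment line of FILE (exact, case-sensitive match).
missing_hosts() {
    file=$1
    shift
    for name in "$@"; do
        # Field 1 is the IP address, so only fields 2..NF are compared.
        # Anything after a '#' on a line is treated as a comment.
        if ! awk -v n="$name" '
            /^[[:space:]]*#/ { next }
            {
                for (i = 2; i <= NF; i++) {
                    if ($i ~ /^#/) break
                    if ($i == n) found = 1
                }
            }
            END { exit found ? 0 : 1 }' "$file"; then
            echo "$name"
        fi
    done
}

# Example usage (placeholder names):
# missing_hosts /etc/hosts esx01.example.local esx01 esx02.example.local esx02 vcenter.example.local
```

If the function prints nothing, every name passed in was found; any name it prints still needs an entry on that host.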
Also make sure the HA cluster uses the settings below (these are standard in the environments I usually support):
Number of host failures the cluster can tolerate: 1
Allow VMs to be powered on even if they violate availability constraints: Enabled
VM Restart Priority: Low
Host Isolation response: Leave VM powered on
Enable Virtual machine monitoring: Not enabled
good luck!
Verified the hosts file settings, created a new cluster, and set HA up on it with:
Number of host failures the cluster can tolerate: 1 <cannot set this with setting below>
Allow VMs to be powered on even if they violate availability constraints: Enabled
VM Restart Priority: Low
Host Isolation response: Leave VM powered on
Enable Virtual machine monitoring: Not enabled
Rebooted the host (without placing it in maintenance mode) and the VM did NOT restart on the other host. Other ideas? Is there a good place to look to determine why it isn't working (support log, etc.)?
Nothing shows up as an error (other than a note that we don't have a redundant management NIC). The only thing that shows up is at the point of failure (host is offline): a message that says "HA agent has error: HA agent has failed". Is there any particular log to look in? We have tried VM monitoring both on and off, but no difference...
Try these steps http://www.no-x.org/?p=155
nothing more than that error message -
The steps referenced didn't work (we have ESXi, so no full service console), but I found a similar link using uninstall scripts -
(from the tech support console)
The scripts can be found in /opt/vmware/uninstallers.
To get there:
#cd /opt/vmware/uninstallers
Get a directory listing
#ls
-rwxr-xr-x 1 root root 857 VMware-aam-ha-uninstall.sh
-rwxr-xr-x 1 root root 434 VMware-vpxa-uninstall.sh
To run these scripts,
./VMware-aam-ha-uninstall.sh
./VMware-vpxa-uninstall.sh
The agents are now removed, so re-do the HA config for the cluster
After these steps, set HA up again and retested, but same result...
When you configure an HA cluster, each ESX host in the cluster sends a heartbeat to the other ESX hosts. If a host's agent heartbeat does not respond for more than 15 seconds, that host is declared 'failed' or 'isolated from the network'.
Please make sure your ESX host can reach the service console gateway.
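To make the 15-second rule and the gateway check concrete, here is a rough sketch in plain shell. It only illustrates the logic described in this post and is not VMware's implementation; the function names host_state and gateway_reachable are made up for illustration, and 192.0.2.1 is a placeholder address.

```shell
#!/bin/sh
# Sketch of the failure-detection rule described in the post.
# Function names are hypothetical; this is not how the HA agent is written.

HEARTBEAT_TIMEOUT=15   # seconds, the figure quoted in the post

# host_state LAST_HEARTBEAT NOW
# Both arguments are epoch seconds. Prints "failed" when more than
# HEARTBEAT_TIMEOUT seconds have passed since the last heartbeat,
# otherwise prints "alive".
host_state() {
    elapsed=$(( $2 - $1 ))
    if [ "$elapsed" -gt "$HEARTBEAT_TIMEOUT" ]; then
        echo "failed"
    else
        echo "alive"
    fi
}

# gateway_reachable GATEWAY_IP
# Sends three pings and returns ping's exit status, which on common
# implementations is 0 when at least one reply comes back.
gateway_reachable() {
    ping -c 3 "$1" >/dev/null 2>&1
}

# Example usage (placeholder values):
# host_state 1000 1016
# gateway_reachable 192.0.2.1 && echo "gateway reachable" || echo "gateway unreachable"
```

Running the gateway check from the console of each host, with that host's own service console gateway in place of the placeholder, is the practical test being suggested here.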
Apologies if my answers are silly.
Thanks
by: arunraju, posted on 2009-07-15 at 16:43:22, ID: 24865242
What setting have you configured for Host Isolation response?