VMware ESX - A possible host failure has been detected by HA on HOST1 in cluster ESXCLUSTER

Last Modified: 2012-05-11
Hello Experts,

I rebooted an ESX server after a hardware upgrade, and now I receive this error every 5 minutes.
________________________
Target: ESXCLUSTER
Stateless event alarm

Alarm Definition:
([Event alarm expression: HA host isolated] OR [Event alarm expression: All HA hosts isolated] OR [Event alarm expression: HA host failed])
 
Event details:
A possible host failure has been detected by HA on HOST1 in cluster ESXCLUSTER in Location.
________________________

I have tried:
- Reconfiguring each host (2) for HA.
- Disabling HA and reenabling
- Removing HOST1 from cluster and re-adding.

These possible fixes were found here: http://communities.vmware.com/message/1187575

Thanks,

VMware and Virtualization Consultant
CERTIFIED EXPERT
Fellow
Expert of the Year 2017
Commented:

Author

Commented:
Added RAM to both nodes (2 of them). The second node is fine; the error always comes from node 1. Will have a look at the NICs and switches...
Danny McDaniel, Clinical Systems Analyst
Commented:

Author

Commented:
Big thanks for the ideas!

I know each RAM module is identical in each node (HP UDIMMs, same size/type/mfg for each installed module ~ both nodes are identical). Regardless, it's worth a check, I think. I'm not too versed in the console commands. Once in the console, logged in as root, what exactly do I type? 'cat/proc/vmware/NUMA/hardware' doesn't work. (Thanks for the patience too) :-)
Andrew Hancock (VMware vExpert PRO / EE Fellow), VMware and Virtualization Consultant

Commented:
Is this ESXi?
Danny McDaniel, Clinical Systems Analyst

Commented:
There should be a space between 'cat' and the first '/'.

cat is a command, and /proc/vmware/NUMA/hardware is a virtual file whose contents it prints out.
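
To illustrate the point about the space, here is a minimal sketch of a wrapper that prints a file only when it exists and is readable. The function name show_file is made up for this example; the path is the one mentioned in this thread and may well not exist on a given host.

```shell
#!/bin/sh
# Hypothetical helper (not part of ESX): print a file only if it exists
# and is readable, otherwise report that instead of a raw cat error.
show_file() {
    if [ -r "$1" ]; then
        cat "$1"    # note the space between 'cat' and the path
    else
        echo "not found or not readable: $1" >&2
        return 1
    fi
}

# Path taken from the thread; it may not be present on every host.
# show_file /proc/vmware/NUMA/hardware
```

If the helper reports the file as missing, the problem is the path itself rather than the way the command was typed.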

Author

Commented:
VMware vSphere 4.0 U2 ~ Managed by vCenter Server.

Thanks for the console details ~ but I get "No such file or directory". Is there a variable in there? i.e. should NUMA = host or something like this? (I need to study up on working with the ESX CLI and console) :-)
Danny McDaniel, Clinical Systems Analyst

Commented:
NUMA is used on AMD processors and the later Intel processors, so you probably don't have to worry about this.

Since you are in the console, check the logs for NIC connection errors with 'grep -i down /var/log/vmkernel*'. If there have been any recent losses in connectivity, we should see some indication of them this way.
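
As a rough sketch of the log check suggested above, the grep can be wrapped in a small helper that also shows which file and line each match came from. The function name find_down_events is made up for illustration; the log path is the one given in the comment, and it is an assumption that the files are readable by the logged-in user.

```shell
#!/bin/sh
# Hypothetical helper: list lines mentioning "down" (case-insensitive)
# in the given log files, prefixed with file name and line number.
find_down_events() {
    # -i: ignore case, -n: show line numbers, -H: always show the file name
    grep -i -n -H 'down' "$@" 2>/dev/null
}

# Usage on the service console (path taken from the comment above):
# find_down_events /var/log/vmkernel*
```

No output simply means no line in those files contains "down"; it does not by itself rule out a network problem.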
Commented:

Author

Commented:
So far - all good. Still no errors after changing the alarms.

Author

Commented:
Points for the effort and helping out - thanks!