Sandu_vmware

Unable to configure HA

Hi, I am unable to configure HA on one particular host.

There are 3 hosts in my environment: esxsvr1, esxsvr2 and esxsvr3. HA configures properly on esxsvr2 and esxsvr3, but it always fails to configure on esxsvr1.

I uninstalled all the agents, disconnected the host and re-added it to the cluster, but I am still unable to configure HA.

All the DNS settings are correct.
bgoering

I have had that issue before and what has worked for me is to unconfigure, then reconfigure HA on the cluster - then all the hosts came online.
Pete Long

Also check the hostname entries on esxsvr1; they are case sensitive: http://www.petenetlive.com/KB/Article/0000276.htm


Pete
www.petenetlive.com
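
One way to look for a case-only mismatch is sketched below. This is a rough POSIX sh helper, not anything shipped with ESX: the function name find_case_mismatches is made up for illustration, and it simply prints any name in a hosts file that matches the expected hostname when case is ignored but differs in exact spelling.

```shell
#!/bin/sh
# Hypothetical helper: print hosts-file names that differ from the expected
# hostname only by letter case (e.g. ESXsvr1 vs esxsvr1).
find_case_mismatches() {
    hosts_file=$1   # file to scan, e.g. /etc/hosts
    want_name=$2    # hostname exactly as it should appear
    awk -v name="$want_name" '
        $1 ~ /^#/ { next }                      # skip comment lines
        {
            for (i = 2; i <= NF; i++) {         # field 1 is the IP address
                if ($i ~ /^#/) break            # stop at a trailing comment
                if ($i != name && tolower($i) == tolower(name)) print $i
            }
        }' "$hosts_file"
}
```

For example, `find_case_mismatches /etc/hosts esxsvr1` prints nothing if every matching entry is spelled exactly esxsvr1.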
To follow on from what the others have said...

On each host, run the following commands and make sure the output is correct:
hostname
hostname -s
hostname -i
cat /etc/hosts (make sure that 127.0.0.1 localhost entry is there, as well as the correct hostname/IP)
ping vcname
ping vcname.domain.name
ping esxsvrX
ping esxsvrX.domain.name (replace X with the numbers of the other two hosts)
df -h (make sure plenty of disk space for file systems especially / and /var/log)
esxcfg-vswitch -l (make sure that VM portgroup names are all the same from host to host)
esxcfg-vswif -l (make sure that all hosts have same subnet mask and are on same subnet)

If none of that is the cause, let me know at what percentage it fails when you try to reconfigure for HA.
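
The /etc/hosts part of that checklist could be scripted roughly as follows. This is only a sketch in POSIX sh: the function names hosts_has_entry and check_hosts_file are made up here, and the IP address in the usage line is a placeholder, not a value from this environment.

```shell
#!/bin/sh
# Hypothetical helper: succeed if hosts file $1 maps IP $2 to name $3.
# The name comparison is exact, so it is case sensitive.
hosts_has_entry() {
    awk -v ip="$2" -v name="$3" '
        $1 == ip {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break        # ignore a trailing comment
                if ($i == name) found = 1
            }
        }
        END { exit !found }' "$1"
}

# Hypothetical wrapper: report what is missing from the hosts file.
# Usage: check_hosts_file FILE HOST_IP HOST_NAME
check_hosts_file() {
    status=0
    hosts_has_entry "$1" 127.0.0.1 localhost ||
        { echo "missing: 127.0.0.1 localhost"; status=1; }
    hosts_has_entry "$1" "$2" "$3" ||
        { echo "missing: $2 $3"; status=1; }
    return "$status"
}
```

Run it as, say, `check_hosts_file /etc/hosts 10.0.0.11 esxsvr1` (placeholder IP); it prints nothing and returns 0 when both entries are present.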
madunix

Are you able to connect to each host via DNS?
Check the /etc/hosts and /etc/resolv.conf files to ensure that ....
Another thing to look at is the amount of memory allocated to the service console. Sometimes, when running DRS and HA, the default just isn't enough. To check, go to the vSphere Client, select your ESX host, go to the Configuration tab, and click the Memory link. I set mine to the maximum of 800 MB; if yours isn't at 800, click the Properties link to change it. A system restart is required to implement any change.

Good Luck
Hi,

Not sure if this applies to you, but one thing that has stung me in the past when setting up HA is the default gateway setting. Generally, we do not route our ESXi host IPs; they are on a private subnet/network. If the host cannot ping the default gateway address, HA configuration will not complete. The host does not have to be able to actually get out; it just needs the ping to be answered. So if you have no gateway defined on that particular host (or a dummy one, like 10.10.10.1), set the gateway to an address that will actually reply and see what you get.
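
A quick way to test this from the host is sketched below in POSIX sh. The function name gateway_replies and the PING_CMD override are made up for illustration (the override only exists so the check can be dry-run without a network); by default it just calls ping with a count of 3.

```shell
#!/bin/sh
# Hypothetical helper: report whether a gateway address answers ping.
# PING_CMD is an illustrative override; it defaults to the normal ping.
gateway_replies() {
    gateway=$1
    if ${PING_CMD:-ping} -c 3 "$gateway" >/dev/null 2>&1; then
        echo "gateway $gateway replies to ping"
    else
        echo "gateway $gateway does NOT reply to ping"
        return 1
    fi
}
```

Call it with whatever gateway the host is configured with, for example `gateway_replies 10.10.10.1` for the dummy address mentioned above.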

Good luck!
I also had the same issue with the gateway. As said above, check the gateway.
Indeed, if you check your gateway and it does not allow ICMP echo requests, it may help to set das.isolationaddress to an IP that does allow them. Setting das.isolationaddress allows for an extra check.

In the VC inventory, right-click the cluster >> Edit Settings >> VMware HA >> click the Advanced Options button >> select an empty box, type das.isolationaddress manually and enter the IP address as its value.
Sandu_vmware

ASKER

Hey, thank you all for your quick responses, guys.

Below is the error message that I observed in /var/log/vmware/aam/aam_config_util_addnode.log:
FULLTIME_SITES_TID 00000005
+ 1:8042,8042,8043 esxsvr1    vmware #FT_Agent_Port=8045
+ 2:8042,8042,8043 esxsvr2 vmware
+ 3:8042,8042,8043 exssvr3 vmware
09/14/10 14:05:32 [vpxa_respond        ] VMwareerrortext=Internal AAM Error - agent could not start.
09/14/10 14:05:32 [vpxa_respond        ] VMwareerrorcat=internalerror
09/14/10 14:05:32 [myexit              ] copying /etc/opt/vmware/aam/vmware-sites to /var/log/vmware/aam/aam_config_util_addnode.log
FULLTIME_SITES_TID 00000005
+ 1:8042,8042,8043 esxsvr1    vmware #/
+ 2:8042,8042,8043 esxsvr2 vmware
+ 3:8042,8042,8043 exssvr3 vmware
09/14/10 14:05:32 [myexit              ] Failure location:
09/14/10 14:05:32 [myexit              ]        function main::myexit called from line 2306
09/14/10 14:05:32 [myexit              ]        function main::start_agent called from line 1238
09/14/10 14:05:32 [myexit              ]        function main::add_aam_node called from line 210
09/14/10 14:05:32 [myexit              ] VMwareresult=failure
09/14/10 14:05:32 [elapsed_time        ] Total time for script to complete:  6 minute(s) and 17 second(s)
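
One detail in the listing above: node 3 appears as exssvr3, while the hosts were described earlier as esxsvr1, esxsvr2 and esxsvr3. That may be nothing more than a typo in the paste, but it is easy to compare the node names in the sites file against the expected hostnames. Below is a rough POSIX sh sketch; the function name unexpected_site_nodes is made up, and it assumes the file has the same "+ N:ports name vmware" layout as the copy in the log.

```shell
#!/bin/sh
# Hypothetical helper: print node names found in a vmware-sites style file
# that are not in the list of expected hostnames.
# Usage: unexpected_site_nodes FILE NAME [NAME...]
unexpected_site_nodes() {
    sites_file=$1
    shift
    expected=" $* "     # space-padded list for whole-word matching
    # Node lines look like: "+ 1:8042,8042,8043 esxsvr1 vmware"
    awk '$1 == "+" && $2 ~ /^[0-9]+:/ { print $3 }' "$sites_file" | sort -u |
    while read -r node; do
        case "$expected" in
            *" $node "*) ;;             # known host, nothing to report
            *) echo "$node" ;;          # not in the expected list
        esac
    done
}
```

For example: `unexpected_site_nodes /etc/opt/vmware/aam/vmware-sites esxsvr1 esxsvr2 esxsvr3`.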



Any update on this, please?
ASKER CERTIFIED SOLUTION
Sandu_vmware

This solution is only available to members.

Had you followed my advice in comment #3, I believe you would have found the cause and saved yourself some time.