• Status: Solved

Unable to configure HA

Hi, I am unable to configure HA on one particular host.

There are 3 hosts in my environment: esxsvr1, esxsvr2 and esxsvr3. HA configures properly on esxsvr2 and esxsvr3, but it always fails to configure on esxsvr1.

I uninstalled all the agents, disconnected the host and re-added it to the cluster, but I am still unable to configure HA.

All the DNS settings are correct.
Asked by: Sandu_vmware
 
bgoering commented:
I have had that issue before. What worked for me was to unconfigure HA on the cluster and then reconfigure it; after that, all the hosts came online.
 
Pete Long (Technical Consultant) commented:
Also check the hostname entries on esxsvr1; they are CASE sensitive: http://www.petenetlive.com/KB/Article/0000276.htm


Pete
www.petenetlive.com
 
Danny McDaniel (Clinical Systems Analyst) commented:
To follow on from what the others have said:

On each host, run the following commands and make sure the output is correct:
hostname
hostname -s
hostname -i
cat /etc/hosts (make sure the 127.0.0.1 localhost entry is there, as well as the correct hostname/IP)
ping vcname
ping vcname.domain.name
ping esxsvrX
ping esxsvrX.domain.name (replace X with the numbers of the other two hosts)
df -h (make sure there is plenty of disk space on the file systems, especially / and /var/log)
esxcfg-vswitch -l (make sure that VM portgroup names are all the same from host to host)
esxcfg-vswif -l (make sure that all hosts have same subnet mask and are on same subnet)

If none of that is the cause, let me know at what percentage it fails when you try to reconfigure HA.
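The /etc/hosts item on that checklist is the easiest one to get subtly wrong, so here is a minimal Python sketch of that check. It assumes you have copied each host's /etc/hosts text somewhere Python is available; the function names, the example.local domain and the 192.168.1.x addresses are hypothetical. Names are compared exactly, including case, because case differences were raised earlier in the thread as a possible cause.

```python
def parse_hosts(text):
    """Map every name/alias in hosts-file text to the IP on the first line listing it."""
    entries = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            entries.setdefault(name, ip)  # keep the first occurrence of each name
    return entries


def check_hosts(text, expected):
    """Return a list of problems found in hosts-file text.

    `expected` maps each host name (short or FQDN, compared exactly,
    case included) to the IP address it should resolve to.
    """
    entries = parse_hosts(text)
    problems = []
    if entries.get("localhost") != "127.0.0.1":
        problems.append("missing 127.0.0.1 localhost entry")
    for name, ip in expected.items():
        found = entries.get(name)
        if found is None:
            problems.append(f"no entry for {name}")
        elif found != ip:
            problems.append(f"{name} maps to {found}, expected {ip}")
    return problems
```

For example, a hosts file whose third host line reads "192.168.1.13 exssvr3.example.local exssvr3", checked with esxsvr3 expected at 192.168.1.13, yields ["no entry for esxsvr3"]. An empty list means the file passes this particular check.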
 
madunix (Chief Information Security Officer) commented:
Are you able to connect to each host by its DNS name?
Check the /etc/hosts and /etc/resolv.conf files to ensure that ....
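One way to answer the first question for every host at once is a forward-lookup loop like the minimal Python sketch below. It uses the standard socket.gethostbyname, which on a typical Linux system goes through the system resolver, so both /etc/hosts and the DNS servers from /etc/resolv.conf can come into play. The function name is hypothetical, and the resolver parameter is injectable only so the loop can be exercised without real DNS.

```python
import socket


def resolve_all(names, resolver=socket.gethostbyname):
    """Forward-resolve each name; map it to its IPv4 address, or None on failure."""
    results = {}
    for name in names:
        try:
            results[name] = resolver(name)
        except OSError:  # socket.gaierror and socket.herror are OSError subclasses
            results[name] = None
    return results
```

Usage, with the thread's own placeholder names: resolve_all(["vcname.domain.name", "esxsvr1.domain.name", "esxsvr2.domain.name", "esxsvr3.domain.name"]). Any None in the result is a name that did not resolve from the machine you ran it on.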
 
bgoering commented:
Another thing to look at is the amount of memory allocated to the service console. Sometimes, when running DRS and HA, the default just isn't enough. To check, go to the vSphere Client, select your ESX host, go to the Configuration tab, and click the Memory link. I set mine to the maximum of 800 MB; if yours isn't at 800 MB, click the Properties link to change it. A host restart is required for any change to take effect.

Good luck
 
rccg94 commented:
Hi,

Not sure if this applies to you, but one thing that has stung me in the past when setting up HA is the default gateway setting. Generally, we do not route our ESXi host IPs; they are on a private subnet/network. If the host cannot ping the default gateway address, HA will not complete. The host does not need to be able to reach anything beyond the gateway; the gateway just needs to answer the ping. So if you have no gateway defined on that particular host (or a dummy one, such as 10.10.10.1), set the gateway to an address that will actually reply and see what you get.
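A minimal Python sketch of the test described above: send one ICMP echo request to the gateway and treat a ping exit status of 0 as a reply. It assumes a Linux (iputils) ping, where -c sets the packet count and -W the reply timeout in seconds; the function name is hypothetical, and run is injectable only so the check can be tried without a network.

```python
import subprocess


def gateway_responds(gateway, run=subprocess.run):
    """Send one ICMP echo request; True if ping reports a reply (exit status 0).

    Assumes Linux iputils ping: -c 1 sends one packet, -W 2 waits up to 2 s.
    """
    result = run(
        ["ping", "-c", "1", "-W", "2", gateway],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

Run from the host's service console, gateway_responds("10.10.10.1") returning False for a dummy gateway like the one mentioned above is exactly the situation that stops HA from completing.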

Good luck!
 
madunix (Chief Information Security Officer) commented:
I also had the same issue with the gateway. As said above, check the gateway.
 
michelkeus commented:
Indeed, if your gateway does not answer ICMP echo requests, you may be helped by setting das.isolationaddress to an IP address that does. Setting das.isolationaddress gives HA an extra address to check.

In the vCenter inventory, right-click the cluster >> Edit Settings >> VMware HA >> click the Advanced Options button >> select an empty row, type das.isolationaddress as the option name, and enter the IP address as its value.
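To make the "extra check" concrete, here is a simplified Python model (not VMware's actual code) of why an additional isolation address helps: the host only treats itself as isolated when none of its isolation addresses reply. The dictionary of advanced options, the function names and the addresses are hypothetical.

```python
def isolation_addresses(default_gateway, advanced_options):
    """Addresses to probe: the default gateway plus any das.isolationaddress value."""
    addresses = [default_gateway]
    extra = advanced_options.get("das.isolationaddress")
    if extra:
        addresses.append(extra)
    return addresses


def is_isolated(addresses, responds):
    """Simplified model: isolated only if *no* address answers a ping.

    `responds` is any callable taking an address and returning True/False,
    for example a ping wrapper.
    """
    return not any(responds(addr) for addr in addresses)
```

With a gateway that ignores ICMP, is_isolated(isolation_addresses("10.10.10.1", {}), responds) is True; adding {"das.isolationaddress": "192.168.1.1"} pointing at an address that does reply makes it False.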
 
Sandu_vmware (Author) commented:
Hey guys, thank you all for your quick responses.

Below is the error output I found in /var/log/vmware/aam/aam_config_util_addnode.log:
FULLTIME_SITES_TID 00000005
+ 1:8042,8042,8043 esxsvr1    vmware #FT_Agent_Port=8045
+ 2:8042,8042,8043 esxsvr2 vmware
+ 3:8042,8042,8043 exssvr3 vmware
09/14/10 14:05:32 [vpxa_respond        ] VMwareerrortext=Internal AAM Error - agent could not start.
09/14/10 14:05:32 [vpxa_respond        ] VMwareerrorcat=internalerror
09/14/10 14:05:32 [myexit              ] copying /etc/opt/vmware/aam/vmware-sites to /var/log/vmware/aam/aam_config_util_addnode.log
FULLTIME_SITES_TID 00000005
+ 1:8042,8042,8043 esxsvr1    vmware #/
+ 2:8042,8042,8043 esxsvr2 vmware
+ 3:8042,8042,8043 exssvr3 vmware
09/14/10 14:05:32 [myexit              ] Failure location:
09/14/10 14:05:32 [myexit              ]        function main::myexit called from line 2306
09/14/10 14:05:32 [myexit              ]        function main::start_agent called from line 1238
09/14/10 14:05:32 [myexit              ]        function main::add_aam_node called from line 210
09/14/10 14:05:32 [myexit              ] VMwareresult=failure
09/14/10 14:05:32 [elapsed_time        ] Total time for script to complete:  6 minute(s) and 17 second(s)
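Note that the third node in this vmware-sites dump is listed as exssvr3, not esxsvr3. A minimal Python sketch for comparing the node names in such a dump with the real host names follows; the regular expression is written only against the "+ N:ports name vmware" lines shown above, and the function names are hypothetical.

```python
import re

# Matches site lines such as "+ 1:8042,8042,8043 esxsvr1    vmware #FT_Agent_Port=8045"
SITE_LINE = re.compile(r"^\+\s*(\d+):[\d,]+\s+(\S+)")


def site_names(log_text):
    """Return the node names listed in the vmware-sites dump, in order, without duplicates."""
    names = []
    for line in log_text.splitlines():
        m = SITE_LINE.match(line.strip())
        if m and m.group(2) not in names:
            names.append(m.group(2))
    return names


def unexpected_names(log_text, expected):
    """Names in the dump that are not exactly one of the expected host names."""
    return [n for n in site_names(log_text) if n not in expected]


def missing_names(log_text, expected):
    """Expected host names that never appear in the dump."""
    return sorted(set(expected) - set(site_names(log_text)))
```

Fed the log above with {"esxsvr1", "esxsvr2", "esxsvr3"} as the expected set, unexpected_names returns ["exssvr3"] and missing_names returns ["esxsvr3"].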



Any update on this, please?
 
Sandu_vmware (Author) commented:
Hello Friends,

I got HA configured and working perfectly.

There was actually a wrong entry for esxsvr3 in the /etc/hosts file.

It was registered as exssvr3; I found this using Wireshark. Although I corrected the hosts file entry, the issue still persisted.

So I changed the hostname at the kernel level on esxsvr3, so that the corrected name would be used everywhere on that host:

#sysctl -w kernel.hostname=<NEW FQDN NAME>

and rebooted the host.

Hurray, the HA configuration now completes perfectly.
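To confirm that the running kernel picked up the new name, you could compare /proc/sys/kernel/hostname (the Linux file behind the kernel.hostname sysctl) with the FQDN you expect. A minimal Python sketch; the function names are hypothetical, the path parameter exists only so the check can be tried against a copy of that file, and the comparison is exact, including case.

```python
from pathlib import Path

# On Linux, `sysctl kernel.hostname` reads and writes this file.
HOSTNAME_FILE = "/proc/sys/kernel/hostname"


def kernel_hostname(path=HOSTNAME_FILE):
    """Return the hostname the running kernel currently reports."""
    return Path(path).read_text().strip()


def hostname_matches(expected_fqdn, path=HOSTNAME_FILE):
    """True if the kernel hostname is exactly (case included) the expected FQDN."""
    return kernel_hostname(path) == expected_fqdn
```

On the host itself, hostname_matches("esxsvr3.example.local") (a placeholder FQDN) should return True once the change has taken effect.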

 
Danny McDaniel (Clinical Systems Analyst) commented:
Had you followed my advice in comment #3, I believe you would have found the cause and saved yourself some time.