  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 1142

ESX Host is not responding

Hello,

I have three ESX hosts, all running without errors. However, every now and then I get the event "Host esx1.rapa.local in RAPA DC is not responding". I see this in my events, and it happens to all of my ESX hosts (esx2.rapa.local and esx3.rapa.local as well). I do not know why vCenter is losing its connection to the hosts; the drop is very brief and then it reconnects to all of them. I haven't received any complaints that the network was down, but it worries me.

Thanks for your help and support.

nimdatx
Asked by: Jaime Campos
2 Solutions
 
bgoering Commented:
Where do you see this message?
 
Jaime Campos (Author) Commented:
When I check the cluster events, I see two errors: "Insufficient resources to satisfy HA failover level on cluster DC Cluster in RAPA DC" and "Unable to contact a primary HA agent in cluster DC Cluster in RAPA DC".

These errors are happening every 2 - 7 hours.

Thanks,

nimdatx
 
Jaime Campos (Author) Commented:
I highlighted DC Cluster and opened Tasks & Events, with "Show all entries" and "Show cluster entries" selected.
 
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, Commented:
Troubleshooting VMware High Availability (HA)

I would recommend walking through this long document:

http://kb.vmware.com/kb/1001596

 
bgoering Commented:
For any HA issues I always recommend starting with http://kb.vmware.com/kb/1001596 for troubleshooting procedures. Often they can be traced back to DNS resolution not working. The hosts also need to be able to ping their gateways and each other.
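
A minimal sketch of that DNS check, in Python, run from any machine that uses the same DNS server as the hosts. The function names `check_name` and `check_hosts` are made up for illustration; the host names and domain are the ones from this thread. It only tests whether each short name and each FQDN resolves, and does not replace the steps in the KB article.

```python
import socket


def check_name(name, resolve=socket.gethostbyname_ex):
    """Try to resolve one name; return (ok, human-readable detail)."""
    try:
        canonical, _aliases, addresses = resolve(name)
    except OSError as exc:  # socket.gaierror and socket.herror are OSError subclasses
        return False, f"{name}: lookup failed ({exc})"
    return True, f"{name}: {canonical} -> {', '.join(addresses)}"


def check_hosts(short_names, domain, resolve=socket.gethostbyname_ex):
    """Check every host by short name and by FQDN; return {name: (ok, detail)}."""
    results = {}
    for short in short_names:
        for name in (short, f"{short}.{domain}"):
            results[name] = check_name(name, resolve)
    return results
```

Calling `check_hosts(["esx1", "esx2", "esx3"], "rapa.local")` and printing the details gives one line per name; any entry whose first element is `False` is a name that did not resolve. The `resolve` parameter exists only so the lookup can be swapped out when trying the code without a live DNS server.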

As for the resources message, it appears that the cluster is overcommitted on VMs. Strict admission control is enabled on the cluster; you can get rid of the message by disabling strict admission control, but be sure to review http://kb.vmware.com/kb/1007006 before doing so.

Hope this helps
 
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, Commented:
Check that the gateways are correct and can be reached from all nodes, and also check licensing.

Un-configure HA for several minutes and then re-apply it. If need be, remove the nodes from the cluster and add them back, and also re-apply HA to each node.
 
Jaime Campos (Author) Commented:
What is a good NTP server to use?

How do I remove nodes from cluster and add back?

How do I re-apply HA to each node?

How much would it cost to hire you to take a look at my setup?
 
bgoering Commented:
LOL, hanccocka may be in business to consult, but unfortunately I cannot.

To remove a node, put it in maintenance mode, then right-click the node and select Remove.

Rather than that, I would recommend just editing the cluster settings: uncheck HA and save, then edit again, recheck HA, and save. This will reconfigure HA on all cluster nodes.
 
bgoering Commented:
NTP Appliance: http://www.vmware.com/appliances/directory/210

I use this. It works well and is much less expensive (free) than an atomic clock...
 
Jaime Campos (Author) Commented:
OK, I noticed that on my ESX3 server the host name was esx3.Rapa.local (the R was capitalized) and the DNS suffix was set to "Rapa.local, rapa.local", which seems an odd thing for me to have done.

esx1 (192.168.1.170)
esx2 (192.168.1.175)
esx3 (192.168.1.180)

I telnet into 192.168.1.170 and ran nslookup against each host. This was the response:

login as: root
root@192.168.1.170's password:
You have activated Tech Support Mode.
The time and date of this activation have been sent to the system logs.

VMware offers supported, powerful system administration tools.  Please
see www.vmware.com/go/sysadmintools for details.

Tech Support Mode may be disabled by an administrative user.
Please consult the ESXi Configuration Guide for additional
important information.

~ # nslookup esx1
Name:      esx1
Address 1: 192.168.1.170 esx1.rapa.local

~ # nslookup esx2
Name:      esx2
Address 1: 192.168.1.175 esx2

~ # nslookup esx3
Name:      esx3
Address 1: 192.168.1.180 esx3.rapa.local

~ # nslookup esx2
Name:      esx2
Address 1: 192.168.1.175 esx2 (NOTICE THAT .rapa.local doesn't show up?)

- Now when I telnet into esx2 and run the same nslookup tests, it is esx3 rather than esx2 that does not display the full FQDN.
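
A rough sketch, in Python, of how the answers that come back without the domain could be picked out of pasted nslookup output automatically. It assumes the output format shown in the transcript above (`Address N: <ip> <name>`); the function name `short_answers` is made up for illustration. The comparison ignores case, since DNS names are case-insensitive.

```python
import re

# Matches lines such as "Address 1: 192.168.1.175 esx2"
ADDRESS_LINE = re.compile(r"^Address\s+\d+:\s+(\S+)\s+(\S+)")


def short_answers(nslookup_output, domain):
    """Return (address, name) pairs whose returned name is not under `domain`."""
    suffix = "." + domain.lower()
    flagged = []
    for line in nslookup_output.splitlines():
        match = ADDRESS_LINE.match(line.strip())
        if match and not match.group(2).lower().endswith(suffix):
            flagged.append((match.group(1), match.group(2)))
    return flagged


SAMPLE = """\
Name:      esx1
Address 1: 192.168.1.170 esx1.rapa.local
Name:      esx2
Address 1: 192.168.1.175 esx2
Name:      esx3
Address 1: 192.168.1.180 esx3.rapa.local
"""

print(short_answers(SAMPLE, "rapa.local"))  # [('192.168.1.175', 'esx2')]
```

Running the same check on the output collected from each host shows which name comes back short from which host.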

I also noticed that I'm using a DNS suffix on all my ESX servers. Am I supposed to do this?

Thanks so much.

nimdatx
 
 
Jaime Campos (Author) Commented:
I think I found the source of the issue.

On my vCenter Server, DNS is pointing to my new DNS server. So far I have only completed the migration of DHCP to the new server, so vCenter should still be pointing to my old DNS server.
 
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, Commented:
Very good. Let us know if this has fixed the issue. Networking, DNS, and gateways are very important for a working HA.
 
bgoering Commented:
Yes, name resolution is critical. You should be able to both ping and vmkping all of the hosts from each of the hosts, by both the short name (esx1) and the FQDN (esx1.rapa.local).
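
To keep track of that full set of checks, here is a small Python sketch that only builds the list of commands to run on each host; it does not execute anything. The function name `connectivity_checks` is made up for illustration, the host names are the ones from this thread, and the `-c 3` count is an assumption added so each command stops after three packets. A host pinging itself is left out.

```python
def connectivity_checks(short_names, domain, count=3):
    """Return (source_host, command) pairs covering every host-to-host check."""
    checks = []
    for source in short_names:
        for target in short_names:
            if target == source:
                continue  # skip a host checking itself
            for name in (target, f"{target}.{domain}"):
                for tool in ("ping", "vmkping"):
                    checks.append((source, f"{tool} -c {count} {name}"))
    return checks


for source, command in connectivity_checks(["esx1", "esx2", "esx3"], "rapa.local"):
    print(f"on {source}: {command}")
```

For three hosts this lists 24 commands (3 sources x 2 targets x 2 name forms x 2 tools); every one of them should get replies.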
 
Jaime Campos (Author) Commented:
Thanks.
