Solved

VMware Host not responding

Posted on 2010-01-05
Last Modified: 2012-05-08
Thought it might be good to relay a recent incident that I encountered with an ESXi 3.5 cluster host.  I was on vacation and came back to the office to find that one of the other admins had been troubleshooting an issue where a host in the cluster, as seen through VC (VirtualCenter), was in a "not responding" state.  After attempting to disconnect and reconnect, HA was showing failures for that node in the Recent Tasks pane, but I'm not a VMware guru.  I did the usual and googled the heck out of it, to no avail.  I also came to find out that this production infrastructure was not under a support contract, so I was on my own.  I could connect directly to the host through the vSphere Client and manage it from there.

Ultimately, I ended up powering down each VM guest individually from a VIC session connected directly to the host's IP, then jumped over to the VC session, did a connect on the bad host, and while HA was loading in Recent Tasks (progress bar moving...) I'd migrate the VM.  I figured it was dangerous, but I really had no options other than waiting for people to respond to the posts I made.  Anyway, doing that seemed to work.  I had to repeat the process for each VM guest, but once they were all moved, I consoled to the ESXi blade, rebooted it, and it came back to VC like it never had a problem.

One curious thing was that in the VC session, the LUN that hosted all the templates for the company was showing as offline on that host.  In the direct-to-IP session, though, it was fine.  I never did figure out why that happened, but I suspect it was directly tied to my problem, if it wasn't the root cause itself.  Once the server was rebooted, that LUN was back in the VC session as well, like nothing was wrong.

If someone has any ideas on why this might have happened and a better way of troubleshooting it in the future, please offer your insights and advice!

Take care!  Earl
Question by:egrylls
7 Comments
 
LVL 42

Assisted Solution

by:Paul Solovyovsky
Paul Solovyovsky earned 1000 total points
ID: 26185854
If you could connect directly to the host but not through vCenter, it is most likely a DNS issue.

In Virtual Center, make sure name resolution to the host is correct and that the hosts are added via FQDN.

On the ESX host, check the /etc/hosts and /etc/resolv.conf files for correct configuration.

You may need to disable and re-enable HA on the host as well.
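
The file check above can be sketched roughly as follows. This assumes Python 3 on an admin workstation working from copies of the host's /etc/hosts and /etc/resolv.conf; the function names, host name, and addresses are hypothetical and only illustrate the idea.

```python
def parse_hosts(text):
    """Map each hostname or alias in hosts-file text to its IP address."""
    entries = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # malformed line: an address with no name
        ip, names = fields[0], fields[1:]
        for name in names:
            # The first entry for a name wins, as resolvers read top-down.
            entries.setdefault(name.lower(), ip)
    return entries


def parse_nameservers(text):
    """Return the nameserver addresses listed in resolv.conf text."""
    servers = []
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def check_host_entry(hosts_text, fqdn, expected_ip):
    """Return a list of problems; an empty list means the entry looks right."""
    actual = parse_hosts(hosts_text).get(fqdn.lower())
    if actual is None:
        return [f"{fqdn} has no entry"]
    if actual != expected_ip:
        return [f"{fqdn} maps to {actual}, expected {expected_ip}"]
    return []
```

Calling `check_host_entry(open("hosts").read(), "esx01.example.local", "10.0.0.21")` on a copied file would return an empty list when the FQDN maps to the expected management IP, and `parse_nameservers` lets you confirm the host points at the DNS servers you expect.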
 
LVL 23

Expert Comment

by:bhanukir7
ID: 26185870
Hi Egrylls,

I have faced this situation before; in my case it was caused by the performance of the storage attached to the ESX servers.

We have a farm of 16 ESX blades running close to 1000 VMs. One set of 8 blades was initially set up with a SATA drive array over FC, and everything was fine until we had 200 VMs running. As we added more and more VMs, we started running into issues: the ESX hosts were going offline in VC, and so were the datastores.

We moved away from that and got a SCSI drive array, and we have not seen any issues so far.

So the best place to start finding out why this happened, and how to keep it from happening again, is to verify the performance of the storage you are using and, if possible, fix it from that end.

regards
bhanu
 
LVL 4

Assisted Solution

by:VMwareGuy
VMwareGuy earned 1000 total points
ID: 26190216
The first thing you should always do when you can't connect to ESXi / ESX from vCenter, or when there is an issue with that connection, is:

1)  Check the DNS configuration on the ESXi server and on the DNS server that ESX points to, making sure you have the appropriate entries.

2)  Try to disconnect and reconnect your ESXi host from your vCenter inventory (this uninstalls and reinstalls the vCenter agent), using the FQDN first and then the IP address if the FQDN didn't work. If that doesn't work, proceed to step 3.

3)  Try restarting both the vCenter management agent on the ESX host and the ESX host management agent.  Learn how to do this here:  http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1003490

4)  If the above didn't do anything for you, then, since you mentioned that you lost connectivity to a LUN, which can cause problems with ESX (less now than in earlier versions such as ESX 2.x), connect to the ESXi host directly with the VI Client and perform a rescan of your storage adapters and LUNs.

5)  If none of these worked, you need to do exactly what you said you did. However, you can also attempt to migrate the VMs live using the RCLI before you start powering down VMs. See the documentation on how to do this in the RCLI here:  http://www.vmware.com/pdf/vi3_35/esx_3/r35u2/vi3_35_25_u2_rcli.pdf

This will most likely not work either, but it is worth a try before you start reaching out to everyone to let them know you have to bring down VMs. If that doesn't work, bring the VMs down, register them on some of your other host servers if you have them, and bring them back up.  Then you can reboot your ESX box.
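
For step 1, a minimal sketch of a forward and reverse DNS consistency check, assuming Python 3 on a machine that uses the same DNS servers as vCenter. The function name `check_dns` and the example host name are hypothetical; the only library calls used are the standard `socket.gethostbyname` and `socket.gethostbyaddr`.

```python
import socket


def check_dns(fqdn, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Check that fqdn resolves to an IP and that the IP resolves back to fqdn.

    Returns (ok, message). The resolver functions are parameters so the
    check can be exercised without a live DNS server.
    """
    try:
        ip = forward(fqdn)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return False, f"forward lookup of {fqdn} failed: {exc}"
    try:
        name, aliases, _addresses = reverse(ip)
    except OSError as exc:  # socket.herror is a subclass of OSError
        return False, f"reverse lookup of {ip} failed: {exc}"
    # Compare case-insensitively and ignore a trailing root dot.
    known = {n.lower().rstrip(".") for n in [name, *aliases]}
    if fqdn.lower().rstrip(".") not in known:
        return False, f"{ip} resolves back to {name}, not {fqdn}"
    return True, f"{fqdn} <-> {ip} is consistent"
```

Running `check_dns("esx01.example.local")` from the vCenter server itself is the more telling test, since that is the machine whose view of DNS matters when the host shows as not responding.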
 
LVL 1

Author Comment

by:egrylls
ID: 26344864
I tried all of the above and nothing has worked.  HA is now disabled... waiting for the VMware support contract to get sorted out.
 
LVL 23

Expert Comment

by:bhanukir7
ID: 26344935
hi egrylls,

please do update us with the findings

regards
bhanu
 
LVL 1

Accepted Solution

by:egrylls
egrylls earned 0 total points
ID: 26853041
Still don't have support, so I am closing this question.  HA is turned off and stuff has quit failing, but now we have no HA....
 
LVL 4

Expert Comment

by:VMwareGuy
ID: 26853151
Egrylls - just out of curiosity: I ran into a problem not long ago where HA stopped working. It turned out the network team had changed the switch that ESX was using as its gateway to deny ping requests, a typical security configuration.  If ESX can't ping the gateway, HA will fail, so I would check this out.  Also, I'm not sure if I asked whether you are connected to storage via iSCSI; are you?  If yes, be sure you have extended the iSCSI parameters in your VMs.  If there is a failover event, your VMs can temporarily lose their connection to the storage.
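
A minimal sketch of the gateway ping check, assuming Python 3 on a Unix-like machine on the same subnet as the host's management network (the real test is whether the ESX host itself can ping the gateway). The function name `gateway_answers_ping` and the address are hypothetical.

```python
import subprocess


def gateway_answers_ping(gateway, count=3, run=subprocess.run):
    """Return True if the system `ping` command succeeds for the gateway.

    Uses the Unix-style `-c` count flag; on Windows the equivalent is `-n`.
    The `run` parameter allows substituting a fake runner for testing.
    """
    cmd = ["ping", "-c", str(count), gateway]
    result = run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    # ping exits with 0 when at least one reply was received.
    return result.returncode == 0
```

If `gateway_answers_ping("10.0.0.1")` returns False while other traffic through the gateway works, that points toward ICMP being filtered on the switch, which is the situation described above.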