  • Status: Solved

VMware ESXi 5 Host Not Responding

I have had this problem before and it has improved since I replaced the HDDs in the shared storage devices (iSCSI) I am using.  However, today this happened again. I am wondering why I can't ping the host even though I know the host is powered on.  I have to power cycle the host to get it online again.

I am not sure exactly how the heartbeat works, but from what I've read, the hosts send a heartbeat to vCenter Server every 10 seconds. If there is no response in 60 seconds, the host drops out of the cluster.
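
To make the timing concrete, here is a minimal Python sketch of the timeout logic as described in this post. It models only the 10-second interval and 60-second timeout quoted above. The names `HostState`, `is_not_responding`, and `missed_heartbeats` are hypothetical and not part of any VMware API, and the real vCenter Server behavior may differ.

```python
from dataclasses import dataclass

HEARTBEAT_INTERVAL_S = 10    # interval quoted in the post (assumption)
NOT_RESPONDING_AFTER_S = 60  # timeout quoted in the post (assumption)


@dataclass
class HostState:
    """Hypothetical record of when a host was last heard from."""
    name: str
    last_heartbeat: float  # timestamp in seconds of the last heartbeat seen


def is_not_responding(host: HostState, now: float,
                      timeout: float = NOT_RESPONDING_AFTER_S) -> bool:
    """Return True when more than `timeout` seconds have passed without a heartbeat."""
    return (now - host.last_heartbeat) > timeout


def missed_heartbeats(host: HostState, now: float,
                      interval: float = HEARTBEAT_INTERVAL_S) -> int:
    """Count how many whole heartbeat intervals have elapsed since the last one seen."""
    return max(0, int((now - host.last_heartbeat) // interval))
```

With these numbers, a host would have to miss about six consecutive heartbeats before being flagged, so a brief network blip alone should not be enough to mark it as not responding.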

Any ideas as to why the host is "not responding" to anything, even a ping?
Asked by: IKtech
 
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, Commented:
If it's the iSCSI issue we've seen, the host ESXi OS gets hung up repeatedly polling the iSCSI datastore. This thread takes up all CPU time, resulting in a non-responsive ESXi OS.

What version of ESXi 5.x are you using?

Also what SAN (iSCSI) are you using, and is it on the HCL?

Non-compatible iSCSI devices and homebrew SANs seem to have issues with this.
 
IKtech (Author) Commented:
ESXi 5.0.0 build 768111

QNAP NAS devices. These devices are sold as VMware compatible; the TS-x69 series is on the HCL, and I have TS-469 QNAPs.

Would the iSCSI issue cause the host to be unresponsive to a ping?  

Maybe I could use NFS datastores instead.
 
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, Commented:
Would the iSCSI issue cause the host to be unresponsive to a ping?

Yes.

ESXi 5.0.0 build 768111 is the GA release of ESXi 5.0; it has been updated many times since.

I would update to the latest and last version of ESXi 5.0, which is Update 3.

After checking the VMware HCL for the TS-469 QNAPs, iSCSI is only listed for U2 and U3, not U0, which is what you are using.

NAS (NFS) is listed for 5.0.
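
As a rough illustration of the check described above, here is a small Python sketch of a compatibility lookup. The table contains only the listings stated in this thread, not the actual VMware HCL data; the names `LISTED` and `is_listed` are hypothetical, and the real HCL should always be consulted directly.

```python
# Listings as stated in this thread only (assumption: not a copy of the real HCL).
# "5.0" stands for the base release (U0); "5.0 U2" and "5.0 U3" are update levels.
LISTED = {
    ("TS-469", "iSCSI"): {"5.0 U2", "5.0 U3"},
    ("TS-469", "NFS"): {"5.0"},
}


def is_listed(model: str, protocol: str, release: str) -> bool:
    """Return True if the thread states that this model/protocol is listed for the release."""
    return release in LISTED.get((model, protocol), set())


# The combination currently in use: iSCSI on the base 5.0 release.
print(is_listed("TS-469", "iSCSI", "5.0"))     # False
# The combination after updating to Update 3.
print(is_listed("TS-469", "iSCSI", "5.0 U3"))  # True
```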

I would recommend updating your version of ESXi 5.0 U0 to at least U3, to get ALL the benefits of the fixes and the certification for your iSCSI SAN.
 
IKtech (Author) Commented:
I am assuming I don't need to upgrade vCenter Server, or maybe it is necessary with U3?

Thanks!!
 
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, Commented:
Ideally, you would update both together, vCenter Server first, and then ESXi.
 
IKtech (Author) Commented:
This has happened again over the weekend. On July 30, when it happened, it was host A that stopped responding; over the weekend it was host B that stopped responding. It seems like whichever host has been up longer has a higher risk of having this issue.

When it happened on 8/16, I went to our datacenter and was able to log in to the console with no issues. The first thing I tried was restarting the management agents. They stopped but hung on starting. I had to hold the power button and do a hard shutdown and reboot to get it going again.

Hopefully upgrading the software will help but I thought this info may provide some clues as to what is going on.  Do you have any thoughts Andrew?  Thanks!!

Maybe I should think about restarting the management agents on a schedule. Do you think that may help?
 
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, Commented:
Is your storage stable?

Did the hang happen before or after the upgrade?

Did you update to a supported version?
 
IKtech (Author) Commented:
I haven't upgraded yet...

The storage has no errors or disconnects from the other host.  HA rebooted the VMs on the other host the last time this happened.
 
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, Commented:
Your environment and hosts are *NOT SUPPORTED* on the current version of ESXi.

It's happening because it's NOT SUPPORTED!!!!!!

Therefore it's an untested environment, and therefore anything could happen in production!

This is why the HCL exists, because that is the tested and certified environment.

Again, I would seriously recommend upgrading to an environment verified by VMware and QNAP.

If you raise a support call with VMware or QNAP, this would be the first item on their list to check and advise on.
 
IKtech (Author) Commented:
I will upgrade.  No question about that.  

However I just wanted your opinion regarding a scheduled restart of the management services.
 
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, Commented:
An underlying storage issue that leaves the storage unstable is not going to be cured by restarting the management services.
 
IKtech (Author) Commented:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2035701

I am using the network adapters mentioned in this article, and they are also using an older driver than the one mentioned that "resolves the issue".
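
For anyone wanting to automate the "is my driver older than the fixed one" comparison, here is a minimal Python sketch. The version strings in the example are hypothetical placeholders and are not the versions from the KB article; `parse_version` and `needs_update` are illustrative names, and the sketch assumes purely numeric, dot-separated version strings.

```python
import re


def parse_version(version: str) -> tuple:
    """Split a version string such as '1.10.2' into a tuple of integers."""
    return tuple(int(part) for part in re.findall(r"\d+", version))


def needs_update(installed: str, fixed: str) -> bool:
    """Return True when the installed driver version is older than the fixed one."""
    return parse_version(installed) < parse_version(fixed)


# Hypothetical versions for illustration only.
print(needs_update("1.9.0", "1.10.0"))   # True
print(needs_update("1.10.0", "1.10.0"))  # False
```

Comparing integer tuples rather than raw strings avoids the common mistake of treating "1.9.0" as newer than "1.10.0".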

This seems to fit my situation and I have updated the drivers.  It's a shame that I haven't found this sooner.  I was focused on storage issues and didn't examine other possibilities.  Shame on me for that...

Thanks for your help.  I will plan to upgrade as well in the near future.
 
IKtech (Author) Commented:
Updated drivers for network adapters.