VMware ESXi 5 Host Not Responding

I have had this problem before, and it improved after I replaced the HDDs in the shared storage devices (iSCSI) I am using. However, it happened again today. I am wondering why I can't ping the host even though I know the host is powered on. I have to power cycle the host to get it online again.

I am not sure exactly how the heartbeat works, but from what I've read, the hosts send a heartbeat to vCenter Server every 10 seconds. If there is no response within 60 seconds, the host drops out of the cluster.
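
As a rough illustration of the timing described above, here is a minimal Python sketch of a heartbeat timeout check. It is not how vCenter Server is actually implemented: the `HostMonitor` class and its method names are hypothetical, and the 10-second and 60-second values are simply the figures quoted in this post.

```python
from dataclasses import dataclass, field

HEARTBEAT_INTERVAL_S = 10    # send interval quoted in the post (informational only)
NOT_RESPONDING_AFTER_S = 60  # timeout quoted in the post

@dataclass
class HostMonitor:
    """Hypothetical tracker for illustration; not vCenter's real logic."""
    last_seen: dict = field(default_factory=dict)

    def record_heartbeat(self, host: str, now: float) -> None:
        # Remember when the most recent heartbeat from this host arrived.
        self.last_seen[host] = now

    def status(self, host: str, now: float) -> str:
        last = self.last_seen.get(host)
        if last is None:
            return "unknown"
        # The host is flagged once the silence exceeds the timeout.
        if now - last > NOT_RESPONDING_AFTER_S:
            return "not responding"
        return "connected"

monitor = HostMonitor()
monitor.record_heartbeat("host-a", now=0)
print(monitor.status("host-a", now=30))  # connected
print(monitor.status("host-a", now=61))  # not responding
```

Under this model a host that is powered on but cannot get its heartbeats onto the network looks exactly the same to the monitor as a host that is powered off.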

Any ideas as to why the host is "not responding" to anything, even a ping?
IKtech Asked:

Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, Commented:
If it's the iSCSI issue we've seen before, the ESXi host OS gets hung up repeatedly polling the iSCSI datastore. That thread takes up all the CPU time, resulting in a non-responsive ESXi OS.

What version of ESXi 5.x are you using?

Also what SAN (iSCSI) are you using, and is it on the HCL?

Non-compatible iSCSI devices and homebrew SANs seem to have issues with this.
IKtech (Author) Commented:
ESXi 5.0.0 build 768111

QNAP NAS devices. These devices are sold as VMware compatible, and the TS-x96 series is on the HCL. I have TS-469 QNAPs.

Would the iSCSI issue cause the host to be unresponsive to a ping?  

Maybe I could use NFS datastores instead.
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, Commented:
"Would the iSCSI issue cause the host to be unresponsive to a ping?"

Yes.

ESXi 5.0.0 build 768111 is the GA release of ESXi 5.0; it has been updated many times since.

I would update to the latest and last version of ESXi 5.0, which is Update 3.

After checking the VMware HCL for the TS-469 QNAPs, iSCSI is only listed for U2 and U3, not U0, which is what you are using.

NAS (NFS) is listed for 5.0.

I would recommend updating your version of ESXi 5.0 from U0 to at least U3, so you get ALL the benefits of the bug fixes, plus certification for your iSCSI SAN.
IKtech (Author) Commented:
I am assuming I don't need to upgrade vCenter Server, or is that necessary with U3?

Thanks!!
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, Commented:
Ideally, you would update both together, vCenter Server first, and then ESXi.
IKtech (Author) Commented:
This happened again over the weekend. When it happened on July 30, it was host A that stopped responding; over the weekend it was host B. It seems like whichever host has been up longer has a higher risk of having this issue.

When it happened on 8/16, I went to our datacenter and was able to log in to the console with no issues. The first thing I tried was restarting the management agents. They stopped but hung on starting. I had to hold the power button to do a hard shutdown and reboot to get it going again.
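
When a host shows as "not responding" but its local console still works, it can help to check from another machine whether anything on the host is reachable over the network at all. The following is a minimal Python sketch, assuming TCP ports 443 and 902 are the management ports of interest (an assumption, so adjust for your environment). The `tcp_port_open` and `probe` helpers are hypothetical names used only for illustration.

```python
import socket

def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False

def probe(host: str, ports=(443, 902)) -> dict:
    """Check each port and return a {port: reachable} mapping.

    The default ports are an assumption about which services matter.
    """
    return {port: tcp_port_open(host, port) for port in ports}
```

Calling `probe()` with the host's management address returns one boolean per port. If every port fails and ping fails too, while the console is still usable, that points more toward the network path (NIC, driver, switch) than toward the management agents alone.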

Hopefully upgrading the software will help, but I thought this info might provide some clues as to what is going on. Do you have any thoughts, Andrew? Thanks!!

Maybe I should think about restarting the management agents on a schedule. Do you think that would help?
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, Commented:
Is your storage stable?

Did the hang happen before or after the upgrade?

Did you update to a supported version?
IKtech (Author) Commented:
I haven't upgraded yet...

The storage has no errors or disconnects from the other host.  HA rebooted the VMs on the other host the last time this happened.
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, Commented:
Your environment and hosts are *NOT SUPPORTED* on the current version of ESXi.

It's happening because it's NOT SUPPORTED!!!!!!

Therefore it's an untested environment, and therefore anything could happen in production!

This is why the HCL exists, because that is the tested and certified environment.

Again, I would seriously recommend upgrading to an environment verified by VMware and QNAP.

If you raise a support call with VMware or QNAP, this would be the first item on their list to advise on and check.
IKtech (Author) Commented:
I will upgrade.  No question about that.  

However, I just wanted your opinion regarding a scheduled restart of the management services.
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, Commented:
An underlying issue with unstable storage is not going to be cured by restarting the management services.
IKtech (Author) Commented:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2035701

I am using the network adapters mentioned in this article, and they are running an older driver than the one the article says "resolves the issue".

This seems to fit my situation, and I have updated the drivers. It's a shame that I didn't find this sooner. I was focused on storage issues and didn't examine other possibilities. Shame on me for that...

Thanks for your help.  I will plan to upgrade as well in the near future.
IKtech (Author) Commented:
Updated drivers for network adapters.