ESXi 5.1 Hosts Randomly Lose Connection w/ vCenter

We have recently started to experience an issue where a random host will be listed as disconnected and then reconnected one second later. All VMs running on the host remain accessible. This happens at random and is not tied to a particular time or day.

We are running ESXi 5.1 Update 3 on our hosts and vCenter 5.1 build 941893.
compdigit44Asked:

Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
1. Check networking between vCenter Server and the hosts. This may require logging software, e.g. PingPlotter, to check whether there is any "down time" on the network.

2. Check the vCenter Server logs.

3. Maybe the heartbeat parameter needs altering.

4. Have there been any firewall changes of late?

5. Your vCenter Server is very old: Build 941893. The current build for 5.1 is 2669725 (released April 2015), so it may be worth updating.

6. Update your hosts to Build 2583090 (released 3 months ago!).
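
Step 1 above can be sketched without a dedicated tool. The following is a minimal illustration, assuming Python 3.7+ on the machine doing the checks and a system `ping` command that prints English `time=...ms` output (both are assumptions, not something stated in this thread). The host names in the usage note are hypothetical.

```python
import csv
import re
import subprocess
import sys
import time
from datetime import datetime

# Round-trip time as printed by typical Windows/Linux ping output, e.g.
# "time=0.42 ms", "time=4ms" or "time<1ms". The output format is an assumption.
RTT_PATTERN = re.compile(r"time[=<]\s*([\d.]+)\s*ms", re.IGNORECASE)


def parse_rtt_ms(ping_output):
    """Return the round-trip time in ms, or None if no reply line was found."""
    match = RTT_PATTERN.search(ping_output)
    return float(match.group(1)) if match else None


def ping_once(host, timeout_s=1):
    """Send one ICMP echo via the system ping command; None means no reply."""
    if sys.platform.startswith("win"):
        cmd = ["ping", "-n", "1", "-w", str(int(timeout_s * 1000)), host]
    else:
        cmd = ["ping", "-c", "1", "-W", str(max(1, int(timeout_s))), host]
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s + 2
        )
    except subprocess.TimeoutExpired:
        return None
    return parse_rtt_ms(result.stdout)


def log_pings(hosts, csv_path, interval_s=1.0, rounds=None):
    """Append rows of (timestamp, host, rtt_ms); an empty rtt_ms means a timeout.

    Runs forever when rounds is None, so stop it with Ctrl+C.
    """
    with open(csv_path, "a", newline="") as handle:
        writer = csv.writer(handle)
        done = 0
        while rounds is None or done < rounds:
            for host in hosts:
                rtt = ping_once(host)
                stamp = datetime.now().isoformat(timespec="seconds")
                writer.writerow([stamp, host, "" if rtt is None else rtt])
            handle.flush()  # keep the file current in case the script is killed
            done += 1
            time.sleep(interval_s)
```

A call such as `log_pings(["esx01.example.local", "esx02.example.local"], "ping_log.csv", interval_s=5)` would build a timestamped record to compare against the disconnect alerts. Hosts are pinged one after another, so with a long host list a single round can take a while if several hosts do not answer.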
compdigit44Author Commented:
This article sounds just like my issue, although I cannot say 100% since my logging is not turned up high enough. I have 100 hosts and would hate to turn logging up that high just to wait for this to happen again. It sounds like a network issue to me....

http://www.teimouri.net/esxesxi-host-keeps-disconnecting-and-reconnecting-when-heartbeats-are-not-received-by-vcenter-server/
compdigit44Author Commented:
Where can I check this...  "Maybe the heartbeat parameter needs altering."

And as always, thank you for your guidance!
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
Before changing anything, look for simple network timeouts between vCenter Server and the hosts.

Your issue can also be more problematic if vCenter Server is virtual and moving around your cluster of 100 hosts with DRS.
compdigit44Author Commented:
Thanks, I will report my ping results shortly... Our vCenter Server is virtual, but DRS is set to manual in the cluster it resides on.
compdigit44Author Commented:
I am using Capsa's free ping tool. So far I am not seeing any host disconnect messages.

I am pinging 10 hosts and the response times are averaging between 0.3 and 4.0 ms... There is no firewall between vCenter and the hosts, although we use VLANs heavily.
compdigit44Author Commented:
I am finding that as time goes on and more users log into the network, the highest response times are now randomly reaching 6.0 ms.
compdigit44Author Commented:
I had one host spike to 33ms but even generated a host disconnect
compdigit44Author Commented:
So far no host disconnects this morning... any suggestions on next steps, since this is so random?
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
Lots of logging: log pings and disconnects, and correlate them with network data, to rule out the physical network.
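
The correlation step could look roughly like the sketch below. It assumes the ping results and the disconnect alerts have already been loaded into Python lists; the tuple layout, the 60-second window and the 100 ms "slow" threshold are illustrative choices, not values taken from vCenter or from this thread.

```python
from datetime import datetime, timedelta


def samples_near_events(samples, events, window_s=60, slow_ms=100.0):
    """Map each disconnect event to the suspicious ping samples around it.

    samples: list of (timestamp, host, rtt_ms), with rtt_ms None for a timeout
    events:  list of (timestamp, host) taken from the disconnect alerts
    A sample is suspicious when it is for the same host, falls within
    window_s seconds of the event, and either timed out or took slow_ms or more.
    """
    window = timedelta(seconds=window_s)
    report = {}
    for event_time, event_host in events:
        report[(event_time, event_host)] = [
            (ts, rtt)
            for ts, host, rtt in samples
            if host == event_host
            and abs(ts - event_time) <= window
            and (rtt is None or rtt >= slow_ms)
        ]
    return report
```

An event that maps to an empty list had no timeouts or slow replies recorded around it, which would point away from the path being pinged. A non-empty list gives the timestamps to hand to the network team.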
compdigit44Author Commented:
Should I turn up logging in vCenter? If so, with over 100 hosts, will it affect performance?

Thanks again
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
Network Info Logging....Ping Plotter.
compdigit44Author Commented:
Thanks... we do not use PingPlotter but use Capsa's free ping tool...

Regarding network information logging: would I follow KB 1004795? Would this be done only on the vCenter server?
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
I would establish whether you have any "ping timeouts" from vCenter Server to all your hosts.
compdigit44Author Commented:
Of course, when I was monitoring pings from the vCenter server to the hosts, I saw pings averaging around 2 - 3 ms with a couple of spikes to 130 ms. There were no ping timeouts though.

On the vCenter server, under vCenter Server Settings, I enabled verbose logging and restarted the services. Now it is just sit and wait..
compdigit44Author Commented:
I just had another host disconnect alert and checked the C:\ProgramData\Vmware.......\vpdx-29383.log file and did not see anything about a disconnect.
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
And does this correlate with any ping/network statistics you have collected independently?
compdigit44Author Commented:
I have some updated information... I grabbed the wrong archived vpxd file, and the correct one does list that the host in question missed: 2513214 heart beats
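
Picking the wrong archived log is easy to do by hand, so a small script that searches every log in a folder can help. The sketch below is a generic keyword search, not a vpxd log parser: the `*.log*` file pattern, the `.gz` handling for archived files and the default keyword `heartbeat` are all assumptions, and the log directory has to be supplied by the caller.

```python
import gzip
from pathlib import Path


def matching_lines(lines, keyword):
    """Yield (line_number, text) for lines containing keyword, ignoring case."""
    needle = keyword.lower()
    for number, line in enumerate(lines, start=1):
        if needle in line.lower():
            yield number, line.rstrip()


def open_log(path):
    """Open a plain or gzip-compressed log file as text."""
    if path.suffix == ".gz":
        return gzip.open(path, "rt", errors="replace")
    return open(path, "r", errors="replace")


def find_keyword_lines(log_dir, keyword="heartbeat", pattern="*.log*"):
    """Search every matching file in log_dir; return (file name, line number, text)."""
    hits = []
    for path in sorted(Path(log_dir).glob(pattern)):
        with open_log(path) as handle:
            for number, text in matching_lines(handle, keyword):
                hits.append((path.name, number, text))
    return hits
```

Calling `find_keyword_lines(log_dir)` with the vCenter log folder returns every line that mentions heartbeats across current and archived files, so the timestamps can be compared with the disconnect alerts.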
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
Umm, network issue?

Check your ping logs?

Check host networking? (and the physical switch side for errors)
compdigit44Author Commented:
I am leaning more towards networking, but every time I ask our network team to check the switches and logs, everything is always fine...... Nothing has changed on the hosts or vCenter.
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
Do you have any networking proof or diagnostics generated by your tools to suggest this?

Funny that, that's the same answer we get whenever talking to Network Ops! Hence why we do our own networking 80% of the time!
compdigit44Author Commented:
I do not have any hard proof but have a gut feeling, since we have had and are having way too many weird issues, with other appliances / devices losing their network connections for one second and then reconnecting.

The high number of missed heartbeats has to mean something to them... It is never the same host, time of day, or anything else.
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
Time for Network Probe!
compdigit44Author Commented:
Are you referring to an application or to what needs to be done? If you are referring to the second option... we have tried and our requests fall on deaf ears.
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
Difficult to proceed if you have network issues; your infrastructure is only as good as your network.
compdigit44Author Commented:
True, and if you had a network group that would actually listen, that would be even better.
compdigit44Author Commented:
If I use the free ping tool I listed before, I have to list all 110 hosts, let it run, and hope that a disconnect will happen...

I really hate doing something like this on a server that is heavily used ...
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
I'm afraid that's the only way to obtain evidence and proof that this is the issue.
compdigit44Author Commented:
I know.... just a bummer it has to come to things like this.
compdigit44Author Commented:
How many heartbeats can a host miss in 10ms before it is listed as disconnected?
compdigit44Author Commented:
OK, we had another host disconnect today, yet even with verbose logging enabled it was not listed in the vpxd log in vCenter. I did notice a disk latency alert 30 minutes before the disconnect. ESXi is installed locally, though, and does not boot from SAN.
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
Did the disconnect correlate with any network outage confirmed by your monitoring?

Do you have iSCSI datastores?

You have not been removing or re-scanning datastores? iSCSI can poll, and keep polling, and this causes CPU events to hang for a while, and the host can go non-responsive!

Otherwise, you will need to look at updating all your hosts and vCenter Server to a more current build, as originally posted. For a production environment this is more critical:

5. Your vCenter Server is very old: Build 941893. The current build for 5.1 is 2669725 (released April 2015), so it may be worth updating.

I would certainly schedule and test a new build on the ESXi hosts.
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
Heartbeats: each host sends a heartbeat every 10 seconds.

vCenter Server has a window of 60 seconds to receive the heartbeats.

If the UDP heartbeat message is not received by vCenter Server within that window, the host is marked as not responding.
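
Taking the two figures quoted above at face value (a heartbeat every 10 seconds, a 60-second window), the arithmetic can be sketched as below. This is a simplified model for illustration only; how vCenter actually tracks the window is not described in this thread, and the function names are hypothetical.

```python
def heartbeats_per_window(interval_s=10, window_s=60):
    """Number of heartbeats a host is expected to send within one window."""
    return window_s // interval_s


def is_not_responding(last_heartbeat_s, now_s, window_s=60):
    """True once more than window_s seconds have passed since the last heartbeat."""
    return (now_s - last_heartbeat_s) > window_s


print(heartbeats_per_window())   # 6
print(is_not_responding(0, 59))  # False: still inside the window
print(is_not_responding(0, 61))  # True: the window has expired
```

Under this model about six consecutive heartbeats have to go missing before a host is flagged. The interval is measured in seconds rather than milliseconds, so a single ping spike of 33 ms or 130 ms would not be enough on its own. Sustained loss of the UDP packets over roughly a minute would be.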
compdigit44Author Commented:
Thank you so much for the reply...

1) We are not using iSCSI but FC & FCoE
2) Disconnects do not happen during rescans but just happen out of the blue
3) Yes, I know our vCenter build is very old, but it has been running fine for a long time and only recently has this issue popped up

I know our network group is setting up Cisco ISE, but not on the server segments the hosts are on.
compdigit44Author Commented:
Well, I just found something very odd. Last week I enabled verbose logging in vCenter. When I checked it today, it had mysteriously been set back to information, and I am the only one who can change this.
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
Umm, that's odd ?

Well you thought you were!
compdigit44Author Commented:
After I reset the logging level to verbose and then restart the vCenter service, the logging level changes back to informational.. Why????
compdigit44Author Commented:
This really scares me and makes me think something is really wrong with our vCenter server. Yes, I know it is old and everything else, but it has been running fine and there have not been any changes..

I am suspicious of the network though. Networking still says nothing is wrong.
compdigit44Author Commented:
Not sure if this makes a difference (it shouldn't), but our vCenter server is a VM. I have thought about moving it to another host but am scared to do so, since this is the vCenter server and it would be managing its own migration. I have read this can be done, though.

Currently the VM is hosted on an ESXi server that is running 5.0.0 and is one of our older ESXi servers.
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
All our vCenter Servers are VMs, and they are in large clusters with DRS enabled, and the vCenter VM moves all over the cluster!

No issues.

Although all our clients are on 5.1 (R&D) and 5.5 and later now.
compdigit44Author Commented:
Now granted, the vCenter server has been running fairly well for a while now. After reading this KB http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2040630 I see that VirtualCenter.ManagedIP is blank?????
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
It should have an IP address in it.
compdigit44Author Commented:
I agree, but how has it been working this long??? Also, will adding the address cause any problems, i.e. host disconnects, storage, etc.?

Do services need to be restarted?
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
You do not need to put an IP address in the box, but it's recommended!

Most setups we've seen do not have it completed.
compdigit44Author Commented:
Thanks Hancock!!!! I am still looking for the cause of these disconnects.... and why my logging settings seem to reset after the vCenter service is restarted.

Question has a verified solution.
