ESXi 5.1 Hosts Randomly Lose Connection w/ vCenter

We have recently started to experience an issue where a random host is listed as disconnected and then reconnected one second later. All VMs running on the host remain accessible. This happens at random and is not specific to a time of day or anything else.

We are running ESXi 5.1 Update 3 on our hosts and vCenter 5.1 build 941893.
compdigit44Asked:

Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
1. Check networking between vCenter Server and hosts, this may require the use of software logging, e.g. Pingplotter, to check if there is any "down time" on the network.

2. Check the vCenter Server logs.

3. Maybe the heartbeat parameter needs altering.

4. No firewall changes of late ?

5. Your vCenter Server is very old.... Build 941893, the current Build for 5.1 is 2669725, it may be worth updating. April 2015 released!

6. Update your hosts to Build 2583090 (released 3 months ago!).

compdigit44Author Commented:
This article sounds just like my issue, although I cannot say 100% since my logging is not turned up high enough. I have 100 hosts and would hate to turn up logging that high just to wait for this to happen again. It sounds like a network issue to me....

http://www.teimouri.net/esxesxi-host-keeps-disconnecting-and-reconnecting-when-heartbeats-are-not-received-by-vcenter-server/
compdigit44Author Commented:
Where can I check this...  "Maybe the heartbeat parameter needs altering."

And as always thank you for your guidance!!!!!
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
Before changing anything, look at the simple network timeouts between vCenter Server and hosts.

Your issue can also be more problematic if vCenter Server is virtual and moving around your cluster of 100 hosts with DRS.
compdigit44Author Commented:
Thanks, I will report my ping results shortly... Our vCenter server is virtual, but DRS is set to manual in the cluster it resides on.
compdigit44Author Commented:
I am using Capsa's free ping tool... right now I am not seeing any host disconnect messages.

I am pinging 10 hosts and the response times are averaging between 0.3 - 4.0 ms... There is no firewall between vCenter and the hosts, although we use VLANs heavily.
compdigit44Author Commented:
I am finding that as time goes on and more users log into the network, the highest response times are now at 6.0 ms, randomly.
compdigit44Author Commented:
I had one host spike to 33 ms, but it never generated a host disconnect.
compdigit44Author Commented:
So far no host disconnects this morning... any suggestions on next steps, since this is so random?
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
Lots of logging: log pings and disconnects, and correlate them with network events to rule out the physical network.
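
One way to do this kind of correlation is to keep timestamped ping samples and check which ones fall close to each disconnect alert. The following is a minimal sketch, assuming the samples are already available as (time, host, latency) records. The names `PingSample` and `suspicious_samples`, the 30-second window, and the 100 ms threshold are illustrative assumptions, not values taken from vCenter or from any ping tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class PingSample:
    when: datetime
    host: str
    latency_ms: Optional[float]  # None means the ping timed out


def suspicious_samples(
    samples: List[PingSample],
    host: str,
    disconnect_at: datetime,
    window_s: float = 30.0,   # assumed search window around the alert
    slow_ms: float = 100.0,   # assumed threshold for a "slow" reply
) -> List[PingSample]:
    """Return samples for `host` within `window_s` seconds of the disconnect
    that either timed out or were slower than `slow_ms`."""
    window = timedelta(seconds=window_s)
    return [
        s
        for s in samples
        if s.host == host
        and abs(s.when - disconnect_at) <= window
        and (s.latency_ms is None or s.latency_ms > slow_ms)
    ]
```

An empty result for a given disconnect suggests the ping path looked healthy at that moment; a non-empty one is the kind of evidence that can be handed to a network team.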
compdigit44Author Commented:
Should I turn up logging in vCenter? If so, with over 100 hosts, will it affect performance?

Thanks again
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
Network Info Logging....Ping Plotter.
compdigit44Author Commented:
Thanks... we do not use Ping Plotter but use Capsa's free ping tool...

In regards to Network Information Logging.. would I follow KB 1004795??? Would this just be on the vCenter server???
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
I would establish, do you have any "ping timeouts" from vCenter Server to all your hosts.
compdigit44Author Commented:
Of course, when I was monitoring pings from the vCenter server to the hosts, I saw pings averaging around 2 - 3 ms with a couple of spikes to 130 ms. There were no ping timeouts though.

On the vCenter server, under vCenter Server Settings, I enabled verbose logging and restarted the services. Now it is just sit and wait..
compdigit44Author Commented:
I just had another host disconnect alert and checked the C:\ProgramData\Vmware.......\vpxd-29383.log file and did not see anything about a disconnect
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
And does this correlate with any ping/network statistics you have collected independently?
compdigit44Author Commented:
I have some updated information..... I grabbed the wrong archived vpxd file, and the correct one does list that the host in question missed 2513214 heartbeats
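
With several archived vpxd files it is easy to open the wrong one, so it can help to scan them all for heartbeat messages. A minimal sketch follows; the exact wording vpxd uses is not shown in this thread, so the pattern simply matches "heartbeat" or "heart beat" in any case, and the function name `heartbeat_lines` is hypothetical.

```python
import re
from typing import Iterable, Iterator, Tuple

# Assumption: the relevant log lines contain "heartbeat" or "heart beat".
HEARTBEAT = re.compile(r"heart\s*beat", re.IGNORECASE)


def heartbeat_lines(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, text) for every line that mentions heartbeats."""
    for number, line in enumerate(lines, start=1):
        if HEARTBEAT.search(line):
            yield number, line.rstrip("\n")
```

An open file object can be passed straight in as `lines`, so each archived log can be checked in turn without loading it fully into memory.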
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
umm, network issue ?

check your ping logs ?

check host networking? (and physical side switch for errors)
compdigit44Author Commented:
I am leaning more towards networking, but every time I ask our Network team to check the switches & logs, everything is always fine...... nothing has changed on the hosts or vCenter
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
Do you have any networking proof or diagnostics generated by your tools to suggest this?

Funny that, that's the same answer we get whenever talking to Network Ops! Hence why we do our own networking 80% of the time!
compdigit44Author Commented:
I do not have any hard proof but have a gut feeling, since we have had and are having way too many weird issues..

with other appliances / devices losing their network connections for one second then reconnecting.

The high number of missed heartbeats.. has to mean something to them... It is never the same host, time of day or anything else
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
Time for Network Probe!
compdigit44Author Commented:
Are you referring to an application, or to what needs to be done? If you are referring to the second option... we have tried and our requests fall on deaf ears
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
Difficult to proceed, if you have network issues, your infrastructure is only as good as your network.
compdigit44Author Commented:
True, and if you have a network group that would actually listen, that would be even better
compdigit44Author Commented:
If I use the free ping tool I listed before, I have to list all 110 hosts, let it run, and hope that a disconnect will happen...

I really hate doing something like this on a server that is highly used...
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
I'm afraid that's the only way to obtain evidence and proof, that is the issue.
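
For a host list this size, a small script can do the probing at a low, fixed rate instead of a GUI tool. The sketch below is an assumption-laden alternative to the ICMP ping used in this thread: it times a TCP connect to each host, with port 443 as an assumed default, and the names `tcp_probe`, `probe_all`, and `failed_hosts` are hypothetical. A successful TCP connect shows the host is reachable, but it does not exercise the UDP heartbeat path itself.

```python
import socket
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Optional


def tcp_probe(host: str, port: int = 443, timeout_s: float = 2.0) -> Optional[float]:
    """Return the TCP connect time in milliseconds, or None on failure or timeout."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return (time.monotonic() - start) * 1000.0
    except OSError:  # covers refused connections, timeouts, and DNS errors
        return None


def probe_all(
    hosts: Iterable[str],
    probe: Callable[[str], Optional[float]] = tcp_probe,
    workers: int = 16,
) -> Dict[str, Optional[float]]:
    """Probe every host once, a few at a time, and map host -> latency in ms."""
    host_list = list(hosts)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(host_list, pool.map(probe, host_list)))


def failed_hosts(results: Dict[str, Optional[float]]) -> List[str]:
    """Return the hosts whose probe failed, sorted by name."""
    return sorted(host for host, ms in results.items() if ms is None)
```

Calling `probe_all` once every few seconds and appending the results with a timestamp to a file amounts to one small connection per host per round, which should be a light load even on a busy server.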
compdigit44Author Commented:
I know.... just a bummer it has to come to things like this
compdigit44Author Commented:
How many heartbeats can a host miss in 10ms before it is listed as disconnected?
compdigit44Author Commented:
OK, we had another host disconnect today, yet even with verbose logging enabled it was not listed in the vpxd log in vCenter. I did notice that 30 minutes before the disconnect there was a disk latency alert. ESXi is installed locally though, and is not boot from SAN
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
Did the disconnect correlate with any network outage confirmed by your monitoring?

Do you have iSCSI datastores?

You have not been removing or re-scanning datastores? I ask because iSCSI can poll, and keep polling, and this causes CPU events to hang for a while, and the host can go non-responsive!

Otherwise, you will need to look at updating all your hosts and vCenter Server to a more current build.

As originally posted, and more critical for a Production Environment:

5. Your vCenter Server is very old.... Build 941893, the current Build for 5.1 is 2669725, it may be worth updating. April 2015 released!

and I would certainly schedule and test a new build on the ESXi hosts.
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
Heartbeats: each host sends a heartbeat every 10 seconds.

vCenter Server has a window of 60 seconds to receive the heartbeats.

If the UDP heartbeat message is not received by vCenter Server within that window, the host is marked as not responding.
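
Going only by the figures above (a heartbeat every 10 seconds, a 60-second window), the rule can be modelled as below. This is a simplified sketch of the described behaviour, not vCenter's actual implementation, and the function names are hypothetical.

```python
HEARTBEAT_INTERVAL_S = 10  # a host sends one heartbeat every 10 seconds
RESPONSE_WINDOW_S = 60     # vCenter Server waits this long for a heartbeat


def is_not_responding(
    last_heartbeat_s: float, now_s: float, window_s: float = RESPONSE_WINDOW_S
) -> bool:
    """True once more than `window_s` seconds have passed without a heartbeat."""
    return (now_s - last_heartbeat_s) > window_s


def heartbeats_missed(
    last_heartbeat_s: float, now_s: float, interval_s: float = HEARTBEAT_INTERVAL_S
) -> int:
    """Number of heartbeats that should have arrived since the last one did."""
    return max(0, int((now_s - last_heartbeat_s) // interval_s))
```

Under this model a host would have to miss about six consecutive heartbeats before being flagged, so a single latency spike of a few tens of milliseconds would not be enough on its own.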
compdigit44Author Commented:
Thank you so much for the reply...

1) We are not using iSCSI but FC & FCoE
2) Disconnects do not happen during a rescan but just happen out of the blue
3) Yes, I know our vCenter build is very old, but it has been running fine for a long time and only recently has this issue popped up

I know our Network group is setting up Cisco ISE, but not on the server segments which the hosts are on
compdigit44Author Commented:
Well, I just found something very odd. Last week I enabled verbose logging in vCenter. When I checked it today it had mysteriously been set back to informational, and I am the only one who can change this.
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
Umm, that's odd ?

Well you thought you were!
compdigit44Author Commented:
After I reset the logging level to verbose and restart the vCenter service, the logging level changes back to informational.. Why????
compdigit44Author Commented:
This really scares me and makes me think something is really wrong with our vCenter server. Yes, I know it is old and everything else, but it has been running fine and there have not been any changes..

I am suspicious of the network though. Networking still says nothing is wrong.
compdigit44Author Commented:
Not sure if this makes a difference (it shouldn't), but our vCenter server is a VM. I have thought about moving it to another host but am scared to do so, since this is the vCenter server and it would be managing its own migration. I have read this can be done though.

Currently the VM is hosted on an ESXi server that is running 5.0.0 and is one of our older ESXi servers
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
All our vCenter Servers are VMs, and they are in large Clusters with DRS enabled, and the vCenter VM moves all over the cluster!

No issues.

Although all our clients are on 5.1 (R&D) and 5.5 and later now.
compdigit44Author Commented:
Now granted, the vCenter server has been running fairly well for a while now. After reading this KB http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2040630 I see that VirtualCenter.ManagedIP is blank?????
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
It should have an IP address in it.
compdigit44Author Commented:
I agree, but how has it been working this long??? Also, will adding the address cause any problems, e.g. host disconnects, storage etc..

Do services need to be restarted?
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
You do not need to put an IP Address in the box, but it's recommended!

Most setups we've seen do not have it completed.
compdigit44Author Commented:
Thanks Hancock!!!! I am still looking for the cause of these disconnects.... and why my logging settings seem to reset after the vcenter service is restarted