• Status: Solved

VMware ESXi Host Disconnects from vCenter

Afternoon All,

My organization is having an intermittent problem with our ESXi 5.0 servers.  Last week all three of our ESXi hosts stopped responding and then disconnected from vCenter.  After going through the VMware KB article on what to do, the only thing we could do was a hard restart of the servers (we now know that is not best practice at all).

Today, I saw another server reach the not responding state and then disconnect from vCenter.  We never had a purple screen on the console.  This time, I was not able to press F12 at the console to attempt a restart.  Last time I was able to; however, it just hung for almost an hour and we had to do a hard restart.

I was hoping someone could suggest where I should look for problems, or whether anyone has encountered this before.  It's quite frustrating because I'm basically anticipating some type of severe issue every couple of days.  Also, every time this has happened, I've still been able to access the servers running on the hosts, e.g. Exchange.

Thanks in advance,
Anthony6890
 
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
1. Check server hardware is compatible and certified to run ESXi 5.0.

2. Has this just started to occur?

3. Are you on the latest patch release, 5.0.0 U3 plus patch 12 (i.e. the latest fixes from VMware for ESXi 5.0)?

4. Is this a LAN or WAN connection?

5. Again, the same with vCenter, is this the latest version of 5.0?

6. What is the overall use of Host resources, e.g. CPU and Memory, are we at 100%?

7. Any network changes, what is network topology?

8. Is an iSCSI SAN attached? If so, what SAN?

9. Can the ESXi servers be "pinged" from vCenter Server when they disconnect?

10. Can you ping from ESXi server to vCenter, when they disconnect?

11. Can you re-connect the servers?

12. Have you tried Restarting Network Management Agents on each host, when disconnected?

There are lots of questions there, to help us diagnose this issue.

(it's quite common!)
 
Anthony6890Author Commented:
Thanks for responding back, here are the answers:

1.  Yes we have checked all equipment for esxi5 compatibility.

2.  Yes, this issue just started last week.  We went 3 days before we observed the similar issue, just not as severe since it only happened to one host and not all 3.

3.  I will review the latest patches. I don't know if we are at the latest.  

4.  This is a LAN connection.

5.  I do know vCenter is the latest version.

6.  CPU usage is very low, under 3%. Memory is between 50-60% used on each server. There is also plenty of space; the smallest amount available is 400 GB.

7.  No recent network changes.  We installed a new firewall about a month ago. Also, we are capping bandwidth at 100 Mbps for WAN simulation. That has been in place for about 3 weeks now, with no issues. We have only observed a peak of 50 Mbps.

8.  Yes, we have 2 SANs. They are both IBM SD2350.

9.  Yes, the servers can be pinged, but I have not tried from the vCenter server.

10.  I have not tried pinging from the ESXi server to vCenter.

11.  We cannot reconnect from vCenter.  Each time we had to do a hard restart to get them to reconnect.

12.  Yes, we tried restarting the agents, but it was unsuccessful.
 
AbhilashCommented:
Can you check the log files or upload them here so we can see what's making this happen?
 
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
check the /var/log/vmkernel.log
 
Anthony6890Author Commented:
Morning guys, here is the log for the server that we observed this morning that was disconnected.  

Our consultants required a hard restart of this.
 
Anthony6890Author Commented:
Here is the log for the server that disconnected yesterday.
vmkernelforESXi3.txt
 
Anthony6890Author Commented:
I forgot to attach the log for the server that disconnected this morning.  It's attached here.
ESXI1-vmkernel.log
 
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
I would certainly test with ping -t, to keep continuous pings running between the ESXi and vCenter Servers. This rules out any communication issues, and also checks that no strange firewall rules are causing this issue.

I'll look at logs...
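If a timestamped record is more useful than a scrolling ping window, the same idea can be sketched in Python. This is only a rough sketch under assumptions: it probes with a TCP connect rather than ICMP (so it is not the same test as ping -t), the address is a placeholder, and port 443 is assumed to be open on the host's management interface.

```python
import socket
import time
from datetime import datetime


def tcp_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def monitor(probe, checks, interval=1.0, log=print):
    """Call probe() `checks` times and return the number of failed checks.

    Only state changes (reachable <-> unreachable) are logged, each with a
    timestamp, so they can be lined up against the time of the disconnect.
    """
    failures = 0
    last_ok = None
    for _ in range(checks):
        ok = probe()
        if not ok:
            failures += 1
        if ok != last_ok:
            stamp = datetime.now().isoformat(timespec="seconds")
            log(f"{stamp} {'reachable' if ok else 'UNREACHABLE'}")
            last_ok = ok
        time.sleep(interval)
    return failures


# Example use (hypothetical address, assumed port), one check per second
# for an hour, run from the vCenter Server towards an ESXi host:
#   monitor(lambda: tcp_reachable("192.0.2.10", 443), checks=3600)
```

Running it in both directions (vCenter to host, and from another machine on the host's management network to vCenter) gives roughly the same coverage as the two ping tests in points 9 and 10.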
 
Anthony6890Author Commented:
Will do Andrew.  Thanks for the suggestion.
 
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
Is your SAN iSCSI, and are you having any path issues, or have you removed any LUNs recently without un-mounting or masking the LUNs before removal?

The reason I mention this is that iSCSI does have a bug: if you just remove LUNs, the ESXi server can go into a "loop", waiting and polling for the LUNs to come back. It starts to become unresponsive and disconnects from vCenter Server.

And just quickly looking at the logs, I can see some datastore, volume and path issues.

What's up with this LUN?

 "naa.60080e50002483ee0000031d4f157b20" on path "vmhba38:C3:T0:L1" Failed:

WARNING: VMW_SATP_LSI: satp_lsi_pathIsUsingPreferredController:714:Failed to get volume access control data for path "vmhba38:C2:T0:L1": Failure

VMW_SATP_LSI: satp_lsi_updatePath:680: Failed to update path "vmhba38:C2:T0:L1" Failure
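To see which paths and devices turn up most often in a large vmkernel.log, lines like the ones above can be tallied with a few lines of Python. This is a rough sketch, not a VMware tool: the regular expressions are assumptions based only on the three excerpts quoted here, and "failure" is detected by a plain case-insensitive search for "fail".

```python
import re
from collections import Counter

# Patterns inferred from the excerpts above (assumption, not a log spec).
PATH_RE = re.compile(r"vmhba\d+:C\d+:T\d+:L\d+")
DEVICE_RE = re.compile(r"naa\.[0-9a-f]+")


def summarize_path_failures(lines):
    """Count failure lines per storage path and per NAA device ID."""
    paths, devices = Counter(), Counter()
    for line in lines:
        if "fail" not in line.lower():
            continue  # ignore lines that do not report a failure
        paths.update(PATH_RE.findall(line))
        devices.update(DEVICE_RE.findall(line))
    return paths, devices


# The three log excerpts quoted in this comment.
SAMPLE = [
    '"naa.60080e50002483ee0000031d4f157b20" on path "vmhba38:C3:T0:L1" Failed:',
    'WARNING: VMW_SATP_LSI: satp_lsi_pathIsUsingPreferredController:714:'
    'Failed to get volume access control data for path "vmhba38:C2:T0:L1": Failure',
    'VMW_SATP_LSI: satp_lsi_updatePath:680: Failed to update path '
    '"vmhba38:C2:T0:L1" Failure',
]

paths, devices = summarize_path_failures(SAMPLE)
for path, count in paths.most_common():
    print(path, count)
```

For a real log, pass the open file instead of SAMPLE, e.g. summarize_path_failures(open("vmkernel.log")). A path or LUN that dominates the counts is the first one to ask the SAN vendor about.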
 
Anthony6890Author Commented:
Yes, the SAN is iSCSI.  How would I know if I am having any path issues?  We haven't removed any LUNs recently, but I can check with our consultants to see if they did anything.
 
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
Are any datastores disconnecting?

Do you have a datastore called VDI?
 
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
Check under Host Server > Configuration > Storage Adapters > iSCSI Software ... > Paths

Are all paths Active, none Dead ?
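The same check can be done from the command line by saving the output of esxcli storage core path list on the host and summarising it. The Python below is a hedged sketch: it assumes each path is printed as an unindented identifier line followed by indented "Key: value" lines including "Runtime Name" and "State", and the sample text is invented for illustration, not taken from these hosts.

```python
def parse_path_list(text):
    """Parse path-list style output into one dict per path.

    Assumed layout: an unindented identifier line starts a path, and the
    indented "Key: value" lines that follow belong to it.
    """
    paths, current = [], None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            current = {"id": line.strip()}
            paths.append(current)
        elif current is not None and ":" in line:
            key, _, value = line.strip().partition(":")
            current[key.strip()] = value.strip()
    return paths


def paths_by_state(paths):
    """Group runtime names by their reported state."""
    states = {}
    for p in paths:
        name = p.get("Runtime Name", p["id"])
        states.setdefault(p.get("State", "unknown"), []).append(name)
    return states


# Invented sample (hypothetical adapter and device names).
SAMPLE = """\
sample-path-id-1
   Runtime Name: vmhba99:C0:T0:L1
   Device: naa.0000000000000001
   State: active

sample-path-id-2
   Runtime Name: vmhba99:C1:T0:L1
   Device: naa.0000000000000001
   State: standby

sample-path-id-3
   Runtime Name: vmhba99:C2:T0:L1
   Device: naa.0000000000000001
   State: dead
"""

states = paths_by_state(parse_path_list(SAMPLE))
for state, names in sorted(states.items()):
    print(state, len(names), names)
```

Anything listed under "dead" is worth following up; "standby" paths are normal on an active/passive array.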
 
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
Can you upload vCenter Logs?
 
Anthony6890Author Commented:
Hi guys, sorry for the delay.  Have been putting out some various fires.  

Yes, I do have a datastore called VDI.

When I go to the Configuration for the three Hosts, all paths are active, although some are in Standby.

Yes for the vcenter Logs, I will get them for you.  

Again, thank you.
 
Anthony6890Author Commented:
Andrew, which logs would you like, all of them?
 
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
The vpxd logs.
 
Anthony6890Author Commented:
Ok, give me a couple of minutes to get all the logs populated.
 
Anthony6890Author Commented:
Here were the most recent logs that I found.
vpxd-199.log
 
Anthony6890Author Commented:
Here is the most recent log.
vpxd-200.log
 
Anthony6890Author Commented:
Guys, I just got off the phone with VMware, who also reviewed some of the logs by logging in via PuTTY.  They have determined that it is a storage-related issue and not a network-related issue.  Our SANs are IBM DS3512 models, so we will be contacting IBM.

I'll keep you posted on the more detailed underlying issue.
 
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
That's what I wrote in http:#a39850257

The reason I mention this is that iSCSI does have a bug: if you just remove LUNs, the ESXi server can go into a "loop", waiting and polling for the LUNs to come back. It starts to become unresponsive and disconnects from vCenter Server.

and the VDI datastore, seems to crop up, in the logs!

What version and build of ESXi 5.0 are you using? These issues were supposed to have been resolved in U3.

But, I've seen several issues like this in 5.1 and 5.5, when a LUN hangs..... and ESXi starts polling....
 
Anthony6890Author Commented:
Morning Andrew,

Sorry, yes, I do know you said that the datastore could be the issue; I just needed more confirmation from the log review.

For the ESXi information, we are running ESXi 5.0.0, build 474610.
 
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
That's quite an early version of ESXi 5.0.

The latest build is U3, 1489271.

It might be worth a test of upgrading.
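With three hosts, a quick way to confirm which ones are behind is to compare the build number each host reports against the U3 build quoted above. This is a small sketch that assumes the version string has the form "VMware ESXi 5.0.0 build-NNNNNN" (as printed by vmware -v on the host); the 1489271 target is the figure from this comment, not looked up independently.

```python
import re

LATEST_50_BUILD = 1489271  # U3 build number as quoted in this thread


def parse_build(version_string):
    """Extract the integer build number from a 'build-NNNNNN' string."""
    match = re.search(r"build-(\d+)", version_string)
    if match is None:
        raise ValueError(f"no build number in {version_string!r}")
    return int(match.group(1))


def needs_update(version_string, latest=LATEST_50_BUILD):
    """Return True if the reported build is older than `latest`."""
    return parse_build(version_string) < latest


# Build reported by the author earlier in the thread.
reported = "VMware ESXi 5.0.0 build-474610"
print(reported, "-> update needed:", needs_update(reported))
```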
 
Anthony6890Author Commented:
We actually just had another server get disconnected from vCenter.  VMware was available to review the logs and again confirmed that it is a storage issue.  They told us to contact the SAN vendor, IBM, to investigate further.

Thank you for your help with this.
 
Anthony6890Author Commented:
Andrew was spot on with the issue. We are reaching out to the storage vendor for more information as to why we are having issues with the SAN.
 
BitTrekkerCommented:
What was your fix for this?
 
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
@BitTrekker:- Post a question, and myself or fellow VMware Experts, can assist you with your problem.