Panos Tsapralis
asked on
VMWARE ESX server appears as disconnected in vSphere client.
For the last few weeks, I have had a problem with our VMware ESX cluster (Rel. 3.5, build 207095): one of the nine servers appears as disconnected in the vSphere client of our vCenter Server (both at Rel. 4.0.0, build 258672), even though it seems to be operating normally (the ESX Server console is displayed on the system's screen, it accepts logins and Linux commands via SSH, and it appears to have access to all the VMFS volumes on the shared storage). Whenever I reboot that server, it appears connected to the cluster ("Connected" state and "Normal" status) for a few seconds (less than one minute) and then goes into the "Not responding" state. The same thing happens whenever I reconnect that server to the cluster (by right-clicking its entry in the vSphere client and selecting "Reconnect"). Restarting the VMware management services from the server's command line ("service mgmt-vmware restart" and "service vmware-vpxa restart") did not resolve the problem.
Advice is always welcome - thanks in advance.
Panos Tsapralis,
Athens, GREECE.
What version of VMware vSphere are you using?
Have you checked the host logs?
ASKER
The hosts are running ESX Server Rel. 3.5, build 207095, and the Virtual Center Server and client are at version 4.0.0, build 258672. I have read through the various logs on the disconnected host and couldn't find anything that seems related to my problem. However, when examining the "vpxd" logs on the Virtual Center server, I noticed several messages stating "Marked (hostname) as dirty". I believe these messages are related to my problem, since they are not generated for any other host in my cluster. I am attaching a small sample of the "vpxd.log" file to this message, so that other people reading this discussion have a chance to look at it (in this file, "kronos.phonemarketing.gr" is the hostname of the host in question).
vmware-esx-cluster-vcenter-vpxd-.txt
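To check whether the "Marked ... as dirty" messages really single out one host, a short script can tally them per hostname. This is a minimal Python sketch, assuming the message appears in vpxd.log exactly as "Marked <hostname> as dirty"; the timestamp prefix of each line is not shown in this thread, so it is simply ignored. The function name and regular expression are illustrative, not part of any VMware tooling.

```python
import re
import sys
from collections import Counter

# Assumption: the relevant vpxd.log lines contain "Marked <hostname> as dirty".
# Whatever precedes or follows that text (timestamp, thread id) is ignored.
DIRTY_RE = re.compile(r"Marked\s+(\S+)\s+as dirty")

def count_dirty_marks(lines):
    """Return a Counter mapping hostname -> number of 'Marked ... as dirty' lines."""
    counts = Counter()
    for line in lines:
        match = DIRTY_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_dirty.py vpxd.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for host, n in count_dirty_marks(log).most_common():
            print(f"{host}\t{n}")
```

If only one hostname shows up in the output, that supports the idea that the messages are specific to the troubled host.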
Hmmm, any changes in firewall settings? Also, this KB article might be interesting: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1029919
Are the hosts local to vCenter server?
ASKER
Andrew, the ESX hosts are indeed local to the vCenter server (I suppose you mean that the cluster and the vCenter server communicate over the same LAN). Actually, the vCenter server is itself a VM on one of the ESX hosts (not the disconnected one!...).
spravtek, the Windows firewall is turned off on the vCenter system, and I have verified that the vCenter system and the disconnected ESX host can communicate with each other (using "ping" in both directions). I am going to carry out the testing procedure described in the VMware KB article you mentioned and see what comes out (although I don't really believe the problem is in the UDP/TCP ports of the vCenter system, because in that case I would see the same problem on my other ESX hosts as well, right?...).
ASKER
I have carried out the connectivity testing procedure from VMware KB article #1029919 (http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1029919) and verified that the disconnected ESX host "speaks" to the Virtual Center server on UDP port "902".
What else is there to look at?...
Check TCP 443 and TCP 902.
ASKER
Andrew, I have verified that TCP port "443" of the Virtual Center system is accepting connections from the disconnected host (that was expected, anyway, since "https://<vcenterserverhostname>/" is accessible from anywhere within my LAN).
Also, running "telnet disconnectedesxhostname 902" from a command-line window, or opening "http://disconnectedesxhostname:902/" in a browser on the Virtual Center server (towards the disconnected ESX host), produces the following message:
"220 VMware Authentication Daemon Version 1.10: SSL Required, ServerDaemonProtocol:SOAP, MKSDisplayProtocol:VNC , "
This is the same message I get when using the same commands against any other (connected) host in my cluster. Therefore, I assume that TCP port "902" is also open for connections on the troubled host.
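The telnet test above can be scripted so it is easy to repeat against every host. This is a minimal Python sketch, assuming Python 3 is available on the Virtual Center server: it opens a TCP connection, reads the first line the server sends, and checks it against the "220 VMware Authentication Daemon" greeting quoted above. The function names are hypothetical.

```python
import socket

def read_banner(host, port=902, timeout=5.0):
    """Connect to host:port over TCP and return the first line the server sends."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        data = b""
        # Read until the first newline, or until the server closes the connection.
        while b"\n" not in data:
            chunk = sock.recv(1024)
            if not chunk:
                break
            data += chunk
    return data.split(b"\n", 1)[0].decode("ascii", errors="replace").strip()

def looks_like_authd(banner):
    """True if the banner resembles the '220 VMware Authentication Daemon' greeting."""
    return banner.startswith("220 ") and "VMware Authentication Daemon" in banner
```

For example, `read_banner("disconnectedesxhostname")` should return the same greeting for the troubled host as for a connected one; a timeout or refused connection would point to a network or firewall problem instead. A matching banner only shows that the port answers, not that vCenter's management session is healthy.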
Do you ever get disconnected from the server?
Does vCenter Server ever connect?
ASKER
When I connect to the troubled host via SSH or the browser, I do not get disconnected (and I am very confident that the test I described earlier, putting a VM on that server, will verify that the host works fine...). When I try to connect from within vCenter Server (right-click on the disconnected host and select "Connect"), the host stays connected for up to 60 seconds and then returns to "Not responding/Alert".
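The "connected for up to 60 seconds" pattern is consistent with vCenter never receiving the host's heartbeats. As commonly documented, the host sends a heartbeat roughly every 10 seconds over UDP 902, and vCenter marks it "Not responding" after about 60 seconds without one; treat those figures as assumptions for this environment. The Python sketch below (hypothetical function names) only illustrates that timeout rule on a list of heartbeat arrival times, for example taken from a packet capture on the vCenter side.

```python
# Assumed default: vCenter flags a host after ~60 s without a heartbeat.
HEARTBEAT_TIMEOUT = 60.0

def find_heartbeat_gaps(arrivals, timeout=HEARTBEAT_TIMEOUT):
    """Return (previous, next) arrival-time pairs more than `timeout` seconds apart."""
    ordered = sorted(arrivals)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > timeout]

def seconds_until_not_responding(last_arrival, now, timeout=HEARTBEAT_TIMEOUT):
    """Seconds left before the host would be flagged, given the last heartbeat time."""
    return max(0.0, timeout - (now - last_arrival))
```

If no heartbeat arrives at all after a reconnect, the host would be dropped once the timeout expires, which matches the behaviour described above; the sketch does not explain why the heartbeats would be missing.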
In this VMware KB article:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1012382#vcenter_4.x
I see that TCP ports "623" and "5989" are also used in the connection between the vCenter system and the ESX hosts. Should I test these ports as well (I'm going to anyway...)?
ASKER
Nope - neither TCP port "623" nor "5989" is used between the Virtual Center server and any ESX host in my environment...
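Checking several ports by hand gets tedious, so the same check can be looped. This is a minimal Python sketch: it only tests whether a TCP connection can be established, so it says nothing about UDP traffic (such as the port 902 heartbeat) or about whether the listening service is healthy. The host name and function name are placeholders.

```python
import socket

def probe_tcp_ports(host, ports, timeout=3.0):
    """Return {port: True/False} for whether a TCP connection to host:port succeeds."""
    results = {}
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Covers refused connections, timeouts and name-resolution failures.
            results[port] = False
    return results
```

For example, `probe_tcp_ports("disconnectedesxhostname", [443, 902, 623, 5989])` run against the troubled host and against a healthy one makes any difference between them easy to spot.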
Management traffic between vCenter and the host is carried over TCP 443.
ASKER CERTIFIED SOLUTION
ASKER
The advice in the article mentioned in the solution fits this issue exactly.
There's also an article listing steps to troubleshoot disconnects; did you check it? It can be found here
It might be an issue with the hostd or vpxa agent, though. In that case, hostd.log is of interest.
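If hostd or vpxa is the suspect, the interesting part of hostd.log is usually the stretch around the moment the host drops to "Not responding". This is a minimal Python sketch that returns the last lines of a log, optionally filtered by a substring. The default path "/var/log/vmware/hostd.log" is an assumption for classic ESX 3.x and may differ on your host; if the service console lacks Python 3, the log can be copied off the host first.

```python
from collections import deque

# Assumed default location on classic ESX 3.x; adjust if your host differs.
DEFAULT_HOSTD_LOG = "/var/log/vmware/hostd.log"

def tail(path=DEFAULT_HOSTD_LOG, n=200, pattern=None):
    """Return the last n lines of path, optionally keeping only lines containing pattern."""
    with open(path, encoding="utf-8", errors="replace") as f:
        lines = (line.rstrip("\n") for line in f)
        if pattern is not None:
            lines = (line for line in lines if pattern in line)
        # A bounded deque keeps only the final n lines without loading the whole file.
        return list(deque(lines, maxlen=n))
```

For example, `tail(n=50, pattern="vpxa")` would narrow the output to lines mentioning the vCenter agent, assuming hostd logs it by that name.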