Solved

ESXi 4.1 Host Won't Join Cluster

Posted on 2016-10-12
Last Modified: 2016-10-20
We have 3 ESXi 4.1 hosts in a cluster. (It is due to be upgraded, however it is a validated system, and runs qualified VMs, so we can't just upgrade it to the latest edition without a full project & re-validation).
We moved all the VMs off of 1 host, and rebooted the server (needed to check if there was a memory issue). After it was rebooted it appears that it won't connect to the cluster again. If we try to re-add the host, we get an error message:
Call "Datacenter.QueryConnectionInfo" for object "CLUSTER1" on vCenter Server "vCentre.domain.net" failed.

We've tried rebooting again, and also checked that the VMware vCenter Agent service is running, but that hasn't helped. I've checked vpxd.log, and it seems authd is failing.
I've tried adding
security.host.ruissl = "TRUE"
to the /etc/vmware/config file as per VMware KB 2037351.

Not sure what to try next. The relevant part of vpxd.log:
[2016-10-12 16:12:56.975 04720 info 'App' opID=B57F0F0C-000000D3] [VpxLRO] -- BEGIN task-internal-1698 --  -- vmodl.query.PropertyCollector.cancelWaitForUpdates -- A576837F-4FED-4164-ADFB-B2A3CE7D2574(48CCDE4A-41A4-4721-B330-B6043FDBE81A)
[2016-10-12 16:12:56.975 04720 verbose 'App' opID=B57F0F0C-000000D3] [VpxVmomi] Invoke error: vmodl.query.PropertyCollector.waitForUpdates session: A576837F-4FED-4164-ADFB-B2A3CE7D2574 Throw: vmodl.fault.RequestCanceled
[2016-10-12 16:12:56.976 04720 verbose 'SoapAdapter.HTTPService' opID=B57F0F0C-000000D3] User agent is 'VMware VI Client/4.0.0'
[2016-10-12 16:12:56.976 04720 verbose 'SoapAdapter.HTTPService' opID=B57F0F0C-000000D3] HTTP Response: Client: NeedsContentLength: false UnderstandsChunking: true CanKeepAlive: true (PresetContentLength -1)
[2016-10-12 16:12:56.976 04720 verbose 'SoapAdapter.HTTPService' opID=B57F0F0C-000000D3] HTTP Response: Complete (processed 570 bytes)
[2016-10-12 16:12:56.977 04720 error 'App' opID=B57F0F0C-000000D3] Connection lost while waiting for the next request on stream TCPStreamWin32(socket=TCP(fd=2420) local=[::1]:8085,  peer=[::1]:52681): class Vmacore::SystemException(An established connection was aborted by the software in your host machine. )
[2016-10-12 16:12:56.977 04720 verbose 'App' opID=B57F0F0C-000000D3] [VpxVmomi] Invoke done: vmodl.query.PropertyCollector.cancelWaitForUpdates session: A576837F-4FED-4164-ADFB-B2A3CE7D2574
[2016-10-12 16:12:56.977 04720 verbose 'SoapAdapter.HTTPService' opID=B57F0F0C-000000D3] HTTP Response: Complete (processed 414 bytes)
[2016-10-12 16:12:56.978 04720 info 'App' opID=B57F0F0C-000000D3] [VpxLRO] -- FINISH task-internal-1698 --  -- vmodl.query.PropertyCollector.cancelWaitForUpdates -- A576837F-4FED-4164-ADFB-B2A3CE7D2574(48CCDE4A-41A4-4721-B330-B6043FDBE81A)
[2016-10-12 16:12:57.209 04720 verbose 'ProxySvc Req00861'] New client SSL(TCPStreamWin32(socket=TCP(fd=2132) local=10.20.2.78:443,  peer=10.32.30.54:54078))
[2016-10-12 16:12:57.326 04720 verbose 'SoapAdapter.HTTPService'] User agent is 'VMware VI Client/4.0.0'
[2016-10-12 16:12:57.326 04720 verbose 'SoapAdapter.HTTPService'] HTTP Response: Client: NeedsContentLength: false UnderstandsChunking: true CanKeepAlive: true (PresetContentLength -1)
[2016-10-12 16:12:57.327 04720 verbose 'App'] [VpxVmomi] Invoking [waitForUpdates] on [vmodl.query.PropertyCollector:session[A576837F-4FED-4164-ADFB-B2A3CE7D2574]6ECC220B-329A-4CE0-8409-5206AE198E4C] session [A576837F-4FED-4164-ADFB-B2A3CE7D2574(48CCDE4A-41A4-4721-B330-B6043FDBE81A)]
[2016-10-12 16:12:57.722 03948 info 'Libs' opID=500C55A7-000001BB] Cannot connect to server esxhost2:902: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
[2016-10-12 16:12:57.722 03948 info 'Libs' opID=500C55A7-000001BB] CnxAuthdConnect: Returning false because CnxAuthdConnectTCP failed
[2016-10-12 16:12:57.722 03948 info 'Libs' opID=500C55A7-000001BB] CnxConnectAuthd: Returning false because CnxAuthdConnect failed
[2016-10-12 16:12:57.722 03948 info 'Libs' opID=500C55A7-000001BB] Cnx_Connect: Returning false because CnxConnectAuthd failed
[2016-10-12 16:12:57.722 03948 info 'Libs' opID=500C55A7-000001BB] Cnx_Connect: Error message: Failed to connect to server esxhost2:902
[2016-10-12 16:12:57.722 03948 error 'App' opID=500C55A7-000001BB] [VpxVmdbCnx] Authd error: Failed to connect to server esxhost2:902
[2016-10-12 16:12:57.722 03948 error 'App' opID=500C55A7-000001BB] [VpxVmdbCnx] Failed to connect to host esxhost2:902. Check that authd is running correctly (lib/connect error 2)
[2016-10-12 16:12:57.722 03948 verbose 'App' opID=500C55A7-000001BB] [VpxVmdbCnx] Failed to connect to host <esxhost2>
[2016-10-12 16:12:57.722 03948 verbose 'App' opID=500C55A7-000001BB] [VpxdHostAccess] Disconnecting from esxhost2
[2016-10-12 16:12:57.722 03948 error 'App' opID=500C55A7-000001BB] vim.fault.NoHost
Question by:bjblackmore
11 Comments
 
LVL 1

Expert Comment

by:ukitsme
ID: 41840909
I believe you haven't upgraded the host yet.
If that is the case, please check the following:
1) Check the permissions of the user account that you are using.
2) Ensure that the version of vCenter Server is the same as or higher than that of the ESXi host being added.
3) Verify that you are able to connect to the ESXi host using the vSphere Client.
 

Author Comment

by:bjblackmore
ID: 41841011
Thanks for the reply.

We haven't changed anything, and don't plan to upgrade the OS (if we move to ESXi 5.1 or ESXi 6, it'll be on brand new hardware).
Nothing has changed on this host, as far as we know, from when it was a working member of the cluster, except it was rebooted. For some reason it's lost connection with the cluster since the reboot, and won't add back in.

The account being used is root, the version is the same as the other 2 hosts in the cluster (esxhost1 & esxhost3). I can connect and manage the host directly with vSphere client, using the same root account.
 
LVL 62

Expert Comment

by:gheist
ID: 41841508
If Shellshock is validated, the best option is to disconnect your validated setup from the internet to reduce global grief.

Author Comment

by:bjblackmore
ID: 41842011
I'm unsure what you mean? What is shellshock? Also not sure what you mean by disconnect the validated setup from the internet?

By validated I mean the hardware is documented, and the software & install/config settings are documented. We perform a number of tests/scripts once built. When successful, we get the documentation signed off by someone from Quality, and then no further changes can be made to the hardware/software/config without a change control.

This cluster is 3 ESXi hosts sat in a data centre, with a vCenter server managing them. There is no direct internet connection.
 
LVL 62

Expert Comment

by:gheist
ID: 41842612
It would be easiest to go ahead with a valid update. You can ask VMware support to assist you during working hours.
 

Author Comment

by:bjblackmore
ID: 41842737
We're not going to be able to update. That would be a full project, and need allocated resources, time, money, licenses, which we don't have at the moment.
As this is a production environment, we just need to get the 3rd host up and running in the cluster as quickly as possible.
 
LVL 62

Expert Comment

by:gheist
ID: 41842838
If you don't have money, don't use VMware....
Can you connect to the new ESXi host using the desktop client and a web browser? Is it on the 4.1 HCL?
 
LVL 20

Accepted Solution

by:
compdigit44 earned 500 total points
ID: 41845905
I noticed the following error message in the log output you posted: CnxAuthdConnect: Returning false because CnxAuthdConnectTCP failed
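Since that message says the TCP connection itself failed, one quick thing to rule out is basic reachability of port 902 from the vCenter server. A minimal sketch follows, assuming Python is available on the machine you test from. The helper name `check_tcp` is made up for illustration, and the host name and port are simply the ones shown in the posted log. It only tests name resolution and the TCP connect; it does not speak the authd protocol, so an "ok" result does not prove authd is healthy.

```python
import socket


def check_tcp(host, port=902, timeout=5.0):
    """Try a plain TCP connect and report roughly why it failed, if it did."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except socket.gaierror:
        # The host name could not be resolved (DNS or hosts-file problem).
        return "dns-failure"
    except socket.timeout:
        # Nothing answered in time (firewall drop, wrong address, host down).
        return "timeout"
    except OSError as exc:
        # Connection refused, network unreachable, and similar errors.
        return f"error: {exc}"


if __name__ == "__main__":
    # esxhost2 and 902 are taken from the vpxd.log excerpt in the question.
    print(check_tcp("esxhost2", 902))
```

A "timeout" here would match the "connected party did not properly respond after a period of time" text in the log, while "dns-failure" would suggest a name resolution problem rather than an authd one.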

I know you mentioned that nothing has changed, but have you tried the following KB? https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1010837

If this does not work, you could try to repair the ESXi install: http://pubs.vmware.com/vsphere-4-esxi-installable-vcenter/index.jsp?topic=/com.vmware.vsphere.setupinstallable.doc_41/install/backing_up_and_restoring_esxi_4.0/t_recover_the_esxi_4.0_installable_software.html
 

Author Comment

by:bjblackmore
ID: 41852032
I tried performing the changes as mentioned in the KB, but it didn't help.

In the end I decided it was just going to be easier & quicker to repair/re-install ESXi. I tried a repair install first, but that didn't help. I was still getting the same error. So I ended up doing a clean install. Within an hour, it was installed, configured, and I was able to add it back into the cluster without any further error messages. Not sure why or what became corrupt!
 

Author Closing Comment

by:bjblackmore
ID: 41852033
Performing a re-install was the quickest & cleanest option.
 
LVL 62

Expert Comment

by:gheist
ID: 41852684
You mistyped your own host name. In 6.0 you can change that after installation...