bjblackmore

ESXi 4.1 Host Won't Join Cluster

We have 3 ESXi 4.1 hosts in a cluster. (It is due to be upgraded; however, it is a validated system running qualified VMs, so we can't just upgrade it to the latest version without a full project and re-validation.)
We moved all the VMs off one host and rebooted it (we needed to check whether there was a memory issue). Since the reboot, it won't reconnect to the cluster. If we try to re-add the host, we get this error message:
Call "Datacenter.QueryConnectionInfo" for object "CLUSTER1" on vCenter Server "vCentre.domain.net" failed.

We've tried rebooting again and checked that the VMware vCenter Agent service is running, but that hasn't helped. I've checked vpxd.log, and it seems authd is failing.
I've tried adding
security.host.ruissl = "TRUE"
to the /etc/vmware/config file as per VMware KB 2037351.
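
To confirm the setting was actually saved, one option is to parse a copy of the file and look the key up. This is only a minimal sketch, assuming the file uses simple `key = "value"` lines like the entry above; `parse_vmware_config` and the `some.other.key` sample entry are hypothetical names made up for illustration.

```python
import re

# Assumed line shape: key = "value" (as in the entry quoted above).
LINE_RE = re.compile(r'^\s*([\w.\-]+)\s*=\s*"(.*)"\s*$')


def parse_vmware_config(text):
    """Return a dict of key -> value for every key = "value" line in text."""
    settings = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


# Sample content; "some.other.key" is a made-up entry for illustration only.
sample = 'some.other.key = "value"\nsecurity.host.ruissl = "TRUE"\n'
settings = parse_vmware_config(sample)
print(settings.get("security.host.ruissl", "<not set>"))
```

With the real file you would read its contents into a string and pass that to the function in place of `sample`.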

Not sure what to try next:
[2016-10-12 16:12:56.975 04720 info 'App' opID=B57F0F0C-000000D3] [VpxLRO] -- BEGIN task-internal-1698 --  -- vmodl.query.PropertyCollector.cancelWaitForUpdates -- A576837F-4FED-4164-ADFB-B2A3CE7D2574(48CCDE4A-41A4-4721-B330-B6043FDBE81A)
[2016-10-12 16:12:56.975 04720 verbose 'App' opID=B57F0F0C-000000D3] [VpxVmomi] Invoke error: vmodl.query.PropertyCollector.waitForUpdates session: A576837F-4FED-4164-ADFB-B2A3CE7D2574 Throw: vmodl.fault.RequestCanceled
[2016-10-12 16:12:56.976 04720 verbose 'SoapAdapter.HTTPService' opID=B57F0F0C-000000D3] User agent is 'VMware VI Client/4.0.0'
[2016-10-12 16:12:56.976 04720 verbose 'SoapAdapter.HTTPService' opID=B57F0F0C-000000D3] HTTP Response: Client: NeedsContentLength: false UnderstandsChunking: true CanKeepAlive: true (PresetContentLength -1)
[2016-10-12 16:12:56.976 04720 verbose 'SoapAdapter.HTTPService' opID=B57F0F0C-000000D3] HTTP Response: Complete (processed 570 bytes)
[2016-10-12 16:12:56.977 04720 error 'App' opID=B57F0F0C-000000D3] Connection lost while waiting for the next request on stream TCPStreamWin32(socket=TCP(fd=2420) local=[::1]:8085,  peer=[::1]:52681): class Vmacore::SystemException(An established connection was aborted by the software in your host machine. )
[2016-10-12 16:12:56.977 04720 verbose 'App' opID=B57F0F0C-000000D3] [VpxVmomi] Invoke done: vmodl.query.PropertyCollector.cancelWaitForUpdates session: A576837F-4FED-4164-ADFB-B2A3CE7D2574
[2016-10-12 16:12:56.977 04720 verbose 'SoapAdapter.HTTPService' opID=B57F0F0C-000000D3] HTTP Response: Complete (processed 414 bytes)
[2016-10-12 16:12:56.978 04720 info 'App' opID=B57F0F0C-000000D3] [VpxLRO] -- FINISH task-internal-1698 --  -- vmodl.query.PropertyCollector.cancelWaitForUpdates -- A576837F-4FED-4164-ADFB-B2A3CE7D2574(48CCDE4A-41A4-4721-B330-B6043FDBE81A)
[2016-10-12 16:12:57.209 04720 verbose 'ProxySvc Req00861'] New client SSL(TCPStreamWin32(socket=TCP(fd=2132) local=10.20.2.78:443,  peer=10.32.30.54:54078))
[2016-10-12 16:12:57.326 04720 verbose 'SoapAdapter.HTTPService'] User agent is 'VMware VI Client/4.0.0'
[2016-10-12 16:12:57.326 04720 verbose 'SoapAdapter.HTTPService'] HTTP Response: Client: NeedsContentLength: false UnderstandsChunking: true CanKeepAlive: true (PresetContentLength -1)
[2016-10-12 16:12:57.327 04720 verbose 'App'] [VpxVmomi] Invoking [waitForUpdates] on [vmodl.query.PropertyCollector:session[A576837F-4FED-4164-ADFB-B2A3CE7D2574]6ECC220B-329A-4CE0-8409-5206AE198E4C] session [A576837F-4FED-4164-ADFB-B2A3CE7D2574(48CCDE4A-41A4-4721-B330-B6043FDBE81A)]
[2016-10-12 16:12:57.722 03948 info 'Libs' opID=500C55A7-000001BB] Cannot connect to server esxhost2:902: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
[2016-10-12 16:12:57.722 03948 info 'Libs' opID=500C55A7-000001BB] CnxAuthdConnect: Returning false because CnxAuthdConnectTCP failed
[2016-10-12 16:12:57.722 03948 info 'Libs' opID=500C55A7-000001BB] CnxConnectAuthd: Returning false because CnxAuthdConnect failed
[2016-10-12 16:12:57.722 03948 info 'Libs' opID=500C55A7-000001BB] Cnx_Connect: Returning false because CnxConnectAuthd failed
[2016-10-12 16:12:57.722 03948 info 'Libs' opID=500C55A7-000001BB] Cnx_Connect: Error message: Failed to connect to server esxhost2:902
[2016-10-12 16:12:57.722 03948 error 'App' opID=500C55A7-000001BB] [VpxVmdbCnx] Authd error: Failed to connect to server esxhost2:902
[2016-10-12 16:12:57.722 03948 error 'App' opID=500C55A7-000001BB] [VpxVmdbCnx] Failed to connect to host esxhost2:902. Check that authd is running correctly (lib/connect error 2)
[2016-10-12 16:12:57.722 03948 verbose 'App' opID=500C55A7-000001BB] [VpxVmdbCnx] Failed to connect to host <esxhost2>
[2016-10-12 16:12:57.722 03948 verbose 'App' opID=500C55A7-000001BB] [VpxdHostAccess] Disconnecting from esxhost2
[2016-10-12 16:12:57.722 03948 error 'App' opID=500C55A7-000001BB] vim.fault.NoHost
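
The useful lines in an excerpt like this are the `error` entries and the host and port they name. Below is a minimal sketch for pulling them out, assuming every entry follows the `[timestamp thread level 'component' ...] message` layout seen above. The names `error_entries`, `ENTRY_RE` and `TARGET_RE` are made up for illustration, and the sample lines are copied from the log above.

```python
import re

# Assumed entry layout, based only on the excerpt above:
# [date time thread level 'component' optional-opID] message
ENTRY_RE = re.compile(
    r"^\[(?P<ts>\S+ \S+) (?P<thread>\d+) (?P<level>\w+) "
    r"'(?P<component>[^']*)'(?: opID=(?P<opid>[^\]]+))?\] (?P<msg>.*)$"
)
# Matches the "Failed to connect to server/host NAME:PORT" wording in the log.
TARGET_RE = re.compile(r"Failed to connect to (?:server|host) ([\w.\-]+):(\d+)")


def error_entries(lines):
    """Yield (timestamp, message) for each entry logged at level 'error'."""
    for line in lines:
        match = ENTRY_RE.match(line.strip())
        if match and match.group("level") == "error":
            yield match.group("ts"), match.group("msg")


sample = [
    "[2016-10-12 16:12:57.722 03948 info 'Libs' opID=500C55A7-000001BB] "
    "CnxAuthdConnect: Returning false because CnxAuthdConnectTCP failed",
    "[2016-10-12 16:12:57.722 03948 error 'App' opID=500C55A7-000001BB] "
    "[VpxVmdbCnx] Authd error: Failed to connect to server esxhost2:902",
    "[2016-10-12 16:12:57.722 03948 error 'App' opID=500C55A7-000001BB] "
    "vim.fault.NoHost",
]

for ts, msg in error_entries(sample):
    target = TARGET_RE.search(msg)
    where = "%s port %s" % target.groups() if target else "no host named"
    print(ts, "|", where, "|", msg)
```

On this excerpt it narrows things down to the failed connection to esxhost2 on port 902, which is the part worth chasing.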
ukitsme

I believe you haven't upgraded the host yet.
If that is the case, please check the following:
1) Check the permissions of the user account you are using.
2) Ensure that the vCenter Server version is the same as or higher than that of the ESXi host being added.
3) Verify that you can connect to the ESXi host using the vSphere Client.
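
Alongside the checks above, the vpxd.log excerpt in the question shows vCenter timing out on esxhost2:902, so it may be worth testing name resolution and that port from the vCenter Server machine itself. This is a rough sketch, not a VMware tool: the function names are made up, port 902 comes from the log, and port 443 is included on the assumption that it is the port the vSphere Client connection uses.

```python
import socket


def resolve(host):
    """Return the sorted IP addresses the name resolves to, or [] on failure."""
    try:
        infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def report(host, ports=(443, 902)):
    """Print name resolution and per-port reachability for host."""
    print(host, "resolves to", resolve(host) or "nothing")
    for port in ports:
        state = "open" if can_connect(host, port) else "unreachable"
        print("%s:%d is %s" % (host, port, state))
```

Calling `report("esxhost2")` on the vCenter Server would show whether the name resolves to the address you expect and whether port 902 answers. If 443 opens but 902 does not, the problem is more likely on the host or the network path than with the account being used.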
bjblackmore

ASKER

Thanks for the reply.

We haven't changed anything, and don't plan to upgrade the OS (if we move to ESXi 5.1 or ESXi 6, it'll be on brand new hardware).
Nothing has changed on this host, as far as we know, from when it was a working member of the cluster, except it was rebooted. For some reason it's lost connection with the cluster since the reboot, and won't add back in.

The account being used is root, the version is the same as the other 2 hosts in the cluster (esxhost1 & esxhost3). I can connect and manage the host directly with vSphere client, using the same root account.
If Shellshock is validated, the best option is to disconnect your validated setup from the internet to reduce global grief.
I'm unsure what you mean. What is Shellshock? I'm also not sure what you mean by disconnecting the validated setup from the internet.

By validated I mean the hardware is documented, and the software and install/config settings are documented. We run a number of tests/scripts once it is built. When they pass, we get the documentation signed off by someone from Quality; after that, no further changes can be made to the hardware, software, or config without a change control.

This cluster is 3 ESXi hosts sat in a data centre, with a vCenter server managing them. There is no direct internet connection.
It would be easiest to go ahead with a valid update. You can ask VMware support to assist you during working hours.
We're not going to be able to update. That would be a full project, and need allocated resources, time, money, licenses, which we don't have at the moment.
As this is a production environment, we just need to get the 3rd host up and running in the cluster as quickly as possible.
If you don't have money, don't use VMware...
Can you connect to the new ESXi host using the desktop client and a web browser? Is it in the 4.1 HCL?
ASKER CERTIFIED SOLUTION
compdigit44

I tried performing the changes as mentioned in the KB, but it didn't help.

In the end I decided it was just going to be easier & quicker to repair/re-install ESXi. I tried a repair install first, but that didn't help. I was still getting the same error. So I ended up doing a clean install. Within an hour, it was installed, configured, and I was able to add it back into the cluster without any further error messages. Not sure why or what became corrupt!
Performing a re-install was the quickest and cleanest option.
You mistyped your own host name. In 6.0 you can change that after installation...