bjblackmore
ESXi 4.1 Host Won't Join Cluster
We have 3 ESXi 4.1 hosts in a cluster. (The cluster is due to be upgraded, but it is a validated system running qualified VMs, so we can't just upgrade it to the latest version without a full project and re-validation.)
We moved all the VMs off one host and rebooted the server (we needed to check whether there was a memory issue). Since the reboot, the host won't reconnect to the cluster. If we try to re-add the host, we get this error message:
Call "Datacenter.QueryConnectionInfo" for object "CLUSTER1" on vCenter Server "vCentre.domain.net" failed.
We've tried rebooting again and checked that the VMware vCenter Agent service is running, but that hasn't helped. Checking vpxd.log, it looks like authd is failing.
I've tried adding security.host.ruissl = "TRUE" to the /etc/vmware/config file as per VMware KB2037351.
Not sure what to try next. Relevant part of vpxd.log:
[2016-10-12 16:12:56.975 04720 info 'App' opID=B57F0F0C-000000D3] [VpxLRO] -- BEGIN task-internal-1698 -- -- vmodl.query.PropertyCollector.cancelWaitForUpdates -- A576837F-4FED-4164-ADFB-B2A3CE7D2574(48CCDE4A-41A4-4721-B330-B6043FDBE81A)
[2016-10-12 16:12:56.975 04720 verbose 'App' opID=B57F0F0C-000000D3] [VpxVmomi] Invoke error: vmodl.query.PropertyCollector.waitForUpdates session: A576837F-4FED-4164-ADFB-B2A3CE7D2574 Throw: vmodl.fault.RequestCanceled
[2016-10-12 16:12:56.976 04720 verbose 'SoapAdapter.HTTPService' opID=B57F0F0C-000000D3] User agent is 'VMware VI Client/4.0.0'
[2016-10-12 16:12:56.976 04720 verbose 'SoapAdapter.HTTPService' opID=B57F0F0C-000000D3] HTTP Response: Client: NeedsContentLength: false UnderstandsChunking: true CanKeepAlive: true (PresetContentLength -1)
[2016-10-12 16:12:56.976 04720 verbose 'SoapAdapter.HTTPService' opID=B57F0F0C-000000D3] HTTP Response: Complete (processed 570 bytes)
[2016-10-12 16:12:56.977 04720 error 'App' opID=B57F0F0C-000000D3] Connection lost while waiting for the next request on stream TCPStreamWin32(socket=TCP(fd=2420) local=[::1]:8085, peer=[::1]:52681): class Vmacore::SystemException(An established connection was aborted by the software in your host machine. )
[2016-10-12 16:12:56.977 04720 verbose 'App' opID=B57F0F0C-000000D3] [VpxVmomi] Invoke done: vmodl.query.PropertyCollector.cancelWaitForUpdates session: A576837F-4FED-4164-ADFB-B2A3CE7D2574
[2016-10-12 16:12:56.977 04720 verbose 'SoapAdapter.HTTPService' opID=B57F0F0C-000000D3] HTTP Response: Complete (processed 414 bytes)
[2016-10-12 16:12:56.978 04720 info 'App' opID=B57F0F0C-000000D3] [VpxLRO] -- FINISH task-internal-1698 -- -- vmodl.query.PropertyCollector.cancelWaitForUpdates -- A576837F-4FED-4164-ADFB-B2A3CE7D2574(48CCDE4A-41A4-4721-B330-B6043FDBE81A)
[2016-10-12 16:12:57.209 04720 verbose 'ProxySvc Req00861'] New client SSL(TCPStreamWin32(socket=TCP(fd=2132) local=10.20.2.78:443, peer=10.32.30.54:54078))
[2016-10-12 16:12:57.326 04720 verbose 'SoapAdapter.HTTPService']User agent is 'VMware VI Client/4.0.0'
[2016-10-12 16:12:57.326 04720 verbose 'SoapAdapter.HTTPService']HTTP Response: Client: NeedsContentLength: false UnderstandsChunking: true CanKeepAlive: true (PresetContentLength -1)
[2016-10-12 16:12:57.327 04720 verbose 'App'] [VpxVmomi] Invoking [waitForUpdates] on [vmodl.query.PropertyCollector:session[A576837F-4FED-4164-ADFB-B2A3CE7D2574]6ECC220B-329A-4CE0-8409-5206AE198E4C] session [A576837F-4FED-4164-ADFB-B2A3CE7D2574(48CCDE4A-41A4-4721-B330-B6043FDBE81A)]
[2016-10-12 16:12:57.722 03948 info 'Libs' opID=500C55A7-000001BB] Cannot connect to server esxhost2:902: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
[2016-10-12 16:12:57.722 03948 info 'Libs' opID=500C55A7-000001BB] CnxAuthdConnect: Returning false because CnxAuthdConnectTCP failed
[2016-10-12 16:12:57.722 03948 info 'Libs' opID=500C55A7-000001BB] CnxConnectAuthd: Returning false because CnxAuthdConnect failed
[2016-10-12 16:12:57.722 03948 info 'Libs' opID=500C55A7-000001BB] Cnx_Connect: Returning false because CnxConnectAuthd failed
[2016-10-12 16:12:57.722 03948 info 'Libs' opID=500C55A7-000001BB] Cnx_Connect: Error message: Failed to connect to server esxhost2:902
[2016-10-12 16:12:57.722 03948 error 'App' opID=500C55A7-000001BB] [VpxVmdbCnx] Authd error: Failed to connect to server esxhost2:902
[2016-10-12 16:12:57.722 03948 error 'App' opID=500C55A7-000001BB] [VpxVmdbCnx] Failed to connect to host esxhost2:902. Check that authd is running correctly (lib/connect error 2)
[2016-10-12 16:12:57.722 03948 verbose 'App' opID=500C55A7-000001BB] [VpxVmdbCnx] Failed to connect to host <esxhost2>
[2016-10-12 16:12:57.722 03948 verbose 'App' opID=500C55A7-000001BB] [VpxdHostAccess] Disconnecting from esxhost2
[2016-10-12 16:12:57.722 03948 error 'App' opID=500C55A7-000001BB] vim.fault.NoHost
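The excerpt boils down to vCenter being unable to open a TCP connection to esxhost2 on port 902, where the log says authd should be listening. A minimal sketch, assuming Python 3 is available on the vCenter server (or on a machine with the same network path to the hosts), that pulls the failing host:port pairs out of a saved vpxd.log and retries a plain TCP connect to each. The regular expression is inferred from the lines above rather than from any documented log format, and `failed_authd_targets`, `port_reachable` and `report` are hypothetical helper names.

```python
import re
import socket

# Pattern inferred from the excerpt above, e.g.
#   "[VpxVmdbCnx] Failed to connect to host esxhost2:902. Check that authd ..."
# It is an assumption, not a documented vpxd.log format.
AUTHD_FAILURE = re.compile(
    r"Failed to connect to (?:server|host) (?P<host>[\w.-]+):(?P<port>\d+)"
)


def failed_authd_targets(lines):
    """Return distinct (host, port) pairs that vpxd reported as unreachable, in order seen."""
    targets = []
    for line in lines:
        match = AUTHD_FAILURE.search(line)
        if match:
            target = (match.group("host"), int(match.group("port")))
            if target not in targets:
                targets.append(target)
    return targets


def port_reachable(host, port, timeout=5.0):
    """Attempt a plain TCP connect; True if something accepted the connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def report(path):
    """Print a reachability line for every failing target found in the log at `path`."""
    with open(path, encoding="utf-8", errors="replace") as log:
        for host, port in failed_authd_targets(log):
            state = "reachable" if port_reachable(host, port) else "NOT reachable"
            print(f"{host}:{port} is {state} from this machine")
```

Calling `report("vpxd.log")` from the vCenter server would show whether port 902 on esxhost2 is reachable from there at all. A successful TCP connect only proves something is listening; it says nothing about whether authd accepts the vCenter credentials.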
ASKER
Thanks for the reply.
We haven't changed anything, and we don't plan to upgrade the OS (if we move to ESXi 5.1 or ESXi 6, it'll be on brand-new hardware).
As far as we know, nothing has changed on this host since it was a working member of the cluster, except that it was rebooted. For some reason it has lost its connection to the cluster since the reboot and won't rejoin.
The account being used is root, and the version is the same as on the other 2 hosts in the cluster (esxhost1 & esxhost3). I can connect to and manage the host directly with the vSphere Client, using the same root account.
If Shellshock is validated, the best option is to disconnect your validated setup from the internet to reduce global grief.
ASKER
I'm unsure what you mean. What is Shellshock? I'm also not sure what you mean by disconnecting the validated setup from the internet.
By validated I mean the hardware is documented and the software and install/config settings are documented. We run a number of tests/scripts once it is built. When they succeed, we get the documentation signed off by someone from Quality, and after that no further changes can be made to the hardware/software/config without a change control.
This cluster is 3 ESXi hosts sat in a data centre, with a vCenter server managing them. There is no direct internet connection.
It would be easiest to go ahead with a valid update. You can ask VMware support to assist you during working hours.
ASKER
We're not going to be able to update. That would be a full project needing allocated resources, time, money and licenses, which we don't have at the moment.
As this is a production environment, we just need to get the 3rd host up and running in the cluster as quickly as possible.
If you don't have money, don't use VMware...
Can you connect to the new ESXi host using the desktop client and a web browser? Is it on the 4.1 HCL?
ASKER CERTIFIED SOLUTION
ASKER
I tried making the changes described in the KB, but it didn't help.
In the end I decided it would be easier and quicker to repair/reinstall ESXi. I tried a repair install first, but I was still getting the same error, so I ended up doing a clean install. Within an hour it was installed and configured, and I was able to add it back into the cluster without any further error messages. Not sure why it happened or what became corrupt!
ASKER
Performing a reinstall was the quickest and cleanest option.
You mistyped your own host name. In 6.0 you can change that after installation...
If that is the case, please check the following:
1) Check the permissions of the user account you are using.
2) Ensure that the vCenter Server version is the same as or higher than that of the ESXi host being added.
3) Verify that you can connect to the ESXi host using the vSphere Client.
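Check 2 is a plain version comparison. A minimal sketch in Python, assuming you have the two dotted version strings (for example "4.1.0") to hand. `parse_version` and `vcenter_can_manage` are hypothetical names, build numbers are ignored, and this only encodes the "same or higher" rule stated above, not VMware's full compatibility matrix.

```python
def parse_version(text):
    """Turn a dotted version string such as '4.1.0' into a comparable tuple (4, 1, 0)."""
    return tuple(int(part) for part in text.strip().split("."))


def vcenter_can_manage(vcenter_version, host_version):
    """Rule of thumb from check 2: vCenter must be the same version as the host or newer."""
    vc = parse_version(vcenter_version)
    esx = parse_version(host_version)
    # Pad the shorter tuple with zeros so that '4.1' and '4.1.0' compare as equal.
    width = max(len(vc), len(esx))
    vc += (0,) * (width - len(vc))
    esx += (0,) * (width - len(esx))
    # Tuples compare element by element, so (5, 0, 0) >= (4, 1, 0) holds.
    return vc >= esx
```

Comparing tuples of integers rather than raw strings avoids the classic pitfall where "10.0" sorts before "4.1" as text.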