Issue in brief: Unable to add a node back to the cluster.
Details of the issue: We noticed that one of the nodes in our two-node cluster (Node and Disk Majority quorum) shows as 'Down' in Failover Cluster Manager (FCM). However, we are able to ping it from the other node and also connect to it over RDP.
-Tried to evict the problematic node and re-add it to the cluster, but were unsuccessful, both via PowerShell and via the GUI.
-Also rebooted the node and repeated the above operation. Still no luck.
Error: Unable to successfully cleanup.
-Ran "Clear-ClusterNode" and it succeeded.
-Then tried to re-add the node, but received the same error through both the GUI and PowerShell.
-Rebooted the problematic node; still getting the same error.
-Upon further investigation, we noticed that the "CLUSDB" hive was missing from the registry of the problematic node.
-Copied the database from the other node to the problematic node and tried to load it into the registry, without success.
-Then copied the "CLUSDB.blf" file to that location, renamed it to "CLUSDB", and tried to add the node again. Still getting the same error.
-Looked in AD and found the appropriate CNO for the cluster and the VCOs of its corresponding nodes.
-Noticed the error below in the cluster log during our investigation:
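To confirm whether the on-disk cluster hive is actually present on each node, a check like the one below could be run on both servers and the results compared. This is a minimal sketch, assuming the default location %SystemRoot%\Cluster\CLUSDB; the helper name `cluster_hive_status` is hypothetical. Files matching CLUSDB*.blf appear to be transaction-log files associated with the hive rather than the hive itself, so the sketch reports them separately.

```python
import os
from pathlib import Path


def cluster_hive_status(system_root: str) -> dict:
    """Report which cluster-hive files exist under <system_root>/Cluster.

    Assumes the default layout, where the hive file is named CLUSDB
    (hypothetical helper, not part of any Windows tooling).
    """
    cluster_dir = Path(system_root) / "Cluster"
    hive = cluster_dir / "CLUSDB"
    # .blf files are log files kept alongside the hive, not a copy of it.
    logs = (
        sorted(p.name for p in cluster_dir.glob("CLUSDB*.blf"))
        if cluster_dir.is_dir()
        else []
    )
    return {
        "cluster_dir_exists": cluster_dir.is_dir(),
        "hive_exists": hive.is_file(),
        "log_files": logs,
    }


if __name__ == "__main__":
    # Fall back to the usual Windows directory if SystemRoot is not set.
    root = os.environ.get("SystemRoot", r"C:\Windows")
    print(cluster_hive_status(root))
```

On a healthy node, `hive_exists` would be expected to be True; on the problematic node it would show whether only the log files remain.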
"New join with n2: stage: 'Authenticate Initial Connection' status HrError(0x80090301) reason: '[SV] Authentication failed'"
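The status code in this log line is an HRESULT, and splitting it into its fields shows where the failure originated. The sketch below mirrors the HRESULT_SEVERITY, HRESULT_FACILITY and HRESULT_CODE macros from winerror.h; the function names are hypothetical. For 0x80090301 the facility is 9 (FACILITY_SECURITY), which is consistent with the '[SV] Authentication failed' reason; winerror.h names this value SEC_E_INVALID_HANDLE.

```python
import re
from typing import Optional

# Facility numbers for the codes relevant here (values from winerror.h).
FACILITIES = {7: "FACILITY_WIN32", 9: "FACILITY_SECURITY"}


def decode_hresult(value: int) -> dict:
    """Split a 32-bit HRESULT into severity, facility and code fields."""
    return {
        "failure": bool((value >> 31) & 0x1),  # HRESULT_SEVERITY
        "facility": (value >> 16) & 0x1FFF,    # HRESULT_FACILITY
        "code": value & 0xFFFF,                # HRESULT_CODE
    }


def hr_from_log_line(line: str) -> Optional[int]:
    """Extract the value from an 'HrError(0x........)' token, if present."""
    m = re.search(r"HrError\((0x[0-9A-Fa-f]{8})\)", line)
    return int(m.group(1), 16) if m else None


if __name__ == "__main__":
    line = ("New join with n2: stage: 'Authenticate Initial Connection' "
            "status HrError(0x80090301) reason: '[SV] Authentication failed'")
    hr = hr_from_log_line(line)
    fields = decode_hresult(hr)
    print(hex(hr), fields, FACILITIES.get(fields["facility"]))
```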
-Ran "portqry" against the other node from the problematic node, and it returned an error:
TCP port 3343(ms-cluster -net service): LISTENING
UDP port 3343(ms-cluster -net service): NOT LISTENING
portqry.exe -n 192.168.5.90 -e 3343 -p BOTH exits with return code 0x00000001
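The TCP half of that portqry result can be cross-checked independently with a plain connection attempt from the problematic node. This is a minimal sketch covering TCP only; the function name `tcp_port_open` is hypothetical. It deliberately does not probe UDP: because UDP is connectionless, a probe that gets no reply cannot distinguish a closed port from a filtered one.

```python
import socket


def tcp_port_open(host: str, port: int = 3343, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, or name resolution failed.
        return False


if __name__ == "__main__":
    # Address of the other node, as used in the portqry command above.
    print(tcp_port_open("192.168.5.90"))
```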
Please advise! It is now critical that we add this node back to the existing cluster so that we can fail the clustered roles over to it.
-Also granted full permissions to the CNO, without success.
-Checked "services.msc" and found that the Cluster service was disabled. Enabled and started it, but it failed to start.
Error: Windows could not start the Cluster Service on Local Computer.
-Checked the System event log and found the following events logged at the same time:
Event ID 7024: The Cluster Service terminated with the following service-specific error: The system cannot find the file specified.
Event ID 1090: The Cluster Service cannot be started. An attempt to read configuration data from the Windows registry failed with error '2'. Please use the Failover Cluster Management snap-in to ensure that this machine is a member of a cluster. If you intend to add this machine to an existing cluster use the Add Node Wizard. Alternatively, if this machine has been configured as a member of a cluster, it will be necessary to restore the missing configuration data that is necessary for the Cluster Service to identify that it is a member of a cluster. Perform a System State Restore of this machine in order to restore the configuration data.
>Checked for certificates on both servers but couldn't find any cluster-related certificates on either of them.
>Removed the 'Failover Clustering' feature from node 'ndb2012b' and added it back, but with no success.