MS Failover Cluster Troubleshooting

Last Modified: 2018-05-07
Hi experts,

What is the best practice for troubleshooting a failover cluster?
Are there any scripts I can run to see why it failed over?

Is there a list of checks I can go through to pinpoint the cause of the failover?

Thanks in advance for your help.

CERTIFIED EXPERT
Distinguished Expert 2019

Commented:
The event log will record some information about the transition.
It also helps to have monitoring in place for the nodes and the cluster:
access to storage, NICs, heartbeat.


What kind of cluster is it?

Author

Commented:
A two-node cluster.
CERTIFIED EXPERT
Distinguished Expert 2019

Commented:
Two nodes running which application? Is it an active/active or an active/passive setup?
The formerly active node's event log should have an event recording the transition.

Author

Commented:
So whether it is active/active or active/passive, looking at the logs of the active node should give me the answer?

Author

Commented:
I am new to this. There is also an active/passive/witness setup.

Would it be the same there: check the active node's logs to find the reason for a failover?
65tdRetired
CERTIFIED EXPERT

Commented:
Both nodes in either cluster write to the cluster log.
What is the OS version?
65tdRetired
CERTIFIED EXPERT

Commented:
You can use PowerShell commands to retrieve cluster log information; see this link:
https://docs.microsoft.com/en-us/powershell/module/failoverclusters/get-clusterlog?view=win10-ps
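
Once Get-ClusterLog has written the log out as a text file, it can be large, so it helps to cut it down to the errors and warnings around the time of the failover. Below is a minimal sketch of that filtering step in Python. It assumes each log line has the shape `<pid>.<tid>::YYYY/MM/DD-HH:MM:SS.mmm LEVEL message`; check a few lines of your own log first, since the layout can differ between OS versions. The function name `filter_cluster_log` is made up for this example.

```python
import re
from datetime import datetime

# Assumed line shape (verify against your own log):
#   <pid>.<tid>::YYYY/MM/DD-HH:MM:SS.mmm LEVEL message
LINE_RE = re.compile(
    r"^[0-9a-fA-F]+\.[0-9a-fA-F]+::"
    r"(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+(?P<msg>.*)$"
)


def filter_cluster_log(lines, start, end, levels=("ERR", "WARN")):
    """Return (timestamp, level, message) tuples for lines whose level is
    in `levels` and whose timestamp lies within [start, end]."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or m.group("level") not in levels:
            continue  # skip lines of other levels or of an unknown shape
        ts = datetime.strptime(m.group("ts"), "%Y/%m/%d-%H:%M:%S.%f")
        if start <= ts <= end:
            hits.append((ts, m.group("level"), m.group("msg")))
    return hits
```

To use it, open the generated log file and pass the file object as `lines`, with `start` and `end` set to a window around the failover time taken from the event log. Note that the timestamps in the log may be in UTC rather than local time, depending on how the log was generated.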

Author

Commented:
Server 2012, and we are also using 2008.

Author

Commented:
Yes, I have used PowerShell commands.
CERTIFIED EXPERT
Distinguished Expert 2019

Commented:
If there is a witness, you might be referring to SQL mirroring and not clustering.

Open the cluster admin tool and see what you have there.
Active/active means you have two clustered database servers, each running on its own node, with resources sized so that if one node fails, the other can handle both.

Presumably 2008 refers to SQL Server 2008, which is the clustered application?

Author

Commented:
Yes, that is what we have; I just had a look.

If it fails over, which it has, what steps do I need to take to pinpoint the cause of the failover?

Here is the error message I get in cluster manager:

Note:
Cluster node 'vm-blue-sql01' was removed from the active failover cluster membership. The Cluster service on this node may have stopped. This could also be due to the node having lost communication with other active nodes in the failover cluster. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapters on this node. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.
CERTIFIED EXPERT
Distinguished Expert 2019

Commented:
It seems the issue is related to communication between the nodes. As the message says, this could
"be due to the node having lost communication with other active nodes in the failover cluster. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapters on this node. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges."
What is the setup?

That is the clarifying question.
To determine the underlying cause, you have to map out the whole setup, its dependencies, and its inter-dependencies.

The usual questions, depending on your setup:
How many NICs does each node have?
How does each node get access to the storage?

Look through the cluster admin tool and identify the resources.
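
If the "removed from the active failover cluster membership" message quoted earlier shows up more than once, it is worth counting which node it names each time, since one node dropping out repeatedly points at that node's NICs or network path. Below is a rough sketch in Python. It assumes you have already exported the event messages as plain text strings (how you export them is up to you), and the function name `count_node_removals` is made up for this example.

```python
import re
from collections import Counter

# Matches the message text quoted in this thread; the wording may differ
# on other OS versions or languages, so treat the pattern as an assumption.
REMOVED_RE = re.compile(
    r"Cluster node '([^']+)' was removed from the active failover cluster membership"
)


def count_node_removals(messages):
    """Count how often each node is named in 'removed from membership' messages."""
    counts = Counter()
    for text in messages:
        for node in REMOVED_RE.findall(text):
            counts[node] += 1
    return counts
```

Calling `count_node_removals(messages).most_common()` lists the nodes with the most removals first. A count only tells you where to start looking; it does not tell you whether the cause is the NIC, the switch, or the Cluster service itself.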