Blue Berry


MS Failover cluster Troubleshooting

Hi experts,

What is the best practice for troubleshooting a failover cluster?
Are there any scripts out there that I can run to see why it has failed over?

Is there a list of checks I can run to pinpoint the cause of the failover?

Thanks in advance for your help
arnold

The event log will reflect some info on the transition.
It also helps to have monitoring of the nodes and the cluster in place,
covering access to storage, NICs, and heartbeat.


What kind of cluster?
Blue Berry

ASKER

Two node cluster
A two-node cluster running which application? Is it an active/active or active/passive setup?
The former active node's event log should have the event notice about the transition.
So whether it is active/active or active/passive, if I look at the logs of the active node, that should give me the answer?
I am new to this. There is also an active/passive/witness setup.

Would it be the same there: check the active node's logs to find the reason for a failover?
Both nodes in either cluster write to the cluster log.
What is the OS version?
You can use PowerShell commands to retrieve cluster log information; see this link:
https://docs.microsoft.com/en-us/powershell/module/failoverclusters/get-clusterlog?view=win10-ps
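As a rough sketch of how that cmdlet might be used: the functions below dump the recent cluster log from every node into a folder and keep only the warning and error lines. The folder C:\Temp, the 60-minute window, and both function names are assumptions made for illustration. The WARN/ERR pattern assumes the usual cluster log layout, where the level follows the timestamp. Cluster log timestamps are UTC by default, so allow for the offset when comparing them with the event log.

```powershell
# Hypothetical helper: keep only WARN/ERR lines from cluster log text.
# -cmatch is case-sensitive, so "err" inside a message does not match.
function Select-ClusterLogProblem {
    param([string[]]$Line)
    $Line | Where-Object { $_ -cmatch '\s(WARN|ERR)\s' }
}

# Hypothetical wrapper: run on a cluster node that has the
# FailoverClusters module (it is not called automatically here).
function Export-RecentClusterProblem {
    param(
        [string]$Destination = 'C:\Temp',  # assumed output folder
        [int]$Minutes = 60                 # assumed look-back window
    )
    Import-Module FailoverClusters
    # Get-ClusterLog writes one log file per node and returns the file objects.
    $files = Get-ClusterLog -Destination $Destination -TimeSpan $Minutes
    Select-ClusterLogProblem -Line ($files | Get-Content)
}
```

Calling Export-RecentClusterProblem shortly after a failover should list the warnings and errors each node recorded around that time.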
Server 2012, and we are also using 2008.
Yes, I have used PowerShell commands.
With active/active/witness, you might be referring to SQL mirroring and not clustering.

Open the cluster admin tool and see what you have there.
Active/active means you have two clustered database servers, each running on one node, with the resources set up so that if one node fails, the other will handle both.

Presumably 2008 refers to SQL Server 2008, which is the clustered application?
Yes, that is what we have; I just had a look.

If it fails over, which it has, what steps do I need to take to pinpoint the cause of the failover?

Here is the error message I get in the cluster manager:

Note:
Cluster node 'vm-blue-sql01' was removed from the active failover cluster membership. The Cluster service on this node may have stopped. This could also be due to the node having lost communication with other active nodes in the failover cluster. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapters on this node. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.
It seems the issue is related to communication. From the message:
"... be due to the node having lost communication with other active nodes in the failover cluster. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapters on this node. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges."
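The "removed from the active failover cluster membership" text is the message of event ID 1135 from the FailoverClustering source in the System log. A minimal sketch for pulling those events with their timestamps follows. The function name and the 7-day window are assumptions. The surviving node is usually the one that records the removal, so it is worth running this against each node.

```powershell
# Hypothetical helper: list node-removal events (ID 1135) from the
# System log of one node, newest first.
function Get-NodeRemovalEvent {
    param(
        [string]$ComputerName = $env:COMPUTERNAME,
        [int]$Days = 7                      # assumed look-back window
    )
    $filter = @{
        LogName      = 'System'
        ProviderName = 'Microsoft-Windows-FailoverClustering'
        Id           = 1135
        StartTime    = (Get-Date).AddDays(-$Days)
    }
    # SilentlyContinue: Get-WinEvent reports an error when nothing matches.
    Get-WinEvent -ComputerName $ComputerName -FilterHashtable $filter -ErrorAction SilentlyContinue |
        Select-Object TimeCreated, MachineName, Id, Message
}
```

For example, Get-NodeRemovalEvent -ComputerName vm-blue-sql01 gives the times to line up against the cluster log and against any NIC, switch, or storage events around the same moment.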
What is the setup?

I ask that as a clarifying question.
To determine the underlying cause, you have to map out the whole setup, its dependencies, and its inter-dependencies.

Usually, depending on your setup, the questions are:
How many NICs does each node have?
How does each node get access to the storage?

Look through the cluster admin tool and identify the resources.
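The same inventory can be pulled from PowerShell instead of the GUI. The following is only a sketch using cmdlets from the FailoverClusters module; the function name is an assumption, and it is meant to be run on a cluster node.

```powershell
# Hypothetical helper: print nodes, networks, NICs, groups and resources
# of the local cluster so the setup and its dependencies can be mapped.
function Get-ClusterInventory {
    Import-Module FailoverClusters
    # Node membership and state (Up, Down, Paused).
    Get-ClusterNode | Format-Table Name, State -AutoSize
    # Cluster networks and the role each one plays.
    Get-ClusterNetwork | Format-Table Name, State, Role, Address -AutoSize
    # Which NIC on which node is attached to which cluster network.
    Get-ClusterNetworkInterface | Format-Table Node, Network, Name, State -AutoSize
    # Clustered groups and the node that currently owns each.
    Get-ClusterGroup | Format-Table Name, OwnerNode, State -AutoSize
    # Individual resources (disks, IP addresses, SQL Server services).
    Get-ClusterResource | Format-Table Name, ResourceType, OwnerGroup, State -AutoSize
}
```

Comparing the output of Get-ClusterInventory before and after a failover shows which group moved and which network or resource changed state.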