Solved

AlwaysOn SQL 2014 Windows failover Cluster issues

Posted on 2016-09-12
Last Modified: 2016-09-23
Hi Experts, we have upgraded SQL Server 2014 AlwaysOn and are now seeing sudden cluster failures. The cluster fails over unexpectedly and then recovers on its own after some time. Please see the cluster event error messages below and advise how I can resolve these issues.

1.      Cluster node 'ABC' was removed from the active failover cluster membership. The Cluster service on this node may have stopped. This could also be due to the node having lost communication with other active nodes in the failover cluster. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapters on this node. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.
 
2.      File share witness resource 'File Share Witness' failed to arbitrate for the file share '\\xyz20p\srv23pQuorum$'. Please ensure that file share '\\xyz20p\srv23pQuorum$' exists and is accessible by the cluster.
 
3.      Cluster resource 'File Share Witness' of type 'File Share Witness' in clustered role 'Cluster Group' failed. Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it.  Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.
 

Thanks,
Sreenivasa
Question by:tschary

Expert Comment

by:lcohan
I assume you have left the heartbeat settings at their defaults. If the share where the quorum sits is unavailable for any reason for longer than the heartbeat interval, your cluster will fail over automatically. So you have two options, the first good and the second not so good, although on some slower networks or connections to the SAN you may have no choice:

1. Make sure the share where the quorum sits is 100% up, no matter what.
2. Increase the heartbeat interval from the default to a value that suits your connections.
https://blogs.msdn.microsoft.com/clustering/2012/11/21/tuning-failover-cluster-network-thresholds/

Expert Comment

by:Vitor Montalvão
Are you using a network share as Quorum?
 

Author Comment

by:tschary
Hi Vitor, yes, we are using a network share as the quorum. Please help me with any further solutions.

Expert Comment

by:Vitor Montalvão
I have never used a network share for quorum, so I can't help you much with this.
Just know that if the share is down, the quorum won't be available. You should first check the availability of the share.
Also, the network share shouldn't reside on any of the cluster nodes. Hopefully that is not your case.
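
Checking the availability of the share could be sketched as a small polling script. This is only a rough sketch in Python, not a cluster tool: the UNC path is copied from the error message in the question, while the function names, the 5-second interval, and the number of checks are assumptions made for illustration. It only tests whether the path is reachable for the account running the script, which is not the same thing as the cluster's own access to the witness.

```python
import os
import time
from datetime import datetime


def share_reachable(path):
    """Return True if the directory at `path` can currently be reached."""
    try:
        return os.path.isdir(path)
    except OSError:
        return False


def watch_share(path, interval_seconds=5.0, checks=3):
    """Poll `path` `checks` times and return a list of (timestamp, reachable)."""
    results = []
    for i in range(checks):
        results.append((datetime.now(), share_reachable(path)))
        if i < checks - 1:
            time.sleep(interval_seconds)
    return results


if __name__ == "__main__":
    # UNC path copied from the event log message; adjust for your environment.
    witness = r"\\xyz20p\srv23pQuorum$"
    for stamp, ok in watch_share(witness):
        status = "reachable" if ok else "NOT reachable"
        print(stamp.isoformat(timespec="seconds"), status)
```

Run from each cluster node over a longer period, the timestamps of any "NOT reachable" lines can be compared with the times of the cluster events.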
 

Author Comment

by:tschary
Hi Vitor, the network share does not reside on any cluster node. It is on a separate server. Please let me know what the best practices for quorum are in this case.

Expert Comment

by:Vitor Montalvão
I usually use SAN storage. I know that it also works with a network share, but beyond the two checks I described above, I don't have any expertise with this kind of quorum.

Accepted Solution

by:lcohan
Did you have a look at the recommendations at the link I posted, given that your share does not look that reliable for whatever reason?

In particular did you read about:
<<
Relaxed Monitoring – Provides more forgiving failure detection which provides greater tolerance of brief transient network issues.  These longer time-outs will result in cluster recovery from hard failures taking more time and increasing downtime.
>>
I believe you should try that first and, as suggested, increase the SameSubnetThreshold "with moderation" to 15 or 20.
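
To put numbers on that change: as the linked article describes it, a heartbeat is sent every SameSubnetDelay milliseconds and a node is considered down after SameSubnetThreshold consecutive heartbeats are missed. The arithmetic is sketched below in Python. The function name is made up for illustration, and the values of 1000 ms and 5 are assumed defaults (they differ between Windows Server versions), so check your own cluster's values before relying on them.

```python
def tolerated_outage_seconds(delay_ms, threshold):
    """Approximate time without heartbeats before a node is declared down."""
    return delay_ms * threshold / 1000.0


# Assumed defaults: SameSubnetDelay = 1000 ms, SameSubnetThreshold = 5.
print(tolerated_outage_seconds(1000, 5))   # 5.0
# The relaxed thresholds suggested above, with the delay left unchanged.
print(tolerated_outage_seconds(1000, 15))  # 15.0
print(tolerated_outage_seconds(1000, 20))  # 20.0
```

On the cluster itself these are properties of the object returned by the Get-Cluster PowerShell cmdlet, for example `(Get-Cluster).SameSubnetThreshold = 20`.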
 

Author Comment

by:tschary
Thanks lcohan. Sure, I will go through your recommendations and implement them. I will get back to you soon.
 

Author Closing Comment

by:tschary
We also found another cause, network adapter issues, and changed those settings. The heartbeat setting helped us as well.