Cluster manager redundancy

Posted on 2007-10-09
Medium Priority
Last Modified: 2013-11-09
My company has two servers running Windows 2003 and SQL 2005.  These boxes are hosted at a local hosting company.  Currently, the servers are set up as a cluster.  Each box has three NICs:
the first (front end, public IP), the second (back end, private IP), and the third (heartbeat).  We have tested this redundancy many times by powering down the first box, and like clockwork it switches to the second box.
We had a situation yesterday that made the redundancy fail.  Our front-end card became disabled (for lack of a better word) and the box was effectively taken offline.  The redundancy did not kick in, apparently because the heartbeat was still working.
So my question is: is there something wrong with the way our hosting company set up Cluster Manager on these servers?  Is there a better way?
Thanks for your time
Question by:hexvader
LVL 85

Accepted Solution

oBdA earned 500 total points
ID: 20043542
What exactly did you mean by "[the] card became disabled"? If you have an IP resource in the SQL group, and this IP address fails because it can't bind to the (disabled) NIC anymore, then you should have had a failover.

Author Comment

ID: 20043763

What happened is this: one of our IT guys remote-controlled the server.  He needed to transfer a large database file from the server down to our network.  Normally we would create a VPN session to our hosted facility, but we have been having problems transferring large files for the past few days.  So he decided to create a VPN session from the server to our network and pull it down that way.  As soon as he launched the VPN, he lost contact via remote control and the server was in limbo.  I believe that once he created the VPN connection, the server used the default gateway / DNS settings of our internal network and just went offline.  During this time frame (about 8 minutes) the heartbeat was still showing the server as up when in fact it was down.  We got in contact with the hosted facility and had them reboot the box.  During the reboot, the redundancy kicked in and our site and data came back.
I am quite positive we won't repeat this process again, but the question remains: was the box technically down, and should the rollover have worked?  What would happen if the NIC died?  The heartbeat would still be working; would the redundancy kick in?
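
To illustrate the suspicion above that the VPN connection took over the default gateway, here is a minimal sketch of route selection (longest prefix first, then lowest metric). It assumes the VPN client was set to use the remote network's default gateway, which is only a guess about this incident. The `Route` class, `pick_route` function, interface names, addresses and metrics are all hypothetical and are not taken from the actual servers.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    destination: str  # CIDR prefix, e.g. "0.0.0.0/0" for the default route
    interface: str    # hypothetical interface label
    metric: int       # lower metric wins among equally specific routes


def pick_route(routes, target):
    """Return the route used for target: longest prefix, then lowest metric."""
    addr = ipaddress.ip_address(target)
    candidates = [r for r in routes if addr in ipaddress.ip_network(r.destination)]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda r: (-ipaddress.ip_network(r.destination).prefixlen, r.metric),
    )


# Hypothetical routing table before the VPN session is started.
before_vpn = [
    Route("0.0.0.0/0", "public-nic", 20),
    Route("10.0.0.0/24", "private-nic", 20),
    Route("192.168.100.0/24", "heartbeat-nic", 20),
]
# Assumption: the VPN adds its own default route with a lower metric.
after_vpn = before_vpn + [Route("0.0.0.0/0", "vpn", 1)]

client = "203.0.113.50"   # some outside client (documentation address)
peer = "192.168.100.2"    # the other node on the heartbeat network

print(pick_route(before_vpn, client).interface)
print(pick_route(after_vpn, client).interface)
print(pick_route(after_vpn, peer).interface)
```

Under these assumptions, replies to outside clients leave through the VPN once it is connected, while traffic to the heartbeat peer still uses the directly connected heartbeat network. That would be consistent with the node looking alive to its partner while being unreachable from the outside.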
LVL 51

Assisted Solution

by:Ted Bouskill
Ted Bouskill earned 500 total points
ID: 20044025
Ah, how far to take redundancy can get very complex.  For example: does your cluster have one network switch, or are the switches redundant?  Do you have redundant firewalls in front of the server?  The list can go on and on.

Would the risk of the switch dying be higher than that of the NIC?

I'm wondering if your cluster dependencies are misconfigured.  I'm pretty sure that if one of the NICs on my cluster was disabled, it would fail over automatically.
LVL 85

Expert Comment

ID: 20044752
As I said: as long as the IP address is bound correctly to the clustered NIC, it's not considered a failure. If the NIC dies completely, the TCP/IP stack of this NIC will go down, so will the IP address resource, and finally the group will fail over.
In your case, the NIC didn't fail completely, so there was no reason for the cluster to fail the resource over.
MSCS is more complex than just listening for the heartbeat of the other node.
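
A toy model of that reasoning, as a rough sketch only: it is not the MSCS API, and the resource names, the dependency chain and the `should_fail_over` rule are simplifying assumptions for illustration. The heartbeat only tells the nodes that the peer is alive; the health of the resources in the group decides whether the group moves.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Resource:
    """A clustered resource (hypothetical model, not an MSCS type)."""
    name: str
    online: bool = True
    depends_on: List["Resource"] = field(default_factory=list)

    def is_healthy(self) -> bool:
        # A resource is healthy only if it and all its dependencies are online.
        return self.online and all(dep.is_healthy() for dep in self.depends_on)


def should_fail_over(group: List[Resource], heartbeat_ok: bool) -> bool:
    """Simplified rule: move the group if the node looks dead to its peer,
    or if any resource in the group is unhealthy."""
    if not heartbeat_ok:
        return True
    return any(not res.is_healthy() for res in group)


def build_group(ip_online: bool) -> List[Resource]:
    # Assumed dependency chain: SQL Server -> network name -> IP address.
    ip = Resource("SQL IP Address", online=ip_online)
    network_name = Resource("SQL Network Name", depends_on=[ip])
    sql = Resource("SQL Server", depends_on=[network_name])
    return [ip, network_name, sql]


# NIC dies completely: the IP address resource goes offline.
nic_died = should_fail_over(build_group(ip_online=False), heartbeat_ok=True)
# VPN incident: the IP address is still bound and the heartbeat still works.
vpn_incident = should_fail_over(build_group(ip_online=True), heartbeat_ok=True)

print(nic_died)      # True
print(vpn_incident)  # False
```

In this simplified picture, a dead NIC takes the IP address resource down and the group moves, whereas a node that merely cannot be reached from outside still has every resource online, so nothing triggers a failover.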
LVL 51

Expert Comment

by:Ted Bouskill
ID: 20046315
Hmm, the virtual IP should belong to the same subnet as the static IPs for the front end NIC.  If one front end NIC had its subnet changed, I would think that would trigger a failover condition.
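
A quick way to sanity-check that subnet relationship, sketched with Python's standard `ipaddress` module. The `same_subnet` helper and all addresses are made up for illustration and are not the real cluster settings.

```python
import ipaddress


def same_subnet(virtual_ip: str, nic_ip: str, netmask: str) -> bool:
    """True if virtual_ip falls inside the subnet defined by nic_ip/netmask."""
    network = ipaddress.ip_network(f"{nic_ip}/{netmask}", strict=False)
    return ipaddress.ip_address(virtual_ip) in network


# Hypothetical virtual IP and front end NIC addresses.
print(same_subnet("192.0.2.50", "192.0.2.10", "255.255.255.0"))     # True
print(same_subnet("192.0.2.50", "198.51.100.10", "255.255.255.0"))  # False
```

Whether a mismatch like the second case actually triggers a failover depends on how the cluster's IP address resource reacts, so treat this only as a configuration check.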

Author Comment

ID: 20921426
Thanks for your help guys.  
