hexvader

Cluster manager redundancy

Hello,
My company has two servers running Windows Server 2003 and SQL Server 2005. The boxes are hosted at a local hosting company and are currently set up as a cluster. Each box has three NICs:
the first is the front end (public IP), the second is the back end (private IP), and the third carries the heartbeat. We have tested this redundancy many times by powering down the first box, and like clockwork it switches to the second box.
We had a situation yesterday where the redundancy failed. Our front-end card became disabled (for lack of a better word) and the box was effectively offline. The failover did not happen, apparently because the heartbeat was still working.
So my question is: is there something wrong with the way our hosting company set up cluster manager on these servers? Is there a better way?
Thanks for your time
ASKER CERTIFIED SOLUTION
oBdA

hexvader

ASKER


What happened is this: one of our IT guys remote-controlled the server. He needed to transfer a large database file from the server down to our network. Normally we would create a VPN session to our hosted facility, but we have been having problems transferring large files for the past few days, so he decided to create a VPN session from the server to our network and pull the file down that way. As soon as he launched the VPN, he lost contact via remote control and the server was in limbo. I believe that once he created the VPN connection, the server used the default gateway and DNS settings of our internal network and effectively went offline. During this time (about 8 minutes) the heartbeat was still showing the server as up when in fact it was down. We got in contact with the hosted facility and had them reboot the box. During the reboot, the redundancy kicked in and our site and data came back.
I am quite positive we won't repeat this process again, but the question remains: was the box technically down, and should the failover have worked? What would happen if the NIC died? The heartbeat would still be working, so would the redundancy kick in?
oBdA

As I said: as long as the IP address is bound correctly to the clustered NIC, it's not considered a failure. If the NIC dies completely, the TCP/IP stack of this NIC will go down, so will the IP address resource, and finally the group will fail over.
In your case, the NIC didn't fail completely, so there was no reason for the cluster to fail the resource over.
MSCS is more complex than just listening for the heartbeat of the other node.
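To make the distinction concrete, here is a toy Python model of the decision as described above. It is only a sketch under stated assumptions: `NodeState`, `ip_resource_healthy` and `should_fail_over` are hypothetical names for illustration, not MSCS APIs, and the real resource monitor checks more than this.

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    """Simplified, hypothetical view of one cluster node (not an MSCS API)."""
    heartbeat_ok: bool       # the private heartbeat network still answers
    public_nic_up: bool      # front-end NIC and its TCP/IP stack are up
    ip_bound: bool           # clustered IP address is still bound to that NIC
    clients_can_reach: bool  # what users actually experience


def ip_resource_healthy(node: NodeState) -> bool:
    # Assumption taken from the explanation above: the IP resource counts as
    # healthy while the NIC is up and the address is bound to it.
    # Client reachability is deliberately NOT consulted.
    return node.public_nic_up and node.ip_bound


def should_fail_over(node: NodeState) -> bool:
    # Either the whole node is gone (no heartbeat) or a resource has failed.
    return (not node.heartbeat_ok) or (not ip_resource_healthy(node))


# The three situations discussed in this thread.
powered_off = NodeState(False, False, False, False)
dead_nic = NodeState(True, False, False, False)
vpn_changed_routing = NodeState(True, True, True, False)

for name, state in [("powered off", powered_off),
                    ("dead NIC", dead_nic),
                    ("VPN changed routing", vpn_changed_routing)]:
    print(f"{name}: fail over = {should_fail_over(state)}")
```

In this model the powered-off box and the dead NIC both trigger a failover, while the VPN case does not, because the NIC is up and the address is still bound even though nobody can reach the server.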
Hmm, the virtual IP should belong to the same subnet as the static IPs for the front-end NIC. If one front-end NIC had its subnet changed, I would think that would trigger a failover condition.
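The subnet rule mentioned here is easy to check by hand or with a few lines of Python. This is a minimal sketch using the standard `ipaddress` module; the function name `same_subnet` and the example addresses (taken from the documentation-only ranges) are assumptions for illustration, not values from these servers. It only verifies the addressing rule and says nothing about whether the cluster would actually react to a change.

```python
import ipaddress


def same_subnet(virtual_ip: str, nic_ip: str, netmask: str) -> bool:
    """Return True if virtual_ip lies in the subnet of the NIC's static address."""
    # strict=False lets us pass a host address plus mask and get its network.
    network = ipaddress.ip_network(f"{nic_ip}/{netmask}", strict=False)
    return ipaddress.ip_address(virtual_ip) in network


# Hypothetical addresses for illustration only.
print(same_subnet("192.0.2.50", "192.0.2.10", "255.255.255.0"))
print(same_subnet("192.0.2.50", "198.51.100.10", "255.255.255.0"))
```

The first call reports a match and the second does not, since the NIC's static address sits in a different /24 than the virtual IP.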
Thanks for your help guys.