
Spotty SQL cluster connectivity: randomly getting "failed to connect to server" when accessing through Management Studio

I keep getting network-related errors when trying to access a SQL Server that is on my network.
It's a SQL cluster consisting of 2 nodes (1 database instance, so it's active/standby).

MS Failover Clustering was set up correctly, but I now see an error on my "Cluster Network 2" in Failover Cluster Manager.
The status is PARTITIONED, the cluster use is ENABLED, and the subnet is the one I use to RDP into the server, and also the one all the clients use to connect to the server.
The other network is a private network used for iSCSI, because the database is located on a SAN (the status of that network is UP and the cluster use is INTERNAL).

The problem is that I randomly get errors when trying to connect to the SAN, stating that there is a network-related problem of some sort. I would imagine this is due to the second cluster network having a problem. I also CAN'T ping each node from the other node, although they are on the same subnet (consecutive IPs, actually).

Any ideas? Maybe this has to do with an incorrect cluster setup? All users are connecting via DNS name, so maybe the cluster is also trying to connect users to the standby node?

Commented:
Check that the Windows firewall is configured correctly (or turn it off).
When pinging, are both the heartbeat addresses and the node addresses being used?
jsctechyInfrastructure Team Lead

Author

Commented:
I restarted the node, and the cluster network came back up.

Right now I can ping each node individually, but when I ping the SQL cluster IP or DNS name, I don't get a response (the DNS name resolves to the correct IP, though).

If I ping the Windows cluster DNS name and IP, I DO get responses.
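
Since ping only tests ICMP, it can help to check separately whether the SQL cluster address accepts TCP connections at all. The following is a minimal sketch, not a definitive diagnostic: the function name `tcp_reachable` is made up for illustration, and port 1433 is an assumption (it is the default port for a default SQL Server instance; a named instance or custom configuration may listen elsewhere).

```python
import socket


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Name resolution failures, refusals and timeouts all raise OSError
    subclasses, so they are reported as "not reachable".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example call (assumed values; substitute the SQL cluster name or IP):
# tcp_reachable("192.168.67.25", 1433)
```

If this succeeds while ping fails, ICMP is probably being filtered somewhere; if both fail from a client but work from the active node, the problem is more likely on the network path.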

Commented:
Are the SQL network name and IP address resources online?
Firewall?

From the active node, can the addresses be pinged?

Is a remote client machine being used to ping? If so, you could try ipconfig /flushdns from the command line on the box that does not get a return ping.
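
To confirm what a given machine actually resolves the cluster name to (for example before and after flushing the DNS cache), a small check like the one below may help. This is only a sketch: the helper names `resolved_addresses` and `resolves_to` are hypothetical, and the `resolver` parameter exists purely so the logic can be exercised without a real DNS lookup.

```python
import socket


def resolved_addresses(name, resolver=socket.gethostbyname_ex):
    """Return the IPv4 addresses that `name` resolves to, or [] on failure."""
    try:
        # gethostbyname_ex returns (hostname, alias_list, ip_address_list)
        _hostname, _aliases, addresses = resolver(name)
    except OSError:
        return []
    return addresses


def resolves_to(name, expected_ip, resolver=socket.gethostbyname_ex):
    """Return True if `expected_ip` is among the addresses `name` resolves to."""
    return expected_ip in resolved_addresses(name, resolver)


# Example call (assumed name; substitute the SQL cluster DNS name):
# resolves_to("sqlcluster.example.local", "192.168.67.25")
```

A stale or wrong answer on one client but not on others would point at that client's cache or DNS settings rather than at the cluster.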
jsctechyInfrastructure Team Lead

Author

Commented:
The firewall is disabled.
From the active node I CAN ping both nodes, as well as the MS cluster IP and the SQL cluster IP and DNS name.
I tried flushing the DNS cache on the client machine, but no luck.

As for the network name and IP address resources, where can I find out if they are online?

Commented:
From Failover Administrator, under Administrative Tools.
If you can ping as indicated, then the resources are online.

What OS is the client machine?
jsctechyInfrastructure Team Lead

Author

Commented:
There are various client machines connecting,
mostly Windows 2003, 2008 and Windows 7.

Where is Failover Administrator? Are you referring to FAILOVER CLUSTER MANAGER? Or is that just for the Microsoft Windows cluster and not the SQL cluster?

Commented:
Failover Cluster Manager is the correct tool for the cluster and its resources.
SQL Management Studio is only concerned with SQL.

Can the cluster nodes ping their gateways and DNS servers?
jsctechyInfrastructure Team Lead

Author

Commented:
Here's a list of what they can and can't ping:

sql1 (192.168.67.21)

gateway - yes (192.168.67.3)
other cluster node - yes (192.168.67.22)
dns servers - yes (192.168.57.2)
ms cluster - yes (192.168.67.23)
sql cluster - yes (192.168.67.25)


sql2 (192.168.67.22)

gateway - yes (192.168.67.3)
other cluster node - yes (192.168.67.21)
dns servers - yes (192.168.57.2)
ms cluster - NO (192.168.67.23)
sql cluster - yes (192.168.67.25)
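
The two lists above can be compared mechanically to highlight targets that one node reaches and the other does not. This is a small sketch using only the results reported in this thread; the function name `asymmetric_targets` is made up for illustration.

```python
# Ping results transcribed from the lists above: True = reply, False = no reply.
results = {
    "sql1": {
        "192.168.67.3": True,    # gateway
        "192.168.67.22": True,   # other cluster node
        "192.168.57.2": True,    # DNS server
        "192.168.67.23": True,   # MS cluster
        "192.168.67.25": True,   # SQL cluster
    },
    "sql2": {
        "192.168.67.3": True,    # gateway
        "192.168.67.21": True,   # other cluster node
        "192.168.57.2": True,    # DNS server
        "192.168.67.23": False,  # MS cluster
        "192.168.67.25": True,   # SQL cluster
    },
}


def asymmetric_targets(results):
    """Return targets probed by every source whose outcome differs by source."""
    common = set.intersection(*(set(r) for r in results.values()))
    diffs = {}
    for target in sorted(common):
        outcome = {src: r[target] for src, r in results.items()}
        if len(set(outcome.values())) > 1:
            diffs[target] = outcome
    return diffs


print(asymmetric_targets(results))
```

With the data above, the only asymmetric target is the MS cluster address 192.168.67.23, which sql1 reaches and sql2 does not.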

It's worth noting that each server has 1 NIC.
I'm starting to think it's some sort of ARP issue? I cleared ARP on all the network devices and the servers, but still no good; connectivity is still unstable.
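
Because the failures are intermittent, a single ping or connection attempt says little; sampling repeatedly gives a rough failure rate that can be compared between clients, nodes or NICs. The sketch below assumes nothing about the probe itself: `probe` is any zero-argument callable returning True or False (for instance one that attempts a TCP connection to the cluster address), and the names `sample_probe` and `failure_rate` are hypothetical.

```python
import time


def sample_probe(probe, attempts=20, delay=1.0):
    """Call probe() `attempts` times, `delay` seconds apart.

    Returns a list of (attempt_index, ok) tuples.
    """
    outcomes = []
    for i in range(attempts):
        outcomes.append((i, bool(probe())))
        if i < attempts - 1:
            time.sleep(delay)
    return outcomes


def failure_rate(outcomes):
    """Return the fraction of failed attempts (0.0 for an empty list)."""
    if not outcomes:
        return 0.0
    failed = sum(1 for _, ok in outcomes if not ok)
    return failed / len(outcomes)
```

Running the same sampling from each node and from a client, and noting which attempt indices fail, can help tell a flaky link or adapter apart from a consistently misconfigured one.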

jsctechyInfrastructure Team Lead

Author

Commented:
Looks like the problem was a faulty NIC!!! Thanks for the help, guys.