dsa_admin

Windows 2008 R2 failover cluster problem

Hi,

I have installed two new Dell servers with EMC SAN shared storage. I installed Windows 2008 R2 on both machines with the latest drivers. At first, I created the cluster and added both servers with no issues; the shared storage was working fine and I installed the required application with no issues.
After two months the cluster suddenly failed. I moved the storage to the other node with no issues, but the failed node would not work again. I tried to evict the node and add it again, with no success. I also reinstalled Windows on the second server with a different computer name and IPs, but still failed to add it to the cluster.
Finally, I destroyed the cluster and formatted both servers, but I could not create a two-node cluster; I can only create a one-node cluster.
The only thing that I could not change is the storage as it has the application files.
The cluster log is attached.
When I run cluster validation for the two nodes, all tests pass with no errors or issues.
I have many other Windows 2008 servers with the same shared storage and cluster configuration, and they are working fine.

Waiting for your help...
 cluster-log.txt
Rant32

I see several references to 169.254.x.x IP addresses in your log file; I doubt that's your actual IPv4 address scheme.
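If it helps, a quick way to list those addresses is to scan the log text for anything in the link-local (APIPA) range. This is only a rough Python sketch: the function name and the sample lines are made up for illustration and are not taken from the attached cluster log.

```python
import ipaddress
import re

# 169.254.0.0/16 is the IPv4 link-local (APIPA) range.
LINK_LOCAL = ipaddress.ip_network("169.254.0.0/16")
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def find_link_local(log_text):
    """Return the sorted, de-duplicated link-local IPv4 addresses in log_text."""
    found = set()
    for match in IPV4_PATTERN.findall(log_text):
        try:
            addr = ipaddress.ip_address(match)
        except ValueError:
            continue  # looked like an IP but is not a valid one (e.g. 999.1.1.1)
        if addr in LINK_LOCAL:
            found.add(addr)
    return [str(a) for a in sorted(found)]

# Hypothetical log lines, for illustration only.
sample = (
    "example line: local 169.254.1.20 remote 169.254.2.31\n"
    "example line: node address 10.3.3.11\n"
)
print(find_link_local(sample))
```

To run it against a real log, read the file into a string first (for example with `open("cluster-log.txt").read()`) and pass that to `find_link_local`.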

Think about how networking is set up on both your servers. Interfaces, what addresses, whether they're public/cluster/heartbeat, etc. If you can, could you share them with us?

Do you use dynamic DNS registration? Make sure that there are no entries in DNS that resolve to an incorrect address for any of the nodes. Which interfaces are enabled for dynamic DNS registration?
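As a rough way to audit this, you could resolve each node name and flag any answer that falls outside the subnets you expect. The sketch below is in Python and rests on assumptions: the node names and the 10.0.0.0/24 subnet are placeholders, and the resolver is passed in as a parameter so the demo runs without touching real DNS.

```python
import ipaddress
import socket

def resolve_ipv4(hostname):
    """Return the IPv4 addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None, socket.AF_INET)
    return sorted({info[4][0] for info in infos})

def unexpected_addresses(hostnames, allowed_networks, resolver=resolve_ipv4):
    """Map each hostname to its resolved addresses outside allowed_networks."""
    networks = [ipaddress.ip_network(n) for n in allowed_networks]
    problems = {}
    for name in hostnames:
        bad = [
            a for a in resolver(name)
            if not any(ipaddress.ip_address(a) in net for net in networks)
        ]
        if bad:
            problems[name] = bad
    return problems

# Placeholder data standing in for real DNS answers (hypothetical names and IPs).
fake_dns = {"node1": ["10.0.0.11"], "node2": ["10.0.0.12", "169.254.7.9"]}
print(unexpected_addresses(["node1", "node2"], ["10.0.0.0/24"],
                           resolver=fake_dns.__getitem__))
```

Dropping the `resolver` argument makes it use the machine's own resolver, so run it on each node in turn to see what that node actually gets back.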

Chances are, if you were using dynamic IP addresses on the cluster nodes (you are now), that an incorrect DNS registration broke cluster communication.

ASKER

I am using static IP configuration. Each server has two network interfaces (public: 10.3.3.0, private: 192.168.200.0). The public IPs are registered in DNS, and nslookup resolves all servers by name with no issues.
DHCP is not used.
I think that the cluster is using the virtual cluster LAN with dynamic IP addresses.
Are you using DHCP on the virtual IP address resource(s)? The 169.254 address indicates that it can't obtain one or the address lease was denied.
I am not managing the virtual interface; it is managed by the cluster service.
-Turn off Windows firewall and disable any anti-virus software.  
-Ping all interfaces on Node 2 from Node 1.
-Ping all interfaces on Node 1 from Node 2.
-Correct any network configuration errors, if present.
-Run the Cluster Validation report.  Do not skip any tests (include all).
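
The two ping steps above amount to a full-mesh check: every node pings every interface of the other node. A minimal Python sketch of that bookkeeping follows. The node names and host addresses are hypothetical (the thread only gives the 10.3.3.0 and 192.168.200.0 networks), and `ping -n` is the Windows syntax for the echo count.

```python
import subprocess

def ping_plan(interfaces):
    """Return (source_node, target_node, target_address) for every cross-node pair."""
    plan = []
    for source in interfaces:
        for target, addresses in interfaces.items():
            if target == source:
                continue  # a node does not need to ping itself
            for address in addresses:
                plan.append((source, target, address))
    return plan

def ping(address, count=2):
    """Ping address from the local machine (Windows syntax: -n is the count).

    The exit code is only a rough signal; read the output as well.
    """
    result = subprocess.run(["ping", "-n", str(count), address],
                            capture_output=True, text=True)
    return result.returncode == 0

# Hypothetical node names and addresses, for illustration only.
nodes = {
    "node1": ["10.3.3.11", "192.168.200.11"],
    "node2": ["10.3.3.12", "192.168.200.12"],
}
for source, target, address in ping_plan(nodes):
    print(f"on {source}: ping {address} ({target})")
```

The plan only lists what to test; `ping` has to be run on each source node itself, since it always sends from the local machine.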

If there is an error in the configuration, the validation test will find it.  You rarely see a 100% clean validation report followed by errors joining a cluster.

And finally, never evict a node unless you know 100% for sure what you are doing.  That process generally leaves all kinds of remnants and you're better off starting from scratch again (reimaging/reinstalling) if it is not a production system.
ASKER CERTIFIED SOLUTION
dsa_admin

Solved successfully.