Solved

Clustering Network Issue After Rebuilding Node: Node A is Reachable from Node B by Only One Pair of Interfaces

Posted on 2015-01-05
173 Views
Last Modified: 2015-06-30
Hey Guys -

I've got an issue I've been working on for a couple of days now and need help with.  Our company has a VDI cluster with a total of 5 nodes.  Recently, one went down and was rebuilt.  I was told that all settings were configured as they should be, and I have verified firsthand that all of the NIC settings (static IPs, enabled options, etc.) are correct and match the other hosts.

The problem is that in VMM (2008), the node is still listed as "Needs Attention."  When I run a validation on the cluster, many network-related issues appear.  Below are examples of the two primary types.

Note:  Node C is the one which was rebuilt...

Error Type #1
Node C is reachable from Node B by only one pair of interfaces. It is possible
that this network path is a single point of failure for communication within the cluster. Please verify that
this single path is highly available or consider adding additional networks to the cluster.

Node D is reachable from Node C by only one pair of interfaces. It is possible
that this network path is a single point of failure for communication within the cluster. Please verify that
this single path is highly available or consider adding additional networks to the cluster.

Node C is reachable from Node D by only one pair of interfaces. It is possible
that this network path is a single point of failure for communication within the cluster. Please verify that
this single path is highly available or consider adding additional networks to the cluster.

Node C is reachable from Node E by only one pair of interfaces. It is possible
that this network path is a single point of failure for communication within the cluster. Please verify that
this single path is highly available or consider adding additional networks to the cluster.

Error Type #2
Network interfaces E - LiveMigration and C - LiveMigration are on the same cluster network, yet either address 10.50.7.23 is not reachable from 10.50.7.25 or the ping latency is greater than the maximum allowed 500 milliseconds.

Network interfaces E - LiveMigration and C - LiveMigration are on the same cluster network, yet either address 10.50.7.23 is not reachable from 10.50.7.25 or the ping latency is greater than the maximum allowed 500 milliseconds.

Network interfaces C - LiveMigration and E - LiveMigration are on the same cluster network, yet either address 10.50.7.25 is not reachable from 10.50.7.23 or the ping latency is greater than the maximum allowed 500 milliseconds.

Network interfaces C - LiveMigration and B - LiveMigration are on the same cluster network, yet either address 10.50.7.22 is not reachable from 10.50.7.23 or the ping latency is greater than the maximum allowed 500 milliseconds.

and so on... there are 12 errors of this type in total.
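For reference, here's how I've been re-running just the network validation tests after making changes (the FailoverClusters PowerShell module ships with Server 2008 R2; "VDICluster" below is a placeholder for our actual cluster name):

```powershell
# Load the failover clustering cmdlets (Server 2008 R2)
Import-Module FailoverClusters

# Re-run only the network validation tests; "VDICluster" is a placeholder name
Test-Cluster -Cluster "VDICluster" -Include "Network" | Format-List
```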

If it helps any, I restarted two of the nodes (including the one which was rebuilt) and received an IP Address Conflict message.  The error included the MAC of the conflicting NIC.  I found out which node the MAC was on and looked at its IPv4 address (IPv6 is disabled on all NICs on all nodes), and it didn't match any of the addresses on the server that threw the error.  Weird!
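For reference, this is how I mapped the conflicting MAC to adapters on each node (WMI is available on all of the hosts):

```powershell
# List every IP-enabled adapter with its MAC and IPv4 address(es)
Get-WmiObject Win32_NetworkAdapterConfiguration -Filter "IPEnabled = TRUE" |
    Select-Object Description, MACAddress, IPAddress

# Then compare the MACAddress column against the MAC in the conflict event
```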

Any suggestions as to where to look or what to do?  Thanks!
Question by:BzowK
4 Comments
 
Accepted Solution by Philip Elder (LVL 38, earned 500 total points)
ID: 40533908
Reboot your switches.

If NICs are teamed then one can safely ignore the first set of warnings.

I am assuming 2008 R2 for the host OS? Is the full GUI installed? If yes, enable Windows Firewall monitoring and logging for all profiles. Once enabled, look for dropped packets on the indicated subnets.

Normally, installing the Cluster role opens up the necessary firewall ports, but perhaps firewall restrictions are hitting the nodes via Group Policy? We put our clusters into their own OU structure at the root to avoid such things.
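A sketch of the logging step (the netsh commands below apply to all firewall profiles; the log path used is the Windows default):

```powershell
# Enable dropped-packet logging on every firewall profile
netsh advfirewall set allprofiles logging droppedconnections enable
netsh advfirewall set allprofiles logging maxfilesize 8192

# Then tail the log and watch for drops on the cluster subnets
Get-Content "$env:SystemRoot\System32\LogFiles\Firewall\pfirewall.log" -Wait |
    Select-String '10\.50\.7\.'
```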
Author Comment by BzowK
ID: 40535869
Good Morning -

Thanks for your replies & suggestions, Philip!  The good news is that I did get the other node to come back up, so the cluster now has 5 of 5 nodes online and active.  The issue had several causes, including the Virtual Network Name on the node not matching the cluster's, the VMM agent not having been reinstalled on the rebuilt node, and a couple of others.

However - I now have two new issues this brought that I hope you can assist with:

Issues #1 - "Unsupported Cluster Configuration" Status

We currently have about 200-300 VMs spread out amongst the nodes.  After I got the rebuilt one back online, about 40 of them (spread across all nodes) changed their status to "Unsupported Cluster Configuration."  I cannot find anything in their configurations that makes these ~40 different, as all VMs are set to use High Availability.  The VMs with this status are still working (those which were started beforehand can still be pinged and accessed), but I cannot do anything else with them.

Note:  I did find a PowerShell script which looked like it would help identify the issue, but it failed because Get-SCVMHostCluster and other cmdlets couldn't be found, so I'm guessing it only works with VMM 2012+ (we run 2008).
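For anyone else who hits this: the VMM 2008 snap-in uses un-prefixed cmdlet names, so the equivalent might be something like the sketch below (cmdlet names are from the 2008 snap-in as I understand it; the server name is a placeholder, and I haven't verified the exact Status value):

```powershell
# VMM 2008 ships a PowerShell snap-in rather than a module
Add-PSSnapin Microsoft.SystemCenter.VirtualMachineManager
Get-VMMServer -ComputerName "vmm-server" | Out-Null   # placeholder server name

# Find VMs whose status mentions the unsupported configuration...
$bad = Get-VM | Where-Object { "$($_.Status)" -match 'Unsupported' }
$bad | Select-Object Name, HostName, Status

# ...and try a refresh, which can clear a stale status
$bad | ForEach-Object { Refresh-VM -VM $_ }
```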

Issue #2 - 6 Bad VMs

When bringing the rebuilt node back online, VMM showed that it had 6 VMs which were missing or in a bad state.  Some of the names it listed had previously been migrated to other nodes and are alive and working there, while others no longer exist anywhere.  How can these be resolved, especially without affecting the legitimate, working VMs with the same names on the other nodes?
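If the fix turns out to be clearing the stale records in VMM, I'm thinking scoping to the rebuilt host should keep the same-named, working VMs on other nodes untouched; something like this (cmdlet names assumed from the 2008 snap-in, "NodeC" is a placeholder, and I haven't verified the exact Status value):

```powershell
# Scope to the rebuilt node only, so same-named VMs on other hosts are untouched
$node = Get-VMHost -ComputerName "NodeC"   # placeholder host name

# Review the missing/bad entries first...
Get-VM -VMHost $node | Where-Object { "$($_.Status)" -match 'Missing' } |
    Select-Object Name, Status

# ...then remove only those records from VMM (a VM in a Missing state
# has no files on this host for VMM to delete)
Get-VM -VMHost $node | Where-Object { "$($_.Status)" -match 'Missing' } |
    ForEach-Object { Remove-VM -VM $_ }
```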

Thanks Guys - I appreciate your help!
Expert Comment by Seth Simmons (LVL 34)
ID: 40859085
This question has been classified as abandoned and is closed as part of the Cleanup Program. See the recommendation for more details.
