Joe Lowe

Network Maintenance on Failover Cluster

We have a 14-node Microsoft Failover Cluster that has 4 networks configured.

2 iSCSI 10GbE networks for our SAN and CSVs - Cluster Use: None

1 Live Migration network - Cluster Use: Cluster Only

1 Production Network - Cluster Use: Cluster and Client

We have planned maintenance on the stacked switches that carry the Production network, which will take them down for about 2 minutes at most. Ideally we do not want to shut down all the VMs and the cluster for this, since the network will only be down while the switches reboot.

Apart from a brief network blip for the VMs, will this negatively impact my cluster? For example, could the cluster fail entirely or shut down?

Microsoft, Networking, Failover Cluster, Hyper-V


8/22/2022 - Mon
Philip Elder

Can the switches be selectively rebooted to allow for the cluster and the switching fabric to reroute during each reboot?

Taking them down point-blank would not be a happy place to be for the cluster.

What kind of network teaming and virtual switch setup is there on each node?
Joe Lowe

Unfortunately they cannot be. The stack reboots together as a whole.

The NIC team is on the Production network, spanning the 2 stacked switches.
The virtual switch on each node is set up to use this same team.

There is also a NIC team on the Live Migration network. This network is not getting rebooted.

Philip Elder

We always force the switches out of Stack Mode when that happens so that we can reboot each one individually to avoid any issues.

Setting the storage fabric to Cluster Only would be one way to let the cluster nodes keep communicating so they do not lose touch with one another.

You can disable Live Migration for that fabric as well as a precaution.
Joe Lowe

Right now we have Cluster Only on the Live Migration network already. Since that network is not being worked on and will remain online, could that still help keep the Cluster in a good state while the Production network (Cluster and Client) goes offline briefly? 
Philip Elder

Are the Live Migration and the Production Network fabrics on the same switches or different ones?

If different, then yes that should work out just fine.
Joe Lowe

Yes, the Live Migration network runs on a separate network and separate switch from the Production network.
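
The reasoning in this exchange can be sketched as a quick check: while the Production stack reboots, does at least one cluster-enabled network remain on hardware that stays up? The snippet below is only an illustrative model, not a tool that queries a real cluster. The network names, the switch labels, and the `surviving_heartbeat_networks` helper are hypothetical; the role numbers follow the failover cluster network Role values (0 = None, 1 = Cluster Only, 3 = Cluster and Client).

```python
from dataclasses import dataclass

# Failover cluster network roles: 0 = None, 1 = Cluster Only, 3 = Cluster and Client.
ROLE_NONE = 0
ROLE_CLUSTER_ONLY = 1
ROLE_CLUSTER_AND_CLIENT = 3


@dataclass(frozen=True)
class ClusterNetwork:
    name: str
    role: int
    switch: str  # hypothetical label for the switch or stack carrying this network


def surviving_heartbeat_networks(networks, switches_down):
    """Return names of networks that can still carry cluster traffic during the outage."""
    return [
        n.name
        for n in networks
        if n.role in (ROLE_CLUSTER_ONLY, ROLE_CLUSTER_AND_CLIENT)
        and n.switch not in switches_down
    ]


# The four networks described in the question (switch labels are assumptions).
networks = [
    ClusterNetwork("iSCSI-A", ROLE_NONE, "san-switch-a"),
    ClusterNetwork("iSCSI-B", ROLE_NONE, "san-switch-b"),
    ClusterNetwork("Live Migration", ROLE_CLUSTER_ONLY, "lm-switch"),
    ClusterNetwork("Production", ROLE_CLUSTER_AND_CLIENT, "prod-stack"),
]

survivors = surviving_heartbeat_networks(networks, {"prod-stack"})
print(survivors)  # ['Live Migration']
```

An empty result would mean no network is left for node-to-node communication during the reboot, which is the situation the suggestions above are meant to avoid.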
Joe Lowe

Since the Live Migration and Production networks are totally separate and run on separate switches, would you say it's safe to perform the maintenance on Production, and that the only thing we should see is a brief disruption in network traffic on the VMs?

A bit of info on the stacked switches that are going to undergo maintenance: these are Cisco Meraki switches, so I'm not sure that removing them from stack membership and re-adding them is seamless. When I called support about options to reboot one at a time, they didn't suggest that as an option.
Philip Elder

The VMs would be disconnected/unreachable for as long as the switches are offline/rebooting.

We've seen some go much longer than two minutes for reboots. Is this a known commodity as far as the time it takes?

As for rebooting them one at a time, I'm not sure now that I think about it. Maybe they could be, but in my experience the single management pane means you click Reboot and they all go at once.
Joe Lowe

Are you referring to the Meraki switches?

Meraki has released its next firmware upgrade. The switches stay online while the upgrade downloads and reboot themselves only when it is time to apply it. Support confirmed that our model (MS225) reboots fairly quickly and should take no more than 2 minutes. We previously had this exact thing happen unexpectedly, and the cluster appeared okay afterwards, apart from alerting that it had lost connection to the virtual switch, like you mentioned. It was so unexpected that we were focused on quickly confirming everything was okay, but we did confirm the switches were down for about 2 minutes during the reboot.

This week the MS125 switches at some of our offices ran the upgrade, and they were offline for 3 minutes.
Philip Elder

Yes. I was asking whether the reboot time is a known commodity, in that the switch reboot time has actually been observed.

Sounds to me like you're pretty much good to go, though, especially since an outage has already been experienced and things behaved themselves.
Joe Lowe

Yes, it appears to be.

Awesome, thank you for your help, Philip. Now that we have had a chance to properly plan and schedule this, we wanted to be 110% sure before going ahead.