SAN updates / Hyper-V machines

We need to perform firmware updates on our EqualLogic SANs. Each is a dual-controller unit: the update is applied to the standby controller first, then the array fails over and the other controller is updated. However, this process still interrupts iSCSI connections for a brief period. Our VMs are mostly Windows, and the OS can handle up to a minute of disconnection from the back-end SAN, but we also have many SQL workloads, and SQL does not handle this well at all (we saw some corruption the last time we updated our SANs).

So this time around we're considering "Saving" or "Pausing" all the VMs first, performing the SAN firmware updates, then resuming all the machines in Hyper-V. I'm not really sure which is best (saving or pausing), or what other alternatives to consider. I know pausing doesn't save the state anywhere, which means that if the node happens to bluescreen or reboot for some reason, the state of all the machines on that node would be lost.

Saving seems to make more sense but wanted to get a feel for how others are handling updates like this. Any advice would be appreciated here.

One cluster runs Server 2012 R2 nodes, and a separate cluster runs Server 2012 nodes (different SANs for each). The VMs on the nodes are mostly Windows machines, running anything from Server 2003 to 2012 (most are running Server 2008 R2). The SQL versions also vary, from SQL 2005 to 2012, most being SQL 2008 or SQL 2008 R2.

Thank you

Whether you save, pause, or shut down, the machines are not available to the clients.
Pausing is the fastest and shutting down the slowest; saving needs some additional hard disk space.
As virtual machines boot fast, I would prefer shutting down all the machines.
If they have an automatic start delay set, you can even boot all the machines in the right order by rebooting the VM host (assuming the start delays are correctly configured).
Saving is an option; pausing I would avoid.

Philip Elder (Technical Architect - HA/Compute/Storage) commented:
Storage Live Migration would be an option for 2012 RTM/R2 clusters. Live migrate (LM) the storage off the unit to be upgraded. When the update is complete, LM the VMs back.
Repeat for each SAN to be updated.
NOTE: If using dynamic VHDX files, make sure not to overcommit!
Storage LM means no downtime.
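
To make the overcommit warning concrete, here is a rough Python sketch of the arithmetic. It is only an illustration: the disk names, the sizes, and the `overcommit_report` helper are made up for the example and are not part of any Hyper-V or EqualLogic tooling. The idea is that dynamic VHDX files may fit on the target today but could outgrow it if every disk expands to its maximum size.

```python
from dataclasses import dataclass


@dataclass
class DynamicDisk:
    name: str          # hypothetical file name, for illustration only
    current_gb: float  # space the VHDX file occupies right now
    max_gb: float      # maximum size the dynamic disk may grow to


def overcommit_report(disks, target_free_gb):
    """Compare current and worst-case usage against free space on the target."""
    needed_now = sum(d.current_gb for d in disks)
    worst_case = sum(d.max_gb for d in disks)
    return {
        "needed_now_gb": needed_now,
        "worst_case_gb": worst_case,
        # The migration itself only needs the space used today.
        "fits_now": needed_now <= target_free_gb,
        # Overcommitted: the disks could grow beyond what the target can hold.
        "overcommitted": worst_case > target_free_gb,
    }


# Made-up example values.
disks = [
    DynamicDisk("sql01.vhdx", current_gb=120, max_gb=500),
    DynamicDisk("web01.vhdx", current_gb=40, max_gb=127),
]
report = overcommit_report(disks, target_free_gb=400)
print(report)
```

With these sample numbers the disks fit on the target today (160 GB of 400 GB) but are overcommitted in the worst case (627 GB), which is the situation the note warns about.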
Vas (Author) commented:
We don't have another SAN to live migrate all the machines to.
Philip Elder (Technical Architect - HA/Compute/Storage) commented:
The first sentence says "EqualLogic SANs", plural?

If storage on the other SANs is limited, then choose which VMs stay live via LM and which ones get shut down.

I sure hope there is a known good backup in place.
Vas (Author) commented:
These are two completely separate clusters and SANs. Sorry, I didn't mean to complicate the question by mentioning that there were two environments.
Philip Elder (Technical Architect - HA/Compute/Storage) commented:
No worries. It was worth a try. ;)

EDIT: But seriously though. Another option is Shared Nothing Live Migration between the clusters.
Vas (Author) commented:
Both clusters are pretty much at capacity.

Each cluster has between 8 and 12 nodes, so we're talking several hundred VMs.

The SAN has two controllers; what we're concerned with is the <1 min iSCSI disconnect that will occur during controller failover.
Just wondering, because a redundant system should keep the service up and running, but that is a different question...

If you do not want to shut down (or at least save) all the machines, then I would separate the machines into categories.

OS drive is located on the affected SAN:
--> shut down or save

OS drive is not on the SAN and...
...the machine writes data files to the SAN (SQL, Exchange, possibly also DCs)
--> shut down or save
...the machine serves services and possibly connects to an affected SQL server
--> these machines may throw errors as they lose the connection, but should recover once the connection is available again
--> you can pause them to stop the traffic.
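
With several hundred VMs, the categories above could be applied from an inventory list rather than by hand. The following Python sketch shows one way to do that. The `VM` fields, the sample machines, and the "leave running" fallback for machines with no tie to the SAN are assumptions added for illustration; in practice the inventory would have to come from your own records.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str                                  # hypothetical VM name
    os_disk_on_san: bool                       # OS drive is on the affected SAN
    writes_data_to_san: bool                   # e.g. SQL, Exchange, possibly DCs
    depends_on_affected_service: bool = False  # e.g. app server using an affected SQL


def plan_action(vm):
    """Map a VM to an action, following the categories described above."""
    if vm.os_disk_on_san:
        return "shut down or save"
    if vm.writes_data_to_san:
        return "shut down or save"
    if vm.depends_on_affected_service:
        # May throw errors while the back end is away; pausing stops the traffic.
        return "pause (optional)"
    # Assumption: machines with no tie to the affected SAN are left alone.
    return "leave running"


# Made-up inventory for illustration.
inventory = [
    VM("sql01", os_disk_on_san=True, writes_data_to_san=True),
    VM("exch01", os_disk_on_san=False, writes_data_to_san=True),
    VM("web01", os_disk_on_san=False, writes_data_to_san=False,
       depends_on_affected_service=True),
    VM("util01", os_disk_on_san=False, writes_data_to_san=False),
]

for vm in inventory:
    print(f"{vm.name}: {plan_action(vm)}")
```

The checks run in the same order as the categories, so a VM whose OS drive is on the affected SAN is always shut down or saved, regardless of what else it does.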
Philip Elder (Technical Architect - HA/Compute/Storage) commented:
If a LUN ownership move is initiated in the SAN's console, shouldn't it move over with no drops? And even if there were a momentary pause in the stream, the Cluster Service would compensate.