VMware iSCSI MPIO failover

We are looking at moving our ESXi 5.5 hosts to a new stack of switches.

Right now, we have 4 hosts, each connected to our storage network with a minimum of 2 physical NICs. On our storage network, we have 2 Nimble arrays with 5 iSCSI connections each. I would like to move these servers and storage arrays to the new switches without downtime, if possible.

My plan was to take a couple of NICs from the SANs and connect them to the new switch, then go through the ESXi hosts one at a time, pulling 1-2 NICs from each server and connecting them to the new SAN switch. Once that is done, I would remove the rest of the NICs from the old switch and move them to the new switch.

Essentially, walking the cables over one by one.
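Before touching any cables, a walk-off order like this can be sanity-checked as a reachability problem. The sketch below is a minimal illustration, not a tool from the thread, and it rests on loudly stated assumptions: the old and new switch stacks are not linked to each other (so a host NIC only reaches array ports on the same stack), every host NIC can reach every array port on that stack, and all device and port names are hypothetical. It only checks that a path still exists after each move; it says nothing about how long the failover itself takes.

```python
"""Sketch: check that a cable walk-off plan never strands a host.

Assumptions (hypothetical, not from the thread):
  * the old and new switch stacks are NOT interconnected, so a host NIC only
    reaches array ports plugged into the same stack;
  * on a given stack, every host NIC can reach every array port.
"""
from typing import Dict, List, Tuple

Placement = Dict[str, Dict[str, str]]  # device -> port/NIC name -> "old" or "new"
Move = Tuple[str, str]                  # (device, port) recabled to the "new" stack


def reachable_arrays(host: str, state: Placement, arrays: List[str]) -> List[str]:
    """Arrays this host can still reach through at least one of its NICs."""
    host_stacks = set(state[host].values())
    return [a for a in arrays if host_stacks & set(state[a].values())]


def check_plan(placement: Placement, hosts: List[str], arrays: List[str],
               moves: List[Move]) -> List[str]:
    """Apply the moves in order; return the problems found (empty list = safe)."""
    state = {dev: dict(ports) for dev, ports in placement.items()}  # don't mutate input
    problems = []
    for step, (device, port) in enumerate(moves, start=1):
        state[device][port] = "new"
        for host in hosts:
            lost = sorted(set(arrays) - set(reachable_arrays(host, state, arrays)))
            if lost:
                problems.append(f"step {step} ({device}:{port}): {host} loses {lost}")
    return problems


if __name__ == "__main__":
    # Hypothetical, trimmed-down inventory: 2 hosts, 1 array, 2 ports each.
    hosts, arrays = ["esx1", "esx2"], ["nimble1"]
    placement = {
        "esx1": {"vmnic1": "old", "vmnic2": "old"},
        "esx2": {"vmnic1": "old", "vmnic2": "old"},
        "nimble1": {"eth1": "old", "eth2": "old"},
    }
    # Array port first, then one NIC per host, then the remaining ports.
    plan = [("nimble1", "eth1"), ("esx1", "vmnic1"), ("esx2", "vmnic1"),
            ("nimble1", "eth2"), ("esx1", "vmnic2"), ("esx2", "vmnic2")]
    print(check_plan(placement, hosts, arrays, plan) or "every host keeps a path")
```

Moving both `nimble1` ports before any host NIC makes the check report that both hosts lose `nimble1` at step 2, which is exactly the outage the one-by-one order avoids. If the two stacks are trunked together, this model does not apply.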

When I pull the cable on a NIC that has active I/O traffic on it, will the failover to a secondary NIC be relatively seamless, or are we looking at the possibility of a little disruption while traffic fails over to another NIC?

Our hosts are set up with iSCSI vSwitch 1 on VMNIC0, iSCSI vSwitch 2 on VMNIC1, iSCSI vSwitch 3 on VMNIC2, and so on. All of those are then set up on the software iSCSI adapter as active adapters for failover.

Would this be possible? Is there a better way to move to a new switch live, or would my best option be to bite the bullet and get some downtime scheduled for this move?
Mr Tortur (System Engineer) commented:
You could do it online, one ESXi host at a time, but at a less critical hour (at night, for example).
Whether your iSCSI failover will cause a disruption, and how long it would last, I can't say: it depends on your SAN and VMware configuration and on the load across your entire vSphere environment.
themightydude (Author) commented:
If it helps, we use the Nimble PSP connector to manage paths on our ESXi boxes. Essentially, all it does is mark all paths/NICs as active/active.

Maybe our best option would be to move some of the more critical loads to local storage and leave the less critical stuff on shared storage, then move it all back to shared after the move.
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, commented:
If this is iSCSI, you will want to design your iSCSI storage network as described in my EE article:

HOW TO: Add an iSCSI Software Adaptor and Create an iSCSI Multipath Network in VMware vSphere Hypervisor ESXi 5.0

Do you have a Nimble multipath module to add?
themightydude (Author) commented:
This is iSCSI, and I don't have it designed quite like that.

I have it designed so that we have 3 different physical NICs, and each physical NIC is tied to a VMkernel port in its own vSwitch.

So essentially what I have on my hosts is this:



VMK_Port iSCSI3 -> VMNIC 3

All 3 of those are then added to the iSCSI storage adapter as VMkernel port bindings and are set to active.
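The layout described above (one vSwitch and one VMkernel port per physical NIC, each bound to the software iSCSI adapter) can be checked against the rule vSphere applies to iSCSI port binding: each bound VMkernel port should have exactly one active uplink and no standby uplinks. The sketch below works on a hand-typed model of the layout rather than anything read from the host, and the `vmk`/`vmnic`/`vSwitch` names are placeholders. It also flags two bound ports riding the same physical NIC, since that removes the physical redundancy multipathing is meant to provide.

```python
"""Sketch: check an iSCSI port-binding layout (hand-typed, hypothetical names).

Rule checked: each VMkernel port bound to the software iSCSI adapter should
have exactly one active uplink and no standby uplinks.
Extra check: no two bound ports should share the same physical NIC.
"""
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BoundVmk:
    name: str                                          # e.g. "vmk1"
    vswitch: str                                       # vSwitch hosting its port group
    active: List[str]                                  # active uplinks (vmnics)
    standby: List[str] = field(default_factory=list)  # standby uplinks


def binding_problems(ports: List[BoundVmk]) -> List[str]:
    problems: List[str] = []
    owner: Dict[str, str] = {}  # vmnic -> first bound vmk seen using it
    for p in ports:
        if len(p.active) != 1:
            problems.append(f"{p.name}: needs exactly one active uplink, has {p.active}")
        if p.standby:
            problems.append(f"{p.name}: standby uplinks {p.standby} should be unused")
        for nic in p.active:
            if nic in owner:
                problems.append(f"{p.name}: shares {nic} with {owner[nic]} (no redundancy)")
            else:
                owner[nic] = p.name
    return problems


if __name__ == "__main__":
    # One vSwitch per NIC, as described in the thread (names are placeholders).
    layout = [
        BoundVmk("vmk1", "vSwitch1", ["vmnic1"]),
        BoundVmk("vmk2", "vSwitch2", ["vmnic2"]),
        BoundVmk("vmk3", "vSwitch3", ["vmnic3"]),
    ]
    print(binding_problems(layout) or "layout passes")
```

A single vSwitch carrying all the NICs, with a per-port-group teaming override that leaves one active uplink per VMkernel port, passes the same check; the check itself does not say which of the two layouts is preferable.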

Since we have Nimble SANs, we use the connector Nimble provides, which installs a Nimble PSP on the host; essentially, it marks all paths as Active (I/O).
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, commented:
Umm, that's not quite best practice or the recommended design.
themightydude (Author) commented:
Based on how I have it set up, or even if we redesigned it based on your article, would you anticipate an interruption of service if we did that "walk off" of the network cables to a new switch?
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, commented:
We know that our design works with path failure, because we test it.

Check your paths: if they are all Active (I/O), you should be okay if a path fails.
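One way to check paths from the command line is to save the output of `esxcli storage core path list` on each host and summarise it per device. The parser below is a sketch that assumes the plain-text layout that command prints (a non-indented header line per path, followed by indented `Key: Value` lines such as `Device:` and `State:`); verify the field names against your own ESXi 5.5 output. Note that `State` reports whether a path is active, standby or dead, not whether the path selection plugin (here the Nimble PSP) is currently sending I/O down it, which is what the Active (I/O) label in the client shows. The two-active-paths threshold is a hypothetical choice.

```python
"""Sketch: summarise `esxcli storage core path list` output per device.

Assumed input layout (check against your host): one block per path, a
non-indented header line, then indented "Key: Value" lines, blank lines between
blocks. Only the "Device" and "State" fields are used.
"""
import sys
from collections import defaultdict
from typing import Dict, List


def parse_paths(text: str) -> List[Dict[str, str]]:
    paths: List[Dict[str, str]] = []
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue                       # blank line between path blocks
        if not line[0].isspace():          # header line starts a new path block
            current = {}
            paths.append(current)
        elif current is not None and ":" in line:
            key, _, value = line.strip().partition(":")  # split on the first colon only
            current[key.strip()] = value.strip()
    return paths


def states_by_device(paths: List[Dict[str, str]]) -> Dict[str, Dict[str, int]]:
    summary = defaultdict(lambda: defaultdict(int))  # device -> state -> path count
    for p in paths:
        summary[p.get("Device", "?")][p.get("State", "?")] += 1
    return {dev: dict(states) for dev, states in summary.items()}


def at_risk(summary: Dict[str, Dict[str, int]], minimum_active: int = 2) -> List[str]:
    """Devices with fewer active paths than `minimum_active` (hypothetical threshold)."""
    return sorted(d for d, s in summary.items() if s.get("active", 0) < minimum_active)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_paths.py paths.txt   (saved esxcli output)
    with open(sys.argv[1]) as f:
        summary = states_by_device(parse_paths(f.read()))
    for device, states in sorted(summary.items()):
        print(device, states)
    print("fewer than 2 active paths:", at_risk(summary) or "none")
```

Running the same command again after each cable move shows whether the expected paths came back as active on the new switch before the next cable is pulled.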

You've never tested this before going into production? <- slap on wrist!

themightydude (Author) commented:
I'm not sure whether they tested it or not; this was set up before I arrived.

And yes, all of our paths are marked as Active (I/O). I was thinking we should be fine, but I had read somewhere that you could have a minute or so of paused VMs while traffic is moved over to the remaining active links.

Thanks for the info.
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, commented:
I've never seen VMs pause on path failover...

And we often wander into our machine room and turn off a switch... just to test... just to see if Ops notice...
Mr Tortur (System Engineer) commented:
Same here for path failover: if it is configured correctly, you won't have paused VMs.
And I fully agree, the only way to know whether failover works is to test it.

Powering off a switch may be a little violent (well, I am not saying Andrew is violent at all ;-) ), but you might have the option of some gentler tests:
- if you have a cluster with 4 hosts, put all your production VMs on 3 of them;
- then keep one or several test VMs on the 4th host, and run a network copy, a ping -t, or any other test you want from the test VM;
- disconnect one of the 4th host's iSCSI Ethernet cables;
- then check on the test VM whether it still works, whether its network has been cut, or whether anything else happened.
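To put a number on "whether anything else happened" in the last step, the test VM can time its own disk writes while the cable is pulled. The sketch below is one possible approach, not one proposed in the thread: it rewrites and fsyncs a small file on the VM's virtual disk (which sits on the iSCSI datastore) at a fixed interval and records each write's latency. A clean path failover should show at most a short blip, while a multi-second worst case would point to the kind of pause asked about above. The file name, interval and duration are hypothetical defaults, and guest-side timing is only a rough indicator.

```python
"""Sketch: time small synced writes from inside a test VM during a cable pull.

Hypothetical defaults throughout; run it on a disk backed by the iSCSI datastore.
"""
import os
import sys
import time
from typing import List, Tuple


def monitor(path: str = "failover_probe.bin", interval: float = 0.1,
            duration: float = 60.0, block: int = 4096) -> Tuple[float, List[float]]:
    """Return (worst latency, all latencies) in seconds for writes made during `duration`."""
    payload = os.urandom(block)
    latencies: List[float] = []
    deadline = time.monotonic() + duration
    try:
        with open(path, "wb") as f:
            while time.monotonic() < deadline:
                start = time.monotonic()
                f.seek(0)
                f.write(payload)
                f.flush()                  # push Python's buffer to the OS
                os.fsync(f.fileno())       # ask the guest OS to commit it to the virtual disk
                latencies.append(time.monotonic() - start)
                time.sleep(interval)
    finally:
        if os.path.exists(path):
            os.remove(path)                # clean up the probe file
    return (max(latencies) if latencies else 0.0), latencies


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python io_probe.py 120   (seconds to run while the cable is pulled)
    worst, samples = monitor(duration=float(sys.argv[1]))
    print(f"{len(samples)} writes, worst latency {worst * 1000:.1f} ms")
```

Comparing the worst latency from a run that spans the cable pull against a baseline run with all cables in place separates failover impact from normal array latency.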
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, commented:
In our opinion, a switch failure is the real-world test! (Cables don't often come unplugged, unless the catch is busted!)

And the number of organisations we come across on a daily basis that have never tested switch failure, yet expect it all to work when a switch fails, is high, because they do not test before they put servers into production!
Mr Tortur (System Engineer) commented:
I fully agree with you, but this was so he could go step by step and start with a small test first.
But I think (and I do this when I install a SAN and vSphere) a virtual infrastructure is only reliable once you have:
- tested SAN path failover (iSCSI or FC switch failure, or one path down);
- tested a host failure (for HA);
- tested a complete failover, if you have DR.

In fact, yes: in the real world we lose a whole switch more often than a single path... another point for you! ;-)