it_medcomp

MPIO 30-second failover delay?

I am setting up a 3-node Windows failover cluster using HP DL360s, each with an emulogic SM1200e card handling MPIO over FC to an IBM V5000 SAN. All were configured together, click for click. When I tested failover, nodes 1 and 3 copied files with no disruption at all. When I pulled the fiber cable on the first connection for node 2, MPIO stopped copying for 30 seconds before resuming, without my replugging the cable. I moved the fiber to different ports on the SAN, but the problem followed the port on the hypervisor: same behavior regardless of which SAN ports I used. In case this is not clear, I am assuming the problem lies with the second connection on node 2, the one I did not pull. Pulling the other cable on this node worked just like the other connections, with no disruption. So I assume there is no issue with the SAN or the server, just the card. Any idea how to find out why it waits 30 seconds before switching paths?
Member_2_231077

Please run this on all 3 nodes to confirm timings. 30 seconds is normal.

PS C:\> Get-MPIOSetting

When you say nodes 1 and 3 did not take 30 seconds to fail over, did you try pulling one cable, putting it back, and then pulling the other? There's no guarantee that I/O will be using a particular path, so you may have pulled a passive one on the two nodes with no 30s delay.
it_medcomp (ASKER)

The connections are active-active and SDDDSM is installed. I ran the command on node 1 and on the problem node 2, and the results are identical: PathVerificationState Enabled, PathVerificationPeriod 30, RetryCount 16, DiskTimeoutValue 60. All other settings are blank.
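For anyone comparing more than two nodes by eye, a minimal sketch of diffing saved `Get-MPIOSetting` output is shown here. It assumes the output was captured as plain `Name : Value` text on each node; the helper names `parse_mpio_settings` and `diff_settings` are hypothetical, and the sample text contains only the four values reported in this thread.

```python
def parse_mpio_settings(text):
    """Parse 'Name : Value' lines (the list layout Get-MPIOSetting prints) into a dict."""
    settings = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank lines and anything that is not a Name : Value pair
        name, _, value = line.partition(":")
        settings[name.strip()] = value.strip()
    return settings


def diff_settings(nodes):
    """Return {setting: {node: value}} for every setting whose value differs between nodes."""
    all_names = set()
    for settings in nodes.values():
        all_names.update(settings)
    differences = {}
    for name in sorted(all_names):
        values = {node: settings.get(name, "") for node, settings in nodes.items()}
        if len(set(values.values())) > 1:
            differences[name] = values
    return differences


# Illustrative input only: the four values reported above, nothing else.
NODE1 = """
PathVerificationState  : Enabled
PathVerificationPeriod : 30
RetryCount             : 16
DiskTimeoutValue       : 60
"""
NODE2 = NODE1  # the asker reports identical output on node 2

if __name__ == "__main__":
    nodes = {
        "node1": parse_mpio_settings(NODE1),
        "node2": parse_mpio_settings(NODE2),
    }
    print(diff_settings(nodes) or "no differences")
```

An empty result only shows that the Microsoft MPIO timers match. It says nothing about SDDDSM or HBA driver settings, which are configured separately.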
Even with all paths available, MPIO can only send a given command down a single path, so if that path fails, that command won't time out until PathVerificationPeriod has elapsed. One useful experiment is to present 4 LUNs to the host and run a copy on each one, then pull a cable. Normally two LUNs keep going whereas the other two stop for 30s (if using round-robin).
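The expected outcome of that 4-LUN experiment can be sketched with a toy model. This is not how MPIO or SDDDSM actually schedule I/O. It simply assumes that, at the moment the cable is pulled, each LUN has one in-flight command and that round-robin has spread those commands evenly over two paths. The function name `simulate_cable_pull` and the path assignment rule are assumptions made for illustration.

```python
def simulate_cable_pull(num_luns, num_paths, failed_path, verification_period_s):
    """Return {lun: stall_seconds} under a toy round-robin model."""
    stalls = {}
    for lun in range(num_luns):
        # Assumption: the in-flight command for each LUN sits on path (lun mod num_paths).
        in_flight_path = lun % num_paths
        if in_flight_path == failed_path:
            # The command was on the pulled path, so the copy waits out the timer.
            stalls[lun] = verification_period_s
        else:
            # The command was on a surviving path, so the copy continues.
            stalls[lun] = 0
    return stalls


if __name__ == "__main__":
    # 4 LUNs, 2 paths, pull path 0, 30-second timer as reported in this thread.
    for lun, stall in simulate_cable_pull(4, 2, 0, 30).items():
        print(f"LUN {lun}: stalled {stall}s")
```

In this model half the LUNs stall and half do not, which matches the behaviour described for round-robin. If a single copy is running when the cable is pulled, whether it stalls depends on which path its command happened to be on.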
I’ve been using the default. What are the choices (round robin, etc.), and where do I change them? Is that in the FC adapter driver somewhere, or is it an MS MPIO setting in Windows?
It's under the load balancing policy on the MPIO tab for the "disk" properties - https://blogs.msdn.microsoft.com/san/2008/07/28/how-to-set-the-mpio-policy-for-a-disk-in-windows-server-2008/ . Round Robin is most popular but https://www.ibm.com/support/knowledgecenter/en/STLM5A/com.ibm.storwize.v3700.710.doc/svc_w2kmpio_21oxvp.html suggests SDDDSM may have its own policy.
I went through and used SDDDSM to switch policies from their boring one to shortest queue length service time for all 6 adapters. This still doesn't shorten the 30-second delay on that one path, though. Do you think there is a problem with that server’s connection on the path I wasn't pulling? In other words, I pulled the one connection and the remaining path had some issue that prevented the seamless failover? Diagnostics didn’t reveal any issues. I just hate to change a setting on only 1 out of 6 cards when all should be identical. I don’t want to create some minor problem now that becomes major down the road.
It is very odd that only one host exhibits this behaviour; I would expect them all to behave like this. Reading through the Multipath Subsystem Device Driver User’s Guide, there are lots of failover timers, both under SDDDSM and on the storage. After all, instant failover would use 100% of resources just for monitoring.
Exactly. I'm thinking hardware is the issue here, and I've been having a hard time isolating whether the culprit is a fiber card, an SFP, or the node canister on the SAN. I obtained an additional SFP, and I am going to see if IBM support can help rule out the SAN. We have had this unit for a year and a half, and we never used it like this before. It needs a minor code upgrade anyway, so we might be dealing with one of those factors. I'll post an update when I have one. Thanks!