Rick_Penney asked:

Configuration of Server SAN with Failover clustering.

Hi, we are looking for some further guidance on setting up failover clustering on two Windows Server 2019 hosts connecting to an HP MSA 2052 SAN.

•      The servers are DL380 Gen10 with 8 NIC ports.
•      The current wiring connections are shown below (we will redo this for redundancy on the controllers)
•      The MSA 2052 has two controllers; each controller has two connections on different subnets, connecting to the two servers via a single Layer 2 switch carrying two VLANs.
•      The servers connect to a different production switch for normal data traffic.
•      The iSCSI initiators and MPIO have been configured, and both servers can see the storage volumes.
•      Both servers can ping all four IP addresses on the 172.16.*.* subnets. The NICs on those subnets have no default gateway configured, as per the guides we have been reading, and jumbo frames are enabled on the iSCSI NICs and on the switch (a verification sketch follows this list).
•      The servers are pingable via their fully qualified domain names from other devices on the 10.226.*.* production network, and they are in the correct Server OU.
•      The servers currently don't have a dedicated network set up for heartbeat or live migration traffic.
•      The servers also have teamed NICs set up for a virtual switch in Hyper-V Manager.
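For reference, here is a rough PowerShell sketch of how that iSCSI/MPIO/jumbo-frame state can be double-checked from each host; the NIC names are placeholders rather than our real values.

```powershell
# Rough verification sketch; run on each host. "iSCSI-A" / "iSCSI-B" are placeholder NIC names.

# MPIO installed and set to claim iSCSI devices automatically.
Get-WindowsFeature Multipath-IO                 # should show Installed
Get-MSDSMAutomaticClaimSettings                 # iSCSI should be True
# If it isn't: Enable-MSDSMAutomaticClaim -BusType iSCSI (then reconnect the sessions)

# Both MSA portals registered, with persistent sessions on both 172.16.*.* paths.
Get-IscsiTargetPortal | Select-Object TargetPortalAddress
Get-IscsiSession | Select-Object TargetNodeAddress, InitiatorPortalAddress, IsPersistent

# Jumbo frames on the iSCSI NICs (the registry keyword name varies by driver).
Get-NetAdapterAdvancedProperty -Name "iSCSI-A","iSCSI-B" -RegistryKeyword "*JumboPacket"

# MPIO disk summary; "mpclaim -s -d <n>" lists the paths for disk n.
mpclaim.exe -s -d
```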

The Failover Clustering Validate a Configuration wizard is failing on:
At least two independent paths to storage target are recommended for each Test Disk
Test Disk 0 from node Server1 (shows FQDN) has 1 usable path to storage target
Test Disk 0 from node Server2 (shows FQDN) has 1 usable path to storage target
At least two independent paths to storage target are recommended for each Test Disk
Test Disk 1 from node Server1 (shows FQDN) has 1 usable path to storage target
Test Disk 1 from node Server2 (shows FQDN) has 1 usable path to storage target
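A hedged sketch of how just the storage tests can be re-run and the per-LUN path count checked while chasing this warning (node names are placeholders):

```powershell
# Re-run only the storage validation tests; node names are placeholders.
Test-Cluster -Node "Server1","Server2" -Include "Storage"

# MPIO's view of each LUN; "mpclaim -s -d <n>" lists the paths for disk n
# (each LUN should show two paths once both controller connections are live).
mpclaim.exe -s -d
```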

However, we have created a cluster and everything comes up green.
Questions:
When we created the cluster on the first host, the wizard prompted for an IP address. We have currently assigned an address from our production subnet, but this may be wrong and perhaps it should be on a separate VLAN; please advise.
Joining the second host to the cluster by name also looks good, and we can see everything as expected.
Do we need to create a further, separate subnet for a heartbeat, and if so, how is it configured within the software?

Once this is all correct, am I right in assuming that the VM guests are configured in Hyper-V Manager and that the location of the virtual disks points back to the IP address of the cluster?

What needs to be configured in the Virtual SAN Manager within Hyper-V Manager?

If possible, an explanation would be better than a URL link, as we have already pretty much exhausted most of the guides found through internet searches. Sorry, I know it's a lot of questions. (Attachment: Server SAN wiring diagram)
Regards
Rick
Topics: Storage, Virtualization, Networking, Hyper-V, Clustering

SOLUTION
Paul MacDonald

andyalder

"The MSA 2052 has two controllers; each controller has two connections on different subnets, connecting to the two servers via a single Layer 2 switch carrying two VLANs."

So if that switch fails it all crashes? If you only have two hosts then throw the switch away and connect point-to-point.
Paul MacDonald

The switch gives the network connections on the SAN some redundancy, so it does help, but I also noted that the switch is a single point of failure.

Rick_Penney

ASKER
Thanks Paul for your post, it's very much appreciated.

With ref to my statement of:
"The first host that we created a cluster on prompted for an IP Address, we have currently assigned an IP Address from our Production subnet, but this may be wrong and should be on a separate VLAN, please advise."


I didn't really explain. The two physical hosts have static IP addresses on the NICs that connect back to the production network.
What I meant was that when we clicked "create a cluster", it asked for a network address to use, to which we assigned an IP from the same production subnet.
There have been so many conflicting guides on this topic, none of which give all the information; most mention setting up a separate migration network, so I wasn't sure if the cluster IP should indeed be on the production network or kept separate.

If you don't mind, I'll keep this post open for a day in case we have any more queries about adding the VMs to Failover Cluster Manager after adding them in Hyper-V Manager, but it's good to know we are almost there with it.
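For anyone following along, a minimal sketch of that step, assuming the VM's files sit on a Cluster Shared Volume (the usual pattern for Hyper-V clusters); the VM name and path below are placeholders.

```powershell
# Hedged sketch; "VM01" and the CSV path are placeholder names.
# 1. Create the VM with its configuration and VHDX on cluster storage (a CSV here).
New-VM -Name "VM01" -MemoryStartupBytes 4GB -Path "C:\ClusterStorage\Volume1\VM01" -NewVHDPath "C:\ClusterStorage\Volume1\VM01\VM01.vhdx" -NewVHDSizeBytes 80GB

# 2. Make the VM highly available so Failover Cluster Manager handles failover and migration.
Add-ClusterVirtualMachineRole -VMName "VM01"
```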
Kindest regards
Rick
Rick_Penney

ASKER
Thanks guys. I think when we were point-to-point we were in a world of pain after reading so many different ways of doing it: the two servers could ping each other on the production LAN (10.226.50.*) but couldn't ping each other via their 172.16 addresses, and the failover cluster validation wizard failure led us to believe it couldn't be done that way.
Would it be possible to roughly draw a layout for the iSCSI connections, please?
regards
andyalder

Four cables instead of the switch gives full redundancy, saves a little bit of latency and heat.
Paul MacDonald

"i wasnt sure if the cluster ip should indeed be on the production network or separate"
The cluster IP is for talking to the cluster and would almost certainly be from the production network.  The cluster IP won't matter much to you in THIS scenario because you generally won't talk to the cluster.  

But imagine if you had two file servers (or web servers, or mail servers) in a cluster - you'd point people to the cluster IP because you simply want to talk to whoever owns the cluster - you don't care which specific server it is.
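As a hedged sketch (the cluster name, node names and address are placeholders, not recommended values), creating the cluster with an administrative IP from the production subnet looks roughly like this:

```powershell
# Hedged sketch; cluster name, node names and the 10.226.x.x address are placeholders.
# -NoStorage keeps the wizard from adding every visible LUN during creation.
New-Cluster -Name "HVCLUSTER1" -Node "Server1","Server2" -StaticAddress "10.226.50.200" -NoStorage
```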

Rick_Penney

ASKER
Ah, understood, thank you.
I'll give it all a go this afternoon/tomorrow morning.

Thanks Guys
Paul MacDonald

To illustrate what I'm talking about, here's an image I knocked together. I included (what I understand to be) Andy's suggestion, and why I disagree with it. Whether you use no switches, one switch, or more switches, you'll get some level of redundancy and fault tolerance no matter what you do. We went virtual several years ago (my implementation has three of everything) and it's one of the best decisions I've ever made.

ASKER CERTIFIED SOLUTION
andyalder

Rick_Penney

ASKER
Nice one, thank you. Just so I'm 100% clear: you said I don't need to worry about VLANs, but can all these connections be on 172.16.0.0/24, or am I still using two subnets (172.16.0.0/24 and 172.16.1.0/24)? To me it looks like they can all be on the same one.
andyalder

Are you talking about the front end that Paul knows about or the storage connections that I know about?
For the storage, if you direct attach, then each cable is a separate subnet.
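As a purely illustrative sketch (all addresses are hypothetical), that addressing could look like one small subnet per cable, with no default gateway on any iSCSI NIC:

```powershell
# Hypothetical addressing: one subnet per direct-attached cable, no gateway on the iSCSI NICs.
#   Server1 "iSCSI-A" 172.16.0.11  <->  controller A port 1: 172.16.0.1
#   Server1 "iSCSI-B" 172.16.1.11  <->  controller B port 1: 172.16.1.1
#   Server2 "iSCSI-A" 172.16.2.12  <->  controller A port 2: 172.16.2.1
#   Server2 "iSCSI-B" 172.16.3.12  <->  controller B port 2: 172.16.3.1
# Example for Server1:
New-NetIPAddress -InterfaceAlias "iSCSI-A" -IPAddress 172.16.0.11 -PrefixLength 24
New-NetIPAddress -InterfaceAlias "iSCSI-B" -IPAddress 172.16.1.11 -PrefixLength 24
```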

BTW, in the storage industry you don't connect the two switches together on the back end as no traffic goes over that link.
Paul MacDonald

"...can all these connections be on 172.16.0.0/24 or am i still using two subnets 172.16.0.0/24 and 172.16.1.0/24, to me it looks they can all be on the same?"
I use a flat, class "C" for my SAN connection.  There's no harm in using VLANs, but they're not necessary.  The switch(es) will segregate the traffic, and it's only the two SANs and two hosts on that network.


Andy:
"...you can see that if controller 1 fails then both hosts will get I/O via controller 2 instead."
Disconnect the cable from host 1 to controller 1, or lose that NIC in host 1, or lose that NIC in controller 1, and now host 1 cannot get to controller 1.  The addition of one or more switches eliminates that problem.  It may be that the HP SAN has an internal switch that connects the two controllers (my NetApp SAN is like that), and it may be that one controller can take over for the loss of the other in the HP SAN (again, my NetApp SAN is like that), but all you've done is take the external switch and move it inside the shelf.  The switch still exists, you just don't see it.  I expect that has a lot to do with how the HP SAN is configured.  My design works regardless; you just have to imagine two SANs in one shelf.

" BTW, in the storage industry you don't connect the two switches together on the back end as no traffic goes over that link."
The crossover between switches ensures there's no single point of failure.  You can disconnect any cable, port, NIC, host or switch and the system should continue to run.  The SANs themselves are probably single points of failure, but good backups can be restored to the remaining SAN if a VM is mission critical.

Rick_Penney

ASKER
Hi Guys, I've learnt a lot today, so thank you.

As we have three sites to do, and money is tight, we will first retry the attached method (sorry Paul). This is the way we originally tried it, but it may well have been the MPIO that was screwing things up, which is sorted now.

I'll redo everything tomorrow. Once again, thank you both for all your shared knowledge.
regards
Rick
Paul MacDonald

You'll be better off no matter what you do, so good luck!


andyalder

Paul "...host 1 cannot get to controller 1".
That is true so the LUN gets presented to it via controller 2 instead.

Do you think anyone would buy the SAS host-attached variant if what you say were true? That's almost always directly attached, since SAS switches are few and far between.
Rick_Penney

ASKER
Thanks both, much appreciated. It's not too bad to do once you have the information; it's a shame there's no definitive guide out there, even from HP when you purchase the SAN.
regards
Rick
andyalder

Yup, the documentation's not much use; it's made by Dot Hill, now Seagate. Check out their lack of documentation:
https://www.seagate.com/gb/en/support/dothill-san/general/

Did you get it to work switchless and if so did you prove redundancy by pulling cables out?
Rick_Penney

ASKER
Hi Andy, yes, we are now switchless. Each server has two 4-port network interface cards.
We have a connection from one of the interface cards to controller A and a connection from the other interface card to controller B.
This is repeated on the second node.

Tests:
We stopped the service within Failover Cluster Manager on the current host server, and the other server kicked in straight away, so all looks good from that point of view. We then restarted the service.

We then unplugged a NIC from the current host server and the storage stayed put on that server.
We then unplugged the second NIC from the same server and everything showed as offline within the failover cluster software. I would have thought that pulling both cables from the same server was the same as turning the server off; however, if we turn the server off, the other one does take over.
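For reference, a rough sketch of what can be watched on the owning node while pulling cables like this (purely illustrative):

```powershell
# Rough sketch: state to watch during the pull-the-cable tests.
Get-ClusterNode | Format-Table Name, State                      # both nodes should stay Up
Get-ClusterResource | Format-Table Name, State, OwnerNode       # roles should stay Online
mpclaim.exe -s -d                                               # then "mpclaim -s -d <n>" to watch a LUN's path count
Get-IscsiSession | Select-Object InitiatorPortalAddress, IsConnected
```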

Looking at other training videos, they list a further network, separate from production and iSCSI, for the two nodes to talk to each other for cluster comms. Not sure if that's necessary.

Anyway, things are looking more promising now.
regards
andyalder

The additional network was for the heartbeat in older versions; MS got rid of that a while ago.
https://techcommunity.microsoft.com/t5/failover-clustering/no-such-thing-as-a-heartbeat-network/ba-p/388121
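A hedged sketch of how to confirm what the cluster has decided about each network, with no dedicated heartbeat network required:

```powershell
# Hedged sketch: check the role the cluster assigned to each network.
# Expected here: production network = ClusterAndClient, the iSCSI networks = None.
Get-ClusterNetwork | Format-Table Name, Address, Role
Get-ClusterNetworkInterface | Format-Table Node, Network, Adapter
```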

I think if you use a LUN on the same storage as a quorum witness, that may get around the two-NICs-unplugged problem. The main thing to make sure of is that the cables are plugged into separate NICs so one NIC doesn't become a SPOF.
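A hedged sketch of adding a small LUN as the disk witness ("Cluster Disk 3" and the size are placeholders):

```powershell
# Hedged sketch; "Cluster Disk 3" is a placeholder for a small dedicated witness LUN (~1 GB).
Get-ClusterAvailableDisk | Add-ClusterDisk      # only if the witness LUN isn't a cluster disk yet
Set-ClusterQuorum -DiskWitness "Cluster Disk 3"
Get-ClusterQuorum                               # confirm the witness is now configured
```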