
Configuration of a server SAN with failover clustering.

Hi, we are looking for some further guidance on setting up failover clustering on two Windows Server 2019 hosts connecting to an HP MSA 2052 SAN.

•      The servers are DL380 Gen10 with 8 NIC ports each.
•      The current wiring connections are shown below (we will redo this for redundancy on the controllers)
•      The MSA 2052 has two controllers; each controller has two connections on different subnets connecting to the two servers via one switch with two VLANs (Layer 2 switch).
•      The servers connect to a different production switch for normal data traffic.
•      The iSCSI initiators and MPIO have been configured, and both servers can see the storage volumes.
•      Both servers can ping all four IP addresses on the 172.16.*.* subnets. The NICs on the 172.16.*.* subnets don't have a default gateway configured, as per the guides we have been reading. Jumbo frames are set up on the iSCSI NICs and the switch (a quick PowerShell check of the MPIO and jumbo-frame settings is sketched after this list).
•      The servers are pingable via their fully qualified domain names from other devices on the 10.226.*.* production network. The servers are in the correct Server OU.
•      The servers currently don't have a network set up for heartbeat and live migration traffic.
•      The servers also have teamed NICs set up for a virtual switch within Hyper-V Manager.

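For anyone following along, here is a rough sketch of how the MPIO and jumbo-frame settings can be checked on each node, using the in-box MPIO and NetAdapter cmdlets; the adapter name filter and the MSA port address below are only examples, not our real values:

    # Confirm the Microsoft DSM is set to claim iSCSI-attached disks for MPIO.
    Get-MSDSMAutomaticClaimSettings                     # should report iSCSI = True
    # If it isn't, this enables the automatic claim for the iSCSI bus type.
    Enable-MSDSMAutomaticClaim -BusType iSCSI

    # Check jumbo frames on the iSCSI NICs (the registry keyword name varies by driver,
    # and "iSCSI*" assumes the NICs have been renamed accordingly).
    Get-NetAdapterAdvancedProperty -Name "iSCSI*" -RegistryKeyword "*JumboPacket"

    # Prove jumbo frames work end to end to an MSA port (example address) with a
    # don't-fragment ping and an 8972-byte payload (9000 minus 28 bytes of headers).
    ping 172.16.0.1 -f -l 8972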
The Failover Clustering Validate Configuration wizard test is failing on:
At least two independent paths to storage target are recommended for each Test Disk
Test Disk 0 from node Server1 (shows FQDN) has 1 usable path to storage target
Test Disk 0 from node Server2 (shows FQDN) has 1 usable path to storage target
At least two independent paths to storage target are recommended for each Test Disk
Test Disk 1 from node Server1 (shows FQDN) has 1 usable path to storage target
Test Disk 1 from node Server2 (shows FQDN) has 1 usable path to storage target

However, we have created a cluster and everything comes up green.
Questions:
The first host that we created the cluster on prompted for an IP address. We have currently assigned an IP address from our production subnet, but this may be wrong and perhaps it should be on a separate VLAN; please advise.
On the second host, entering the name of the cluster also looks good, and we can see everything as expected.
Do we need to create a further separate subnet for a heartbeat, and if so, how is it configured within the software?

Once this is all correct, do I assume that the VM guests are configured in Hyper-V Manager and that the location of the virtual disks points back to the IP address of the cluster?

What is configured in the Virtual SAN Manager within Hyper-V Manager?

If possible, an explanation would be better than a URL link, as we have already pretty much exhausted most guides found through internet searches. Sorry, I know it's a lot of questions.
[Attachment: Server SAN wiring diagram]
Regards
Rick

Paul MacDonald, Director, Information Systems
BRONZE EXPERT
Commented:

The design is fine but, unless the switches are being used for other traffic, the VLANs are probably overkill.  Not harmful, but unnecessary.  Also, your design has a single point of failure in that switch, so bear that in mind.  You may want to include a second switch in your design, or have a spare on hand in case that one dies.

"The first host that we created a cluster on prompted for an IP Address, we have currently assigned an IP Address from our Production subnet, but this may be wrong and should be on a separate VLAN, please advise."
Your cluster hosts will each get at least one IP address so you can reach them from your network.  The cluster itself will also get at least one IP address.
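For reference, the wizard step you describe is roughly equivalent to the following PowerShell; this is only a sketch, and the cluster name, node names and production-subnet address are placeholders:

    # Create the cluster with a client-facing address on the production subnet;
    # eligible storage can be added afterwards once validation is clean.
    New-Cluster -Name "HVCLUSTER1" -Node "Server1", "Server2" -StaticAddress 10.226.50.50 -NoStorage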


"Do we need to create a further separate subnet for a heartbeat and if so how is it configured within the software?"
There was a time when Microsoft Clusters required a separate heartbeat interface.  This is no longer the case.
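The cluster detects its networks automatically; the only thing you normally adjust is each network's role. A minimal sketch (the default network names shown here may differ on your cluster):

    # See which networks the cluster detected and how each one is allowed to be used.
    Get-ClusterNetwork | Format-Table Name, Address, Role
    # Role values: 0 = none (leave the iSCSI networks here), 1 = cluster only, 3 = cluster and client.

    # Example: make sure cluster traffic stays off an iSCSI network.
    (Get-ClusterNetwork "Cluster Network 2").Role = 0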

"After this all correct, do I assume that the VM Guests are configured in HyperV manager and that the location of the Virtual disks points back to the IP address of the Cluster?"
Yes.  You'll configure the VMs in the Hyper-V Manager.   You'll then add the VMs you want clustered to the Failover Cluster Manager.
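As a rough sketch of that workflow (assuming the shared volumes are converted to Cluster Shared Volumes; the disk and VM names are placeholders):

    # Turn a clustered disk into a Cluster Shared Volume so either node can run VMs from it.
    Add-ClusterSharedVolume -Name "Cluster Disk 1"
    # The volume then appears on every node under C:\ClusterStorage\Volume1, which is
    # where the VM configuration files and VHDX files should live.

    # Make an existing Hyper-V VM highly available in the cluster.
    Add-ClusterVirtualMachineRole -VMName "TestVM01"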

"What is configured in the Hyper V Manager HyperV SAN Manager?"
The Hyper-V Virtual SAN Manager lets you create a virtual Fibre Channel SAN for your virtual machines, and it is not applicable to your design.

Andy
BRONZE EXPERT
Distinguished Expert 2019

Commented:
•      The MSA 2052 has two controllers; each controller has two connections on different subnets connecting to the two servers via one switch with two VLANs (Layer 2 switch).

So if that switch fails it all crashes? If you only have two hosts then throw the switch away and connect point-to-point.
Paul MacDonald, Director, Information Systems
BRONZE EXPERT

Commented:

The switch gives the network connections on the SAN some redundancy, so it does help, but I also noted that the switch is a single point of failure.

Author

Commented:
Thanks, Paul, for your post; it's very much appreciated.

With ref to my statement of:
"The first host that we created a cluster on prompted for an IP Address, we have currently assigned an IP Address from our Production subnet, but this may be wrong and should be on a separate VLAN, please advise."


I didn't really explain. The two physical hosts have static IP addresses on the NICs that connect back to the production network.
What I meant was that when we clicked Create Cluster, it asked for a network address to use, to which we assigned an IP from the same production subnet.
There have been so many conflicting guides on this topic, with none giving all the info. Most guides mention setting up a separate migration network, so I wasn't sure if the cluster IP should indeed be on the production network or on a separate one.

If you don't mind, I'll keep this post open for a day in case we have any more queries about adding the VMs to the Failover Cluster Manager after adding them to Hyper-V Manager, but it's good to know we are almost there with it.
Kindest regards
Rick

Author

Commented:
Thanks, guys. I think when we were point-to-point we were in a world of pain after reading so many different ways of doing it. The two servers, although they could ping each other on the production LAN (10.226.50.*), couldn't ping each other via their 172.16 addresses, and the failover cluster validation wizard failure led us to believe that it couldn't be done this way.
Would it be possible to roughly draw a layout for the iSCSI connections, please?
regards
Andy
BRONZE EXPERT
Distinguished Expert 2019

Commented:
Four cables instead of the switch give full redundancy and save a little bit of latency and heat.
Paul MacDonald, Director, Information Systems
BRONZE EXPERT

Commented:

"i wasnt sure if the cluster ip should indeed be on the production network or separate"
The cluster IP is for talking to the cluster and would almost certainly be from the production network.  The cluster IP won't matter much to you in THIS scenario because you generally won't talk to the cluster.  

But imagine if you had two file servers (or web servers, or mail servers) in a cluster - you'd point people to the cluster IP because you simply want to talk to whoever owns the cluster - you don't care which specific server it is.
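If you ever need to check it later, the cluster IP is just a resource in the cluster core group; a quick sketch (the resource name below is the default and may differ):

    # Show the address the cluster's client access point is using.
    Get-ClusterResource -Name "Cluster IP Address" | Get-ClusterParameter Address, SubnetMask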

Author

Commented:
Ah, understood, thank you.
I'll give it all a go this afternoon/tomorrow morning.

Thanks Guys
Paul MacDonald, Director, Information Systems
BRONZE EXPERT

Commented:

To illustrate what I'm talking about, here's an image I knocked together.  I included (what I understand to be) Andy's suggestion, and why I disagree with it.  Whether you use no switches, one switch, or more switches, you'll get some level of redundancy and fault tolerance no matter what you do.  We went virtual several years ago (my implementation has three of everything) and it's one of the best decisions I've ever made.

Andy
BRONZE EXPERT
Distinguished Expert 2019
Commented:
Using the above schematic from Paul on the right (except they should be labeled controller 1 & 2 rather than SAN 1 & 2) you can see that if controller 1 fails then both hosts will get I/O via controller 2 instead. If one cable fails there is still another cable to one controller or another so it is fully redundant.

It's a fully supported configuration and is more redundant than any switch configuration.

N.B. The NICs may be a SPOF in both examples, depending on whether all ports are on a single card, but then again the motherboard is an inherent SPOF.

From the best practices guide:

If a controller fails over, the surviving controller reports that it is now the preferred path for all disk groups. When the failed controller is back online, the disk groups and preferred paths switch back to the original owning controller.
The best practice is to map volumes to two ports on each controller to take advantage of load balancing and redundancy.
Mapping a port creates a mapping to each controller. For example, mapping Port 1 maps host Ports A1 and B1 as well. Mapping Port 2 maps host Ports A2 and B2.
With this in mind, make sure that physical connections are set up correctly on the MSA so that a server has a connection to both controllers on the same port number. For example, on a direct-attached MSA 1050/2050/2052 configuration with multiple servers, make sure that Ports A1 and B1 are connected to Server 1, Ports A2 and B2 are connected to Server 2, and so on.
HPE does not recommend enabling more than eight paths to a single host—that is, two HBA ports on a physical server connected to two ports on the A controller and two ports on the B controller. Enabling more paths from a host to a volume puts additional stress on the operating system’s multipath software, which can lead to delayed path recovery in very large configurations.
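To sanity-check that each host really does reach one port on each controller, something like this can be run on each node (the controller port addresses here are purely examples):

    # One port on controller A and one on controller B, as mapped to this host.
    $msaPorts = "172.16.0.1", "172.16.1.1"    # example addresses only
    foreach ($port in $msaPorts) {
        # iSCSI listens on TCP 3260; both checks should succeed from each node.
        Test-NetConnection -ComputerName $port -Port 3260 | Format-Table ComputerName, TcpTestSucceeded
    }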

Author

Commented:
Nice one, thank you. Just so I'm 100% clear: you said I don't need to worry about VLANs, but can all these connections be on 172.16.0.0/24, or am I still using two subnets, 172.16.0.0/24 and 172.16.1.0/24? To me it looks like they can all be on the same one.
Andy
BRONZE EXPERT
Distinguished Expert 2019

Commented:
Are you talking about the front end that Paul knows about, or the storage connections that I know about?
For the storage, if you direct-attach, then each cable is a separate subnet (see the sketch below).

BTW, in the storage industry you don't connect the two switches together on the back end as no traffic goes over that link.
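To make the "each cable is a separate subnet" point above concrete, an addressing plan for one node might look like the sketch below; the interface aliases and addresses are purely illustrative:

    # Node 1, cable to a controller A port: its own /24, no default gateway.
    New-NetIPAddress -InterfaceAlias "iSCSI-A" -IPAddress 172.16.0.10 -PrefixLength 24
    # Node 1, cable to a controller B port: a different /24, so each path is its own subnet.
    New-NetIPAddress -InterfaceAlias "iSCSI-B" -IPAddress 172.16.1.10 -PrefixLength 24
    # Node 2 repeats this with two further subnets, e.g. 172.16.2.0/24 and 172.16.3.0/24.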
Paul MacDonald, Director, Information Systems
BRONZE EXPERT

Commented:

"...can all these connections be on 172.16.0.0/24 or am i still using two subnets 172.16.0.0/24 and 172.16.1.0/24, to me it looks they can all be on the same? "
I use a flat class C for my SAN connection.  There's no harm in using VLANs, but they're not necessary.  The switch(es) will segregate the traffic, and it's only the two SANs and two hosts on that network.


Andy:
"... you can see that if controller 1 fails then both hosts will get I/O via controller 2 instead. "
Disconnect the cable from host 1 to controller 1, or lose that NIC in host 1, or lose that NIC in controller 1, and now host 1 cannot get to controller 1.  The addition of one or more switches eliminates that problem.  It may be the HP SAN has an internal switch that connects the two controllers - my NetApp SAN is like that - and it may be that one controller can take over for the loss of the other in the HP SAN (again, my NetApp SAN is like that), but all you've done is take the external switch and move it inside the shelf.  The switch still exists, you just don't see it.  I expect that has a lot to do with how the HP SAN is configured.  My design works regardless.  You just have to imagine two SANs in one shelf.

" BTW, in the storage industry you don't connect the two switches together on the back end as no traffic goes over that link."
The crossover between switches ensures there's no single point of failure.  You can disconnect any cable, port, NIC, host or switch and the system should continue to run.  The SANs themselves are probably single points of failure, but good backups can be restored to the remaining SAN if a VM is mission critical.

Author

Commented:
Hi Guys, I've learnt a lot today, so thank you.

As we have three sites to do and money is tight, we will first retry the attached method (sorry, Paul). This is the way we originally tried, but it may possibly have been the MPIO that was screwing things up, which is sorted now.

I'll redo everything tomorrow, once again, thank you both for all your shared knowledge.
regards
Rick
Paul MacDonald, Director, Information Systems
BRONZE EXPERT

Commented:

You'll be better off no matter what you do, so good luck!


Andy
BRONZE EXPERT
Distinguished Expert 2019

Commented:
Paul "...host 1 cannot get to controller 1".
That is true, so the LUN gets presented to it via controller 2 instead.

Do you think anyone would buy the SAS host-attached variant if what you say was true? That's almost always directly attached since SAS switches are few and far between.

Author

Commented:
Thanks, both, much appreciated. It's not too bad to do once you have the info; it's a shame there's no definitive guide out there, even from HP, when you purchase the SAN.
regards
Rick
Andy
BRONZE EXPERT
Distinguished Expert 2019

Commented:
Yup, the documentation's not much use; it's made by Dot Hill, now Seagate. Check out their lack of documentation:
https://www.seagate.com/gb/en/support/dothill-san/general/

Did you get it to work switchless, and if so, did you prove redundancy by pulling cables out?

Author

Commented:
Hi Andy, yes, we are now switchless. Each server has two 4-port network interface cards.
We have a connection from one of the interface cards to controller A and a connection from the other interface card to controller B.
This is repeated on the second node.

Tests:
We have stopped the service within Failover Cluster Manager on the current host server, and the other server kicked in straight away, so all looks good from that point of view. We then restarted the service.

We then unplugged a NIC from the current host server, and the storage stayed put on the server.
We then unplugged the second NIC from the same server, and everything showed as offline within the failover cluster software. I would have thought that pulling both cables from the same server was the same as turning the server off; however, if we turn the server off, the other one does take over.

Looking at other training videos, they do list a further network, separate from production and iSCSI, for the two nodes to talk to each other for cluster comms. Not sure if that's necessary.

Anyways, things are looking more promising now
regards
Andy
BRONZE EXPERT
Distinguished Expert 2019

Commented:
The additional network was for the heartbeat in older versions; MS got rid of that a while ago.
https://techcommunity.microsoft.com/t5/failover-clustering/no-such-thing-as-a-heartbeat-network/ba-p/388121

I think if you use a LUN on the same storage as a quorum witness, that may get around the two-NICs-unplugged problem. The main thing to make sure of is that the cables are plugged into separate NICs so one NIC doesn't become a SPOF.
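A minimal sketch of setting that up (the disk name is whatever small LUN you set aside as the witness):

    # Use a small LUN on the MSA as the disk witness so a single surviving node keeps quorum.
    Set-ClusterQuorum -DiskWitness "Cluster Disk 3"

    # Confirm the resulting quorum configuration.
    Get-ClusterQuorum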