Kissel-B

asked on

Internal network becomes partitioned when trying to run VMFleet

I am trying to run VMFleet on a new Storage Spaces Direct hyper-converged cluster. When I run the create-fleet PowerShell script, an internal Hyper-V switch is added to both nodes. The switches stay up for a few seconds and then the cluster network goes partitioned; if I remove one, the other comes up. The same thing happens if I manually create an internal switch, and the fleet VMs never get created. Any ideas?
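For anyone reproducing this, a minimal sketch of inspecting the partitioned state from PowerShell; the cmdlets are from the standard FailoverClusters module, and the filter on "New Virtual Switch" is only an assumption based on the default switch name that appears later in the thread:

    # Show the state of every cluster network (Up / Partitioned / Down)
    Get-ClusterNetwork | Format-Table Name, State, Role, Address

    # Show which node-level interfaces sit on each cluster network
    Get-ClusterNetworkInterface |
        Where-Object Name -like "*New Virtual Switch*" |
        Format-Table Node, Name, Network, State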
Philip Elder

How is the network fabric set up?
Kissel-B

ASKER

Hyper-converged over a 10 Gb Arista switch.
Each node has a dual-port Mellanox X-5.
I followed the standard S2D deployment guide: two vNICs for storage and one vNIC for management. Everything works fine until I try to run VMFleet. Below are the network errors from cluster validation with the two internal vNICs for VMFleet in place; without them there is no issue and the cluster passes validation.

 * Network interfaces MIS2DSVR01.MI.local - vEthernet (New Virtual Switch) and S2DSVR02 - vEthernet (New Virtual Switch) are on the same cluster network, yet address 169.16.0.1 is not reachable from 169.16.0.2 using UDP on port 3343.
 * Network interfaces MIS2DSVR02.MI.local - vEthernet (New Virtual Switch) and S2DSVR02 - vEthernet (New Virtual Switch) are on the same cluster network, yet address 169.16.0.2 is not reachable from 169.16.0.1 using UDP on port 3343.
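For reference, a hedged sketch of re-running only the network portion of cluster validation; the node names come from the errors above, and the report path is just an example:

    # Re-run only the network validation tests against both nodes
    Test-Cluster -Node MIS2DSVR01, MIS2DSVR02 -Include "Network" -ReportName C:\Temp\NetworkValidation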
What was the result of Test-RDMA?
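Test-RDMA here refers to the Test-Rdma.ps1 diagnostics script Microsoft publishes in its SDN GitHub repository; a hedged example of a typical invocation, where the interface index, remote address, and diskspd path are placeholders only:

    # Placeholder values: use the ifIndex of a storage vNIC and the partner node's storage IP
    .\Test-Rdma.ps1 -IfIndex 12 -IsRoCE $true -RemoteIpAddress 10.10.1.12 -PathToDiskspd C:\Tools\Diskspd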

Those are APIPA addresses. Static IP addresses should be assigned to the ports, with two subnets if there are two switches (best practice). We use dual-port pNIC pairs in each node, so port 0 = subnet 0 and port 1 = subnet 1 on each. Each port on the pNIC is connected to Switch 0 and Switch 1 respectively.
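A minimal sketch of that layout on one node; the vNIC aliases and subnets below are assumptions for illustration, not the names from this cluster:

    # Port 0 -> subnet 0 (cabled to Switch 0)
    New-NetIPAddress -InterfaceAlias "vEthernet (SMB1)" -IPAddress 10.10.1.11 -PrefixLength 24

    # Port 1 -> subnet 1 (cabled to Switch 1)
    New-NetIPAddress -InterfaceAlias "vEthernet (SMB2)" -IPAddress 10.10.2.11 -PrefixLength 24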

Make sure to limit Live Migration using QoS so that it does not choke storage traffic.
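One common way to do that on an S2D cluster that live-migrates over SMB is an SMB bandwidth limit; the 750 MB/s cap below is only an example figure:

    # The SMB bandwidth limit feature must be installed first
    Install-WindowsFeature -Name FS-SMBBW

    # Cap Live Migration traffic so it cannot starve storage traffic
    Set-SmbBandwidthLimit -Category LiveMigration -BytesPerSecond 750MB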

How many nodes?
I set up all of the networking, RDMA, QoS, etc., and the base networking of the cluster works fine. I want to stress test it via VMFleet. When you run the PowerShell script to create the VMs for the test, it first adds two internal virtual network adapters to the cluster so the hosts can communicate with the 20 or so VMs created for the test. It is these internal vNICs that get partitioned, and no matter what I try I can't get them to come up. If I remove one of the internal vNICs or reboot a node, the other node's internal vNIC can suddenly communicate, but as soon as I add it back or the other node comes back up, the internal network fails again and shows as partitioned.
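For reference, a hedged sketch of the manual internal-switch test described above, run on each node; the switch name and 169.16.0.x addresses mirror the validation errors earlier in the thread but are otherwise arbitrary:

    # Create the internal switch; Hyper-V also creates a host vNIC named "vEthernet (New Virtual Switch)"
    New-VMSwitch -Name "New Virtual Switch" -SwitchType Internal

    # Give the host vNIC an address on the fleet-internal range (use .2 on the second node)
    New-NetIPAddress -InterfaceAlias "vEthernet (New Virtual Switch)" -IPAddress 169.16.0.1 -PrefixLength 16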
Make sure the firewall exception is in place for all three firewall profiles. Make sure firewall logging is enabled. Check for dropped packets on UDP 3343.
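A minimal sketch of those checks; the display group and log path are the Windows defaults, and grepping the log is just one way to spot drops on UDP 3343:

    # Confirm the Failover Clusters rules (which cover UDP 3343) are enabled on all profiles
    Get-NetFirewallRule -DisplayGroup "Failover Clusters" |
        Format-Table DisplayName, Enabled, Profile, Direction

    # Turn on logging of dropped packets for every profile
    Set-NetFirewallProfile -Name Domain, Private, Public -LogBlocked True -LogFileName "%SystemRoot%\System32\LogFiles\Firewall\pfirewall.log"

    # Look for drops that involve the cluster heartbeat port
    Select-String -Path "$env:SystemRoot\System32\LogFiles\Firewall\pfirewall.log" -Pattern "3343"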