  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 2185

Reset of a VM or vMotion to another ESXi host causes the VM to disappear from NLB

I'm running ESXi 5 with web servers (IIS) running in the VMs.

One day, for testing purposes:
a) I selected one VM & 'reset' it (I assume this reset is equivalent to
    a forced power-off of a physical server).
b) Then, from vCenter, I 'migrated' ( = vMotion?) this VM to another
     ESXi host in the same cluster, but the VM failed to rejoin the
     NLB group (refer to attachment 1).
NLBscr.png
Asked by: sunhux
5 Solutions
 
sunhux (Author) Commented:
I'm on ESXi Ver 5 SP1.

Only two vSwitches are there, vSwitch0 & vSwitch1:

vSwitch0
========
Function:                      ESXi management, vMotion network
Number of physical NIC ports:  2 x 1Gbps
Bandwidth:                     2 x 1Gbps
Uplink requirements:           ESXi management & vMotion will be connected to two trunk
                               ports on switch (one trunk port per switch).
External network access:       ESXi management network will be connected to firewall for
                               administration and shared services (e.g. DNS, NTP, etc.) access.

vSwitch0 Configuration Settings
===============================
Parameter             Setting
Load balancing        Route based on the originating virtual port ID
Failover detection    Link Status Only
Notify switches       Enabled
Fail back             Yes
Failover order        vmnic0: Management (Active), vMotion (Standby)
                      vmnic2: vMotion (Active), Management (Standby)

vSwitch1
========
Function:                      Production VM data network
Number of physical NIC ports:  4 x 1Gbps
Bandwidth:                     4 x 1Gbps
Uplink requirements:           VMs are from multiple VLANs. To carry multiple VLAN traffic,
                               eight trunk ports on switch modules are required
                               (four trunk ports per switch).
External network access:       All Production VM networks will be connected to production network.

vSwitch1 Configuration Settings
===============================
Parameter             Setting
Load balancing        Route based on the originating virtual port ID
Failover detection    Link Status Only
Notify switches       Enabled
Fail back             Yes
Failover order        vmnic4, vmnic5, vmnic8, vmnic9 are Active
 
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, Commented:
Yes, I've seen this issue when static ARP entries have NOT been added to the physical switches.

Are you using multicast NLB?
 
sunhux (Author) Commented:
Yes, we set up the NLB as multicast.  I was told by the network engineer
that our switches are already set up for multicast, but he doesn't sound sure.
 
sunhux (Author) Commented:
Can you give sample commands/settings to be entered on the
Cisco switch(es) to enable 'multicast' and resolve this issue?

We're on Layer 3 Cisco WS-C3750X-48T-S stacked switches.
 
sunhux (Author) Commented:
If I vMotion this VM back to its original (previous) ESXi
host, will it temporarily become available again in the NLB?
 
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, Commented:
If you have networking set up correctly on your physical switches, NLB should converge.

arp [ip] [cluster multicast mac] ARPA
arp 192.168.1.100 03bf.c0a8.0164 ARPA

mac-address-table static [cluster multicast mac] vlan [vlan id] interface [interface ...]
mac-address-table static 03bf.c0a8.0164 vlan 1 interface GigabitEthernet1/1 GigabitEthernet1/2 GigabitEthernet1/15 GigabitEthernet1/16

See here:

http://kb.vmware.com/kb/1006525
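To make the two command templates above easier to fill in for a given cluster, here is a minimal Python sketch that only does string formatting. The function name nlb_switch_commands and its parameters are hypothetical, invented for illustration; the command syntax is copied from the example above and should be checked against the IOS release on each switch (some releases spell the second command "mac address-table static").

```python
def nlb_switch_commands(cluster_ip, cluster_mac, vlan_id, interfaces):
    """Build the static ARP and static MAC commands for one switch.

    cluster_ip  : the NLB cluster (virtual) IP address
    cluster_mac : the cluster multicast MAC in Cisco dotted notation
    vlan_id     : VLAN that carries the cluster traffic
    interfaces  : switch ports that lead to the NLB nodes or to uplinks
    """
    # Static ARP entry: maps the cluster IP to the multicast MAC.
    arp_cmd = f"arp {cluster_ip} {cluster_mac} ARPA"
    # Static MAC entry: pins the multicast MAC to the listed ports.
    mac_cmd = (
        f"mac-address-table static {cluster_mac} "
        f"vlan {vlan_id} interface {' '.join(interfaces)}"
    )
    return [arp_cmd, mac_cmd]


if __name__ == "__main__":
    ports = ["GigabitEthernet1/1", "GigabitEthernet1/2",
             "GigabitEthernet1/15", "GigabitEthernet1/16"]
    for line in nlb_switch_commands("192.168.1.100", "03bf.c0a8.0164", 1, ports):
        print(line)
```

The interface list differs per switch, so the function would be called once per switch with that switch's own ports.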
 
sunhux (Author) Commented:
Yes, currently the NLB converges.

Bear with me a bit more, as I'm still a newbie at this,
even after reading the KB.

> arp [ip] [cluster multicast mac] ARPA
What's the IP above: is it the IP address of a specific VM or
the management LAN IP address of the ESX host?
How do I obtain the multicast MAC address?

> arp 192.168.1.100 03bf.c0a8.0164 ARPA
What do the above IP address & MAC address belong to?
 
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, Commented:
IP Address of the Cluster - the Cluster IP

Multicast MAC Address is generated and shown to you when the cluster is created.
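For reference, the cluster MAC can also be derived rather than looked up. In NLB multicast mode (plain multicast, not unicast and not IGMP multicast), the cluster MAC is 03-BF followed by the four octets of the cluster IP in hex, which is why 192.168.1.100 pairs with 03bf.c0a8.0164 in the example earlier in this thread. A minimal Python sketch, where the function name nlb_multicast_mac is purely illustrative and the result should still be checked against what NLB Manager shows:

```python
import ipaddress


def nlb_multicast_mac(cluster_ip: str) -> str:
    """Return the NLB multicast-mode cluster MAC in Cisco dotted notation.

    Assumption: plain multicast mode, where the MAC is 03-BF followed by
    the four octets of the cluster IPv4 address.
    """
    octets = ipaddress.IPv4Address(cluster_ip).packed  # 4 raw bytes
    raw = bytes([0x03, 0xBF]) + octets                 # 6-byte MAC
    hex_str = raw.hex()                                # 12 hex digits
    # Group into three blocks of four digits, as Cisco IOS displays MACs.
    return ".".join(hex_str[i:i + 4] for i in range(0, 12, 4))


if __name__ == "__main__":
    print(nlb_multicast_mac("192.168.1.100"))  # 03bf.c0a8.0164
```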
 
sunhux (Author) Commented:
By "Cluster IP", you mean the NLB virtual IP address, right?

If, at the point the NLB cluster was created, I (or rather a
vendor who has since left) did not note down the MAC
address, how can I obtain it now?
From nlbmgr?  If it's from nlbmgr, please give me the steps,
as I'm neither a Microsoft nor an NLB person.
 
sunhux (Author) Commented:
Refer to the attached NLB Manager screen: is this where I get the
MAC & IP address of the NLB cluster?

Also refer to the next attachment after that: whenever I launch
nlbmgr, it pops up this 'unicast' message.  Can I safely ignore this?
NLBClusterMACIPaddr.png
NLBUnicastMsg.png
 
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, Commented:
That is correct, that is your Cluster IP and Cluster MAC address.

You need to ensure that EVERY Cisco switch that could potentially carry Cluster traffic is configured correctly with static ARPs, which also includes any trunked uplinks.
 
sunhux (Author) Commented:
>EVERY Cisco switch that could potentially carry Cluster traffic is
>configured correctly with static ARPs, which also includes any trunked uplinks

All 5 of my ESXi hosts are physically (i.e. via LAN cables) connected to two stacked
WS-C3750X-48T-S switches.

To facilitate illustration, let's call them Switches A & B respectively (in
a way, they're access switches).

There's a pair of Cisco 6500 server farm switches (not connected via VSS,
Virtual Switching System); let's call them Switches C & D.

Switch A has 4 trunked fiber cables to Switch C.
Switch B has 4 trunked fiber cables to Switch D.

Then we have core switches E & F (not sure what model, as it's beyond
my visibility).

I believe Switch C is trunked to Switch E & Switch D is trunked to Switch F.

So these static ARPs need to be set on A, B, C, D, E and F?


One side question that was missed:
refer to the last attachment,
"whenever I launch nlbmgr, it pops up this 'unicast' message.
 Can I safely ignore this? "
 
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, Commented:
Yes, A, B, C, D, E and F; otherwise you will find that clients attempting to connect to the cluster will fail...

How have you got networking set up?
 
sunhux (Author) Commented:
> How have you got networking set up?
Not quite sure what the above means.  The network belongs to my
customer & I have no access to their switches.  I'm not VMware
trained but have a couple of VMware engineers who helped this
customer set up the ESXi hosts & VMs.  However, our VMware engineers
are not network-trained.  So I'm here asking around.
 
sunhux (Author) Commented:
Wow, setting those static ARP entries on all 6 switches is
quite a major change.  My customer's network engineers
are quite concerned.
 
sunhux (Author) Commented:
Btw, this same issue that you've seen: did it surface when you
vMotioned (i.e. the 'migrate' option in vCenter) a VM to another
ESXi host, or under what circumstances did it surface?
 
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, Commented:
Yes, software NLB with multicast can be troublesome to set up correctly (and static ARPs are very often forgotten!).

The issue occurs when the NLB multicast static ARPs ARE NOT configured.
 
sunhux (Author) Commented:
Q5:
Our customer's network engineers ask me:
Are there any risks (security or in terms of
network load) that setting multicast static
ARP entries will pose?

Will it generate more traffic on the network,
or will there be MAC/ARP address spoofing
vulnerabilities?

Q6:
If we just define the static ARP entries on
the access switches which the ESXi hosts
are directly connected to, wouldn't this be
sufficient, or do we really need to define them on
every single switch (access, distribution,
server farm, core), which means that for every
single NLB, 8 switches are involved?

Q7:
One side question posed earlier: whenever
I launch 'nlbmgr', a message pops up.
Can I safely ignore this message:
"Running NLB Manager on a system with all networks
 bound to NLB might not work as expected.
 If all interfaces are set to run NLB in 'unicast' mode,
 NLB Manager will fail to connect to hosts.
 See 'Help & Support Center' for unicast
 communication limitations.
                                                             OK"
 
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, Commented:
Q5. Multicasts can increase network transmissions across all ports and switches.

Q6. In our experience, if you want NLB clustering to work across your network, all switches have to be modified.

Q7. I need more information on how you have set up your VMs.
 
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
if you client is un happy with Windows NLB.

I can suggest the following Virtual Appliance, for VMware ESX/ESXi, it's also free, easy to setup, and very good.

It's called Zen Load Balancer here:-

http://www.zenloadbalancer.com/web/

We use this as an alternative to Microsoft NLB, because fo the cost of licenses, and also when clients do not have switches which support Static ARP, e.g. like Cisco switches.
0
 
sunhux (Author) Commented:
So Zen LB is a software LB, playing the role of
an F5 LB (or LTM, as we call it)?  Sounds good, no need
to define static ARP entries on switches.

What about Zen LB support?  Is it from the user
community?
 
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, Commented:
It's a very good load balancer that's hosted in a virtual appliance, not really any different from the hardware load balancers, which are based on embedded BSD or Linux.

As for support, it's not really any different from most OS support, e.g. Microsoft: do you have a Microsoft Support contract?

You can obtain Professional Support Plans for it, if that concerns you.

Take it for a test drive!

You can also build clustered load balancer appliances, which work really well too!
 
sunhux (Author) Commented:
My customer's network & change management teams got back to me:

a) The access switches have uplinks to their server farm switches,
    & their server farm switches then connect to a pair of HA
    firewalls (called 'server farm firewalls') which do the layer 3
    routing.  The firewalls then connect to the core switches, so
    there's a pair of firewalls between the core & server farm
    switches.

b) The customer's network support proposed that,
    instead of configuring static ARP entries, they would prefer to
    configure static MAC addresses on the server farm switches
    and the access switches. What's the difference between
    the two (i.e. between setting MAC addresses vs ARP entries)?
    Can you give some examples/commands/URLs?  The customer
    does not wish to elaborate further but asked me to find
    out.
 
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, Commented:
For NLB to work, the command configurations were listed above, along with the VMware KB article.
