sunhux

reset of a VM or vMotion to another ESX host causes the VM to disappear from NLB

I'm running ESXi 5 with IIS web servers running in the VMs.

One day, for testing purposes,
a) I selected one VM & 'reset' it (I guess this reset is equivalent to
    a forceful power-down of a physical server)
b) Then, from vCenter, I 'migrated' ( = vMotion?) this VM to
     another ESXi host (in the same cluster), but this
     VM failed to rejoin the NLB group (refer to attachment 1)
NLBscr.png
sunhux

ASKER

I'm on ESXi Ver 5 SP1.

There are only two vSwitches, vSwitch0 & vSwitch1:


vSwitch0
==========
Function:                      ESXi management, vMotion network
Number of Physical NIC Ports:  2 x 1Gbps
Bandwidth:                     2 x 1Gbps
Uplink Requirements:           ESXi management & vMotion will be connected to two trunk ports on switch (one trunk port per switch).
External Network Access:       ESXi management network will be connected to firewall for administration and shared services (e.g. DNS, NTP, etc) access.

vSwitch0 Configuration Settings
=========================
Parameter            Setting
Load balancing       Route based on the originating virtual port ID
Failover detection   Link Status Only
Notify switches      Enabled
Fail back            Yes
Failover order       vmnic0: Management (Active), vMotion (Standby)
                     vmnic2: vMotion (Active), Management (Standby)

vSwitch1
==========
Function:                      Production VM data network
Number of Physical NIC Ports:  4 x 1Gbps
Bandwidth:                     4 x 1Gbps
Uplink Requirements:           VMs are from multiple VLANs. To carry multiple VLAN traffic, eight trunk ports on switch modules are required (four trunk ports per switch).
External Network Access:       All Production VM networks will be connected to production network.

vSwitch1 Configuration Settings
=========================
Parameter            Setting
Load balancing       Route based on the originating virtual port ID
Failover detection   Link Status Only
Notify switches      Enabled
Fail back            Yes
Failover order       vmnic4, vmnic5, vmnic8, vmnic9 are Active
SOLUTION
Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)

This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
sunhux

ASKER

Yes, we set up the NLB as multicast.  I was told by the network engr
that our switches are already multicast, but he doesn't sound sure.
sunhux

ASKER

Can you give example commands/settings to be entered
on the Cisco switch(es) to enable 'multicast' and resolve this issue?

We're on Layer 3 Cisco WS-C3750X-48T-S stacked switches.
sunhux

ASKER

If I vMotion this VM back to its original (the previous) ESXi
host, will it temporarily become available again in the NLB?
SOLUTION
sunhux

ASKER

Yes, currently the NLB converges.

Bear with me a bit more, as I'm still a newbie at this,
even after reading the KB.

> arp [ip] [cluster multicast mac] ARPA
What's the ip above: is it the IP address of a specific VM or
the Management LAN IP addr of the ESX host?
How do I obtain the multicast MAC address?

> arp 192.168.1.100 03bf.c0a8.0164 ARPA
What do the above IP addr & MAC addr belong to?
IP Address of the Cluster - the Cluster IP

Multicast MAC Address is generated and shown to you when the cluster is created.
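
If the MAC was not noted down at creation time, it can usually be worked out from the cluster IP. In plain multicast mode (not unicast, not IGMP multicast), NLB builds the cluster MAC as 03-BF followed by the four octets of the cluster IP in hex, which is consistent with the 192.168.1.100 / 03bf.c0a8.0164 pair quoted above. The following is a minimal Python sketch under that assumption; the function names are made up for illustration, and the result should still be checked against the value shown in the cluster properties in nlbmgr.

```python
import ipaddress


def nlb_multicast_mac(cluster_ip: str) -> str:
    """Return the NLB multicast-mode cluster MAC in Cisco dotted notation.

    Assumes plain multicast mode, where the MAC is 03-BF plus the four
    octets of the cluster IP. Unicast and IGMP multicast modes differ.
    """
    ip_octets = ipaddress.IPv4Address(cluster_ip).packed  # 4 raw bytes
    mac_hex = (bytes([0x03, 0xBF]) + ip_octets).hex()     # 12 hex digits
    # Cisco IOS writes MACs as three dot-separated groups of 4 hex digits.
    return ".".join(mac_hex[i:i + 4] for i in range(0, 12, 4))


def static_arp_line(cluster_ip: str) -> str:
    """Build the static ARP line in the form quoted from the KB."""
    return f"arp {cluster_ip} {nlb_multicast_mac(cluster_ip)} ARPA"


if __name__ == "__main__":
    print(static_arp_line("192.168.1.100"))
    # prints: arp 192.168.1.100 03bf.c0a8.0164 ARPA
```

Feeding in the real cluster IP gives a value to compare with what nlbmgr reports before anything is entered on a switch.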
sunhux

ASKER

By "Cluster IP", this is the NLB virtual IP address, right?

If at the point the NLB cluster was created, I (or rather a
vendor who has since left) did not note down the MAC
addr, how can I go in there to obtain this MAC addr?
From nlbmgr?  If it's from nlbmgr, please give me the steps
to go about it, as I'm neither a Microsoft nor an NLB person.
sunhux

ASKER

Refer to attached NLB manager screen: is this where I get the
MAC & IP address of the NLB clusters?

Also refer to the next attachment after that: whenever I launch
nlbmgr, it pops up this 'unicast' message.  Can I safely ignore this?
NLBClusterMACIPaddr.png
NLBUnicastMsg.png
ASKER CERTIFIED SOLUTION
sunhux

ASKER

>EVERY Cisco switch that carries potentially any requested Cluster traffic is
>configured correctly with Static ARPs, which also includes any trunked uplinks

All my 5 ESXi hosts are physically (ie via LAN cables) connected to two stacked
WS-C3750X-48T-S switches.

To facilitate illustration, let's call them Switches A & B respectively (in
a way, they're access switches)

There's a pair of Cisco 6500 server farm switches (not connected via VSS,
Virtual Switching System), let's call them switches C & D.

Switch A has 4 trunked fiber cables to Switch C.
Switch B has 4 trunked fiber cables to Switch D.

Then we have core switches E & F (not sure what model, as it's beyond
my visibility).

I believe Switch C is trunked to Switch E & Switch D is trunked to Switch F.

So these static ARPs need to be set in A, B, C, D, E and F?


One side question that's missed:
refer to the last attachment,
"whenever I launch nlbmgr, it pops up this 'unicast' message.
 Can I safely ignore this? "
Yes: A, B, C, D, E and F. Otherwise you will find that clients attempting to connect to the cluster will fail.

how have you got networking setup?
sunhux

ASKER

> how have you got networking setup?
Not quite sure what the above means.  The network belongs to my
customer & I have no access to their switches.  I'm not VMware
trained but have a couple of VMware engineers who help this
customer set up the ESXi & VMs.  However, our VMware engineers
are not network-trained.  So I'm here asking around.
sunhux

ASKER

Wow, setting those static ARP entries on all 6 switches is
quite a major change.  My customer's network engineers
are quite concerned.
sunhux

ASKER

Btw, this same issue that you've seen: did it surface when you
vMotioned (ie the 'migrate' option from vCenter) a VM to another
ESXi host, or under what circumstances did it surface?
Yes, software NLB with multicast can be troublesome to set up correctly (and static ARPs are often forgotten!).

The issue occurs when the NLB multicast static ARPs ARE NOT configured.
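
Some background on why the static entries matter: the cluster IP is an ordinary unicast address, but in multicast mode it resolves to a multicast MAC, and Cisco L3 devices generally will not learn that pairing from an ARP reply on their own. The multicast property comes from the lowest bit of the first MAC octet. A small Python sketch to check it follows; the helper name is hypothetical, and the 02bf... value is only an example of the prefix NLB uses in unicast mode.

```python
def is_multicast_mac(mac: str) -> bool:
    """Return True if the MAC has the group (multicast) bit set.

    Accepts Cisco dotted (03bf.c0a8.0164), colon or hyphen notation.
    The group bit is the least significant bit of the first octet.
    """
    hex_digits = "".join(ch for ch in mac if ch not in ".:-")
    if len(hex_digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    first_octet = int(hex_digits[:2], 16)
    return bool(first_octet & 0x01)


if __name__ == "__main__":
    # NLB multicast-mode prefix 03-BF: group bit set.
    print(is_multicast_mac("03bf.c0a8.0164"))   # True
    # NLB unicast-mode prefix 02-BF: group bit clear.
    print(is_multicast_mac("02-BF-C0-A8-01-64"))  # False
```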
sunhux

ASKER

Q5:
Our customer's network engineers ask me:
Are there any risks (security or in terms of
network load) that setting multicast static
ARP entries will pose?

Will it generate more traffic on the network
or will there be MAC/ARP address spoofing
vulnerabilities?

Q6:
If we just define the static ARP entries on
the access switches which the ESXi hosts
are directly connected to, wouldn't this be
sufficient, or do we really need to define them
on every single switch (access, distribution,
server farm, core), which means that for every
single NLB, 8 switches are involved?

Q7:
One side question posed earlier; whenever
I launch 'nlbmgr', a message would pop up.
Can I safely ignore this message:
"Running NLB Manager on a system with all networks
 bound to NLB might not work as expected.
 If all interfaces are set to run NLB in 'unicast' mode,
 NLB Manager will fail to connect to hosts.
 See 'Help & Support Center' for unicast
 communication limitations."
 [OK]
SOLUTION
SOLUTION
sunhux

ASKER

So Zen LB is a software LB, playing the role of
F5 LB (or we call it LTM)?  Sounds good, no need
to define static ARP entries on switches..

What about Zen LB support?  It's from the user
community?
It's a very good load balancer that's hosted in a virtual appliance, not really any different from the hardware load balancers based on embedded BSD or Linux.

As for support, it's not really any different from most OS support, e.g. Microsoft: do you have a Microsoft support contract?

You can obtain Professional Support Plans for it, if that concerns you.

Take it for a test drive!

You can also build clustered NLBs/appliances, which work really well too!
sunhux

ASKER

My customer's network & change management teams got back to me:

a) the access switches have uplinks to their server farm switches
    & their server farm switches then connect to a pair of HA
    firewalls (called 'server farm firewalls') which do the layer 3
    routing.  Then the firewalls connect to the core switches; so
    there's a pair of firewalls between the core & server farm
    switches

b) the customer's network support proposed that,
    instead of configuring static ARP entries, they would prefer to
    configure static MAC addresses on the server farm switches
    and the access switches. What's the difference between
    the two (ie between setting MAC addrs vs ARP entries)?
    Can you give some examples/commands/urls?  The customer
    does not wish to elaborate further but asked me to find
    out
For NLB to work, the command configurations were listed above, along with the VMware KB.
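
On the difference between the two: a static ARP entry belongs on the device doing layer 3 routing for the cluster's VLAN and maps the cluster IP to the cluster MAC, while a static MAC address-table entry belongs on a layer 2 switch and tells it which ports the cluster MAC should be forwarded to, rather than flooding it. The Python sketch below just prints one line of each kind so the two can be compared. The VLAN ID and interface names are placeholders, not taken from this environment, and the exact keyword (`mac address-table` versus the older `mac-address-table`) depends on the IOS version, so treat the output as a starting point for the network engineers rather than a tested configuration.

```python
from typing import Sequence


def static_arp_line(cluster_ip: str, cluster_mac: str) -> str:
    """Layer 3: map the cluster IP to the cluster MAC on the routing device."""
    return f"arp {cluster_ip} {cluster_mac} ARPA"


def static_mac_line(cluster_mac: str, vlan_id: int,
                    interfaces: Sequence[str]) -> str:
    """Layer 2: pin the cluster MAC to the ports that lead to the NLB hosts."""
    ports = " ".join(interfaces)
    return (f"mac address-table static {cluster_mac} "
            f"vlan {vlan_id} interface {ports}")


if __name__ == "__main__":
    cluster_ip = "192.168.1.100"     # example cluster IP from the thread
    cluster_mac = "03bf.c0a8.0164"   # matching multicast MAC from the thread
    vlan_id = 10                     # hypothetical VLAN ID
    uplinks = ["GigabitEthernet1/0/1", "GigabitEthernet2/0/1"]  # hypothetical ports

    print(static_arp_line(cluster_ip, cluster_mac))
    print(static_mac_line(cluster_mac, vlan_id, uplinks))
```

In the topology described above, the layer 3 routing is done by the server farm firewalls, so the IP-to-MAC mapping would most likely have to be handled there, with the static MAC entries on the access and server farm switches covering the layer 2 side.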