sunhux
asked on
reset of a VM or vMotion to another ESX host causes the VM to disappear from NLB
I'm running ESXi 5 with VM web servers (IIS) running in the VMs.
One day, for testing purposes:
a) I selected one VM and 'reset' it (I guess this reset is equivalent to
a forced power-off of a physical server).
b) Then, from vCenter, I 'migrated' (= vMotion?) this VM to
another ESXi host in the same cluster, but the
VM failed to rejoin the NLB group (refer to attachment 1).
NLBscr.png
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Yes, we set up the NLB as multicast. I was told by the network engineer that
our switches are already multicast-enabled, but he didn't sound sure.
ASKER
Can you give sample commands/settings to be entered
on the Cisco switch(es) to enable 'multicast' and resolve this issue?
We're on Layer 3 Cisco WS-C3750X-48T-S stacked switches.
ASKER
If I vMotion this VM back to its original (previous) ESXi
host, will it become available again in the NLB, at least temporarily?
ASKER
Yes, currently the NLB converges.
Bear with me a bit more, as I'm still a newbie at this,
even after reading the KB.
> arp [ip] [cluster multicast mac] ARPA
What's the IP above: is it the IP address of a specific VM or
the management LAN IP address of the ESX host?
How do I obtain the multicast MAC address?
> arp 192.168.1.100 03bf.c0a8.0164 ARPA
Which device do the above IP and MAC addresses belong to?
The IP is the IP address of the cluster (the Cluster IP).
The multicast MAC address is generated and shown to you when the cluster is created.
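If the MAC was not noted down at creation time, it can usually be derived from the cluster IP. In plain multicast mode (without IGMP), NLB builds the cluster MAC as 03-BF followed by the four octets of the cluster IP in hex, which is why 192.168.1.100 pairs with 03bf.c0a8.0164 in the example quoted in this thread. A minimal Python sketch under that assumption follows; the function names are my own, and the result should still be checked against the cluster properties shown in nlbmgr.

```python
import ipaddress


def nlb_multicast_mac(cluster_ip: str) -> str:
    """Return the NLB multicast-mode cluster MAC in Cisco dotted notation.

    Assumes plain multicast mode (not unicast, not IGMP multicast), where
    the MAC is 03-BF followed by the four octets of the cluster IPv4 address.
    """
    octets = ipaddress.IPv4Address(cluster_ip).packed  # 4 raw bytes
    raw = bytes([0x03, 0xBF]) + octets                 # 6-byte MAC
    hex_str = raw.hex()                                # 12 hex digits
    # Cisco IOS writes MACs as three dot-separated groups of four hex digits
    return ".".join(hex_str[i:i + 4] for i in range(0, 12, 4))


def static_arp_line(cluster_ip: str) -> str:
    """Build the 'arp <ip> <mac> ARPA' line in the form quoted in the thread."""
    return f"arp {cluster_ip} {nlb_multicast_mac(cluster_ip)} ARPA"


print(static_arp_line("192.168.1.100"))
```

Feeding in the real NLB virtual IP instead of 192.168.1.100 gives the pair of values that the static ARP entry needs.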
ASKER
By "Cluster IP", this is the NLB virtual IP address, right?
If, at the point the NLB cluster was created, I (or rather a
vendor who has since left) did not note down the MAC
address, how can I obtain it now?
From nlbmgr? If so, please give me the steps,
as I'm neither a Microsoft nor an NLB person.
ASKER
Refer to attached NLB manager screen: is this where I get the
MAC & IP address of the NLB clusters?
Also refer to the next attachment after that: whenever I launch
nlbmgr, it pops up this 'unicast' message. Can I safely ignore this?
NLBClusterMACIPaddr.png
NLBUnicastMsg.png
ASKER
>EVERY Cisco switch that carries potentially any requested Cluster traffic is
>configured correctly with Static ARPs, which also includes any trunked uplinks
All my 5 ESXi hosts are physically (ie via LAN cables) connected to two stacked
WS-C3750X-48T-S switches.
To facilitate illustration, let's call them Switches A & B respectively (in
a way, they're access switches)
There's a pair of Cisco 6500 server farm switches (not connected via VSS,
Virtual Switching System), let's call them switches C & D.
Switch A has 4 trunked fiber cables to Switch C.
Switch B has 4 trunked fiber cables to Switch D.
Then we have core switches E & F (not sure what model, as it's beyond
my visibility).
I believe Switch C trunked to Switch E & Switch D trunked to Switch F.
So these static ARPs need to be set in A, B, C, D, E and F?
One side question that was missed (refer to the last attachment):
"Whenever I launch nlbmgr, it pops up this 'unicast' message.
Can I safely ignore this?"
Yes, A, B, C, D, E and F; otherwise you will find that clients attempting to connect to the cluster will fail.
How have you got the networking set up?
ASKER
> how have you got networking setup?
Not quite sure what the above means. The network belongs to my
customer & I have no access to their switches. I'm not VMware
trained but have a couple of VMware engineers who help this
customer set up the ESXi & VMs. However, our VMware engineers
are not network-trained. So I'm here asking around
ASKER
Wow, to set those static ARP entries in all the 6 switches is
quite a major change. My customer's network engineers
have quite a concern.
ASKER
Btw, the same issue that you've seen: did it surface when you
vMotioned (i.e. the 'migrate' option in vCenter) a VM to another
ESXi host, or under what circumstances did it surface?
Yes, software NLB with multicast can be troublesome to set up correctly (and static ARPs are very often forgotten!).
The issue occurs when the NLB multicast static ARPs are NOT configured.
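The reason is visible in the MAC itself. The first octet of an NLB multicast-mode MAC is 03, and its least-significant bit (the Ethernet group bit) is set, so the address is a multicast address even though the cluster IP is an ordinary unicast IP. As I understand it, Cisco Layer 3 devices will not learn such a mapping from an ARP reply, which is why it has to be entered statically. A small Python check of that bit, as a sketch (the helper name is hypothetical):

```python
def is_group_mac(mac: str) -> bool:
    """True if the MAC has the Ethernet group (multicast) bit set.

    Accepts Cisco dotted (03bf.c0a8.0164), dashed or colon-separated forms.
    """
    digits = "".join(ch for ch in mac if ch not in ".:-")
    if len(digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    first_octet = int(digits[:2], 16)
    # Bit 0 of the first octet is the individual/group bit
    return bool(first_octet & 0x01)


# NLB multicast-mode cluster MAC from the example in this thread
print(is_group_mac("03bf.c0a8.0164"))
# Same address with the 02-BF prefix that NLB uses in unicast mode
print(is_group_mac("02bf.c0a8.0164"))
```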
ASKER
Q5:
Our customer's network engineers ask me:
Are there any risks (security or in terms of
network load) that setting multicast static
ARP entries will pose?
Will it generate more traffic on the network
or will there be MAC/ARP address spoofing
vulnerabilities?
Q6:
If we just define the static ARP entries on
the access switches which the ESXi hosts
are directly connected to, wouldn't this be
sufficient, or do we really need to define them on
every single switch (access, distribution,
server farm, core), which means that for every
single NLB, 8 switches are involved?
Q7:
One side question posed earlier; whenever
I launch 'nlbmgr', a message would pop up.
Can I safely ignore this message:
"Running NLB Manager on a system with all networks
bound to NLB might not work as expected.
If all interfaces are set to run NLB in 'unicast' mode,
NLB Manager will fail to connect to hosts.
See 'Help & Support Center' for unicast
communication limitations.
OK"
ASKER
So Zen LB is a software LB, playing the role of
F5 LB (or we call it LTM)? Sounds good, no need
to define static ARP entries on switches..
What about Zen LB support? It's from the user
community?
It's a very good load balancer that's hosted in a virtual appliance, not really any different from the hardware load balancers, which are based on embedded BSD or Linux.
As for support, it's not really any different from most OS support, e.g. Microsoft: do you have a Microsoft support contract?
You can obtain Professional Support Plans for it, if that concerns you.
Take it for a test drive!
You can also build clustered NLBs/appliances, which also work really well!
ASKER
My customer's network & change management teams got back to me:
a) The access switches have uplinks to their server farm switches,
and the server farm switches then connect to a pair of HA
firewalls (called 'server farm firewalls') which do the Layer 3
routing. The firewalls then connect to the core switches, so
there's a pair of firewalls between the core and server farm
switches.
b) The customer's network support proposed that,
instead of configuring static ARP entries, they configure
static MAC addresses on the server farm switches
and the access switches. What's the difference between
the two (i.e. between setting MAC addresses vs ARP entries)?
Can you give some examples/commands/URLs? The customer
does not wish to elaborate further but asked me to find
out.
For NLB to work, the command configurations were listed above, along with the VMware KB.
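On the static MAC versus static ARP question, the two entries do different jobs. A static ARP entry lives on the Layer 3 device that routes for the cluster's VLAN and maps the cluster IP to the cluster MAC; in the topology described above, where the server farm firewalls do the routing, that would presumably be the firewalls. A static MAC address-table entry lives on a Layer 2 switch and pins the cluster MAC to specific ports so that frames for it are not flooded to every port in the VLAN. The Python sketch below only prints one line of each kind so the difference is visible. The VLAN ID and interface names are made-up placeholders, and the exact `mac address-table static` syntax (including whether several interfaces are accepted on one line for a multicast MAC) varies by platform and IOS version, so treat the output as something to check against the switch documentation rather than paste in.

```python
def static_arp(cluster_ip: str, cluster_mac: str) -> str:
    """Layer 3 entry: tells the routing device which MAC owns the cluster IP."""
    return f"arp {cluster_ip} {cluster_mac} ARPA"


def static_mac(cluster_mac: str, vlan: int, interfaces: list[str]) -> str:
    """Layer 2 entry: pins the MAC to given ports instead of flooding the VLAN."""
    ports = " ".join(interfaces)
    return f"mac address-table static {cluster_mac} vlan {vlan} interface {ports}"


# Values from the example quoted earlier in the thread
CLUSTER_IP = "192.168.1.100"
CLUSTER_MAC = "03bf.c0a8.0164"

# Hypothetical placeholders, not taken from the customer's network
VLAN_ID = 10
HOST_FACING_PORTS = ["GigabitEthernet1/0/1", "GigabitEthernet2/0/1"]

print("! on the device that routes for the cluster VLAN")
print(static_arp(CLUSTER_IP, CLUSTER_MAC))
print("! on a Layer 2 switch in the path to the ESXi hosts")
print(static_mac(CLUSTER_MAC, VLAN_ID, HOST_FACING_PORTS))
```

With vMotion in the picture, the port list on each switch would need to cover every port through which any ESXi host that can run an NLB VM is reachable, otherwise a migrated VM ends up behind a port the entry does not include.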
ASKER
There are only two vSwitches, vSwitch0 and vSwitch1:

vSwitch0
==========
Function:                      ESXi management, vMotion network
Number of physical NIC ports:  2 x 1Gbps
Bandwidth:                     2 x 1Gbps
Uplink requirements:           ESXi management & vMotion will be connected to two trunk ports on the switch (one trunk port per switch).
External network access:       ESXi management network will be connected to the firewall for administration and shared services (e.g. DNS, NTP, etc.) access.

vSwitch0 Configuration Settings
==========
Parameter            Setting
Load balancing       Route based on the originating virtual port ID
Failover detection   Link Status Only
Notify switches      Enabled
Fail back            Yes
Failover order       vmnic0: Management (Active), vMotion (Standby)
                     vmnic2: vMotion (Active), Management (Standby)

vSwitch1
==========
Function:                      Production VM data network
Number of physical NIC ports:  4 x 1Gbps
Bandwidth:                     4 x 1Gbps
Uplink requirements:           VMs are from multiple VLANs. To carry multiple VLAN traffic, eight trunk ports on switch modules are required (four trunk ports per switch).
External network access:       All production VM networks will be connected to the production network.

vSwitch1 Configuration Settings
==========
Parameter            Setting
Load balancing       Route based on the originating virtual port ID
Failover detection   Link Status Only
Notify switches      Enabled
Fail back            Yes
Failover order       vmnic4, vmnic5, vmnic8, vmnic9 are Active