VMware vSphere 4.1 - Network Load Balancing (NLB) Multicast Mode Configuration / Cisco 6500

This is a quick overview of what we are trying to accomplish and the problem we are having:

Goal:

We are trying to configure Windows NLB on 2 VMs.

The Windows NLB configuration:

VIP -> 172.20.200.204 - The virtual MAC is 03bf.ac14.c8cc
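
For reference, in NLB multicast mode the cluster MAC is built from the VIP: the fixed prefix 03-BF followed by the four octets of the IP address in hex. A minimal Python sketch of that mapping (the function name is only illustrative, it is not part of any NLB or Cisco tool) reproduces the address above:

```python
import ipaddress

def nlb_multicast_mac(vip: str) -> str:
    """Return the NLB multicast-mode cluster MAC in Cisco dotted notation."""
    octets = ipaddress.IPv4Address(vip).packed   # the 4 raw bytes of the VIP
    raw = bytes([0x03, 0xBF]) + octets           # 03-BF prefix + IP octets
    hexstr = raw.hex()                           # 12 lowercase hex digits
    return ".".join(hexstr[i:i + 4] for i in range(0, 12, 4))

print(nlb_multicast_mac("172.20.200.204"))  # 03bf.ac14.c8cc
```

This is handy as a sanity check that the MAC typed into the static ARP and CAM entries really matches the VIP.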

The NLB cluster is formed by

PSTS01 -> 172.20.200.205
PSTS02 -> 172.20.200.206

VMware/Network:

We have 5 ESX servers connected to our core switches (Cisco 6500). Their uplink ports are configured as follows:

!
interface GigabitEthernet9/24
 description VM4 Data
 switchport
 switchport access vlan 200
 switchport mode trunk
 spanning-tree portfast

Issue:

We are not able to reach the VIP 172.20.200.204 from any VLAN other than VLAN 200. The VIP needs to be reachable in order for the cluster to work.

This article explains what we are trying to accomplish:

http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&externalId=1006558


We have followed the steps in the following article to configure the network:

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1006525

We have used mac-address-table static with the uplink interface of each ESX server's connection to the core switches. We used CDP to identify those interfaces.


CR01 - Core Router1

arp 172.20.200.204 03bf.ac14.c8cc ARPA

mac-address-table static 03bf.ac14.c8cc vlan 200 interface GigabitEthernet9/32 GigabitEthernet9/24 GigabitEthernet9/16 GigabitEthernet9/21 GigabitEthernet9/19

CR02 - Core Router2

arp 172.20.200.204 03bf.ac14.c8cc ARPA

mac-address-table static 03bf.ac14.c8cc vlan 200 interface GigabitEthernet9/41 GigabitEthernet9/24 GigabitEthernet9/11 GigabitEthernet9/17 GigabitEthernet9/15


As said before, we are not able to reach the VIP (172.20.200.204) from any VLAN other than VLAN 200, but we are able to reach the physical IPs assigned to PSTS01 (172.20.200.205) and PSTS02 (172.20.200.206) from any of our VLANs.

Once the manual ARP resolution of the NLB cluster address is configured on Core Router 1 and 2, we can see the static entry in the CAM table.
We have also verified that the following setting is configured as suggested in the KB: Virtual Switch NIC Team Policy > Notify Switches is set to Yes.

One more thing to add: we can ping the VIP 172.20.200.204 from any of the Cisco 6500s and from any host, virtual or physical, in VLAN 200.

ARP also resolves successfully on any of the Cisco 6500s. A show ip arp for the VIP 172.20.200.204 returns:

Internet 172.20.200.204 03bf.ac14.c8cc ARPA








 
llarava asked:

Maen Abu-Tabanjeh (Network Administrator, Network Consultant) commented:

Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, commented:
If you are using multicast NLB, the method we use for clients is to include the static MAC address on ALL Cisco switches in the organisation; that includes all uplinks and VLANs across the infrastructure.

llarava (Author) commented:
jordannet: This is what I've already tried, as you can see in my original post above.

hanccocka: ESX uplinks (interfaces) to the Cisco switches or routers; in our case we use 6509s.

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1006525

Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, commented:
Yes, familiar with the 6509.

We update ALL Cisco switches on the network!

It can be quite tiresome to go around ALL the switches and make the changes.

llarava (Author) commented:
You've got to do what you've got to do. I haven't seen any article that indicates that I have to update all the switches. Do you have an article that you can share with me?

So far VMware support has not answered my question, and Cisco is saying that everything is configured correctly.

Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, commented:
I'm afraid it's from experience of completing this many times.

Long ago, in the ESX 2.5/3.0 days, there was an article, which I've not been able to find since, that said to statically allocate:

Multicast Address (MAC address against IP address)
Node 1 Address (MAC address against IP address)
Node 2 Address (MAC address against IP address)

The article also mentioned ensuring that the node addresses used were static MAC addresses for the VMs, not auto-generated ones, and enabling MAC address spoofing on the vSwitch.

This is the recipe we have followed successfully for many years (since 2003). When it has failed, it has usually been because network management somewhere had not completed what we asked, so we have had to pull ALL the switch configs off ALL the Cisco switches to check that the changes were made.

Have you also applied this to ALL your trunks and uplinks, everywhere you would expect the traffic to be?

llarava (Author) commented:
Sounds like a lot.

The KB article above mentions that you have to make the following changes for the uplink interfaces of your ESX servers:

arp 172.20.200.204 03bf.ac14.c8cc ARPA

mac-address-table static 03bf.ac14.c8cc vlan 200 interface GigabitEthernet9/41 GigabitEthernet9/24 GigabitEthernet9/11 GigabitEthernet9/17 GigabitEthernet9/15

The article gives the sample commands below. That being said, can you please clarify what needs to be done on ALL of the switches?

1. Telnet in to Cisco Switch Console and log in.

2. Run this command to enter Configuration mode:

config t

3. STATIC ARP RESOLUTION Cisco Global command mode

For example:

arp [ip] [cluster multicast mac] ARPA
arp 192.168.1.100 03bf.c0a8.0164 ARPA

4. STATIC MAC RESOLUTION Cisco Global command mode

For example:

mac-address-table static [cluster multicast mac] [vlan id] [interface]
mac-address-table static 03bf.c0a8.0164 vlan 1 interface GigabitEthernet1/1 GigabitEthernet1/2 GigabitEthernet1/15 GigabitEthernet1/16
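
As a rough illustration of steps 3 and 4, the two lines can be generated from the VIP, the VLAN, and the list of uplink interfaces. This is only a sketch: the helper name is made up, and it simply formats strings in the same shape as the KB examples above, it does not talk to a switch or validate the syntax against a particular IOS release.

```python
import ipaddress

def nlb_static_entries(vip, vlan, interfaces):
    """Build the static ARP and static MAC lines in the format of the KB examples."""
    octets = ipaddress.IPv4Address(vip).packed
    hexstr = (bytes([0x03, 0xBF]) + octets).hex()       # 03-BF prefix + IP octets
    mac = ".".join(hexstr[i:i + 4] for i in range(0, 12, 4))
    return [
        f"arp {vip} {mac} ARPA",
        f"mac-address-table static {mac} vlan {vlan} interface {' '.join(interfaces)}",
    ]

# Values taken from the KB example above.
uplinks = ["GigabitEthernet1/1", "GigabitEthernet1/2",
           "GigabitEthernet1/15", "GigabitEthernet1/16"]
for line in nlb_static_entries("192.168.1.100", 1, uplinks):
    print(line)
```

Generating the lines this way avoids mistyping the MAC when the same entries have to be repeated on several switches with different interface lists.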

Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, commented:
Same configuration on ALL the switches:

arp 172.20.200.204 03bf.ac14.c8cc ARPA
arp 172.20.200.205 mac address ARPA
arp 172.20.200.206 mac address ARPA


Then work out on which trunks and uplinks you would also see those MAC addresses:

mac-address-table static [cluster multicast mac] [vlan id] [interface]
mac-address-table static 03bf.c0a8.0164 vlan 1 interface GigabitEthernet1/1 GigabitEthernet1/2 GigabitEthernet1/15 GigabitEthernet1/16

Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, commented:
It depends on how complex your network is and what needs to access the cluster. For most of our clients it's both external and internal, as clusters are often used for front-facing intranet and Internet websites and back-end content management.

llarava (Author) commented:
Our network is not that complex.

I got that we have to configure this on all of our switches.

same configuration on ALL the switches

arp 172.20.200.204 03bf.ac14.c8cc ARPA
arp 172.20.200.205 mac address ARPA
arp 172.20.200.206 mac address ARPA

But I am having difficulty understanding the second part. What exactly do we need to do with the trunks on the switches? What about the uplinks? Can you please help me understand this part:

and then, work out, what trunks and uplinks you would also see those Mac Addresses

mac-address-table static [cluster multicast mac] [vlan id] [interface]
mac-address-table static 03bf.c0a8.0164 vlan 1 interface GigabitEthernet1/1 GigabitEthernet1/2 GigabitEthernet1/15 GigabitEthernet1/16
 

Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, commented:
You also need to add the static entries on the next switch, the one that connects to the switch the ESX servers are attached to, and specify the ports on which those MAC addresses would be seen.

I do not know how your network is configured, but does the 6509 connect to any other switches?

Or is everything contained in the 6509?

What often complicates it is when you've split your servers to run across 2 x core 6509s.

llarava (Author) commented:
That is exactly what we are doing: we are running 2 x core 6509s. Our physical servers as well as the ESX servers are connected to both 6509s, which run HSRP. Our ESX servers have 2 dedicated interfaces for data, one connected to each of the 6509s.

Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, commented:
So all traffic is within the Core 6509s?

Maen Abu-Tabanjeh (Network Administrator, Network Consultant) commented:

llarava (Author) commented:
hanccocka: The other switches are also connected to the 6509s. So basically all the switches with the workstations, etc. are connected to the 6509s. The ESX servers and physical servers are connected there as well.

jordannet: The link just directs me to my own question...






Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, commented:
So have you altered the switches the workstations are connected to?

And on the workstation switch, have you logged in and configured the port that connects it to the 6509 with the static MAC address?

Maen Abu-Tabanjeh (Network Administrator, Network Consultant) commented:
I'm sorry, maybe I missed something or posted the previous link by mistake, but I found a solution from Cisco for your case, plus other links that may give you some ideas. I hope they help; I'm only doing my best to help you.

http://www.cisco.com/en/US/docs/solutions/Enterprise/Data_Center/vmware/cisco_VMwareView.html

http://www.booches.nl/2008/04/port-channel-cisco-vs-vmware-esx/

http://workinghardinit.wordpress.com/2010/07/23/reflections-on-getting-windows-network-load-balancing-to-work-part-2/

llarava (Author) commented:
hanccocka,

No I haven't. I am working with Cisco support trying to figure this out. We haven't seen anything yet that points us to configure the ARP on all of the switches.

The only place the ARP entry has been altered is on the core 6509s, as indicated by the VMware KB article.

Also, the Cisco KB http://www.cisco.com/en/US/products/hw/switches/ps708/products_configuration_example09186a0080a07203.shtml doesn't seem to say anything about changing the ARP on all the switches.

jordannet: No worries everything is always welcome. Thank you!

Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, commented:
Well, that's what we do, and it always works for us.

llarava (Author) commented:
hanccocka: We have found that one of the 6509s is not able to ping the VIP. The one that can't ping the VIP is the primary core 6509. The second 6509, which is in standby, is able to ping the VIP.

We are planning to fail over to the standby 6509 and test. We might get the same result, where the active HSRP core switch doesn't reach the VIP, or it might work, which would indicate a problem with the configuration of the current primary core switch.

I will follow up once the change is done. We will probably have to go with the change that you have indicated, but first I want to have both 6509s reaching the VIP.

Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, commented:
Glad you are making some progress.

llarava (Author) commented:
Hi hanccocka,

You've mentioned that we need to add the same configuration on ALL the switches

3 IP and MACs for the VIP and the nodes:

arp 172.20.200.204 03bf.ac14.c8cc ARPA
arp 172.20.200.205 mac address ARPA
arp 172.20.200.206 mac address ARPA


and then, work out, what trunks and uplinks you would also see those Mac Addresses

Do we need to use mac-address-table static with just the virtual MAC of the VIP, or with the virtual MAC and also the MACs of node 1 and node 2?

mac-address-table static [cluster multicast mac] [vlan id] [interface]
mac-address-table static 03bf.c0a8.0164 vlan 1 interface GigabitEthernet1/1 GigabitEthernet1/2 GigabitEthernet1/15 GigabitEthernet1/16

Thank you

Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, commented:
When we've configured switches, we used the MAC addresses of all the nodes as well as the cluster MAC address.

If you have a network diagram showing how your switches are connected and which ports are connected together, it makes it easier.

llarava (Author) commented:
hanccocka

We have configured the uplink ports (data interfaces) for our ESX servers as follows:

!
interface GigabitEthernet9/24
 description VM4 Data
 switchport
 switchport access vlan 200
 switchport mode trunk
 spanning-tree portfast

We are not doing VLAN tagging, just configuring them as trunks. This has been working great for years.

We have narrowed down the problem to the following:

We can get to the VIP from the same VLAN and from different VLANs only if the source server is a VM, which indicates that the vSwitch or port groups allow this to happen.

However, we can't get to the VIP even from the same VLAN if we are using physical servers connected to the same 6509.

Could the current configuration of the uplinks be causing this behaviour? Any other ideas?

Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, commented:
We always configure our Cisco trunks like this.

Example configuration for the switch:

interface GigabitEthernet2/8
 description esxdev001
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 5,7,8,703,705
 switchport mode trunk
 speed 1000
 duplex full
 spanning-tree portfast trunk

and then we tag and specify the VLAN tag numbers on the virtual port groups on ESX.

So in your configuration, all the traffic leaving ESX would be going onto VLAN 200?

And I'm assuming those servers are also attached to access VLAN 200?

What if you change your configuration as above: present a trunk, tag ALL VLANs, and specify the VLAN on the port group on ESX?

llarava (Author) commented:
Everything I said about source VMs being able to consistently reach the VIP is not true. The system misbehaves: sometimes it works from the same VLAN, sometimes it works from a different VLAN, and from either physical or virtual sources.

Running a capture on the NLB nodes, I have seen that the pings to the VIP are not going to the MAC for the virtual IP; instead they are going to the vNIC MAC address associated with the VM itself.

It is my understanding that the ping should go from 6509-1 to the MAC that belongs to the VIP address. Correct?

172.20.200.204 - VIP
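
One way to make the capture check repeatable is to filter the captured frames for anything addressed to the VIP whose destination MAC is not the cluster MAC. The sketch below assumes the capture has been exported to simple rows with `dst_ip` and `dst_mac` fields; that row format, the function name, and the 00:50:56 placeholder MAC are all assumptions for illustration, not values from the actual capture.

```python
def misdirected_frames(frames, vip, cluster_mac):
    """Return frames sent to the VIP whose destination MAC is not the cluster MAC."""
    def norm(mac):
        # Drop separators so 03bf.ac14.c8cc and 03:bf:ac:14:c8:cc compare equal.
        return "".join(ch for ch in mac.lower() if ch in "0123456789abcdef")

    want = norm(cluster_mac)
    return [f for f in frames
            if f["dst_ip"] == vip and norm(f["dst_mac"]) != want]

# Hypothetical capture rows; the 00:50:56 MAC stands in for a node's vNIC.
frames = [
    {"dst_ip": "172.20.200.204", "dst_mac": "03:bf:ac:14:c8:cc"},
    {"dst_ip": "172.20.200.204", "dst_mac": "00:50:56:00:00:01"},
    {"dst_ip": "172.20.200.205", "dst_mac": "00:50:56:00:00:01"},
]
bad = misdirected_frames(frames, "172.20.200.204", "03bf.ac14.c8cc")
print(len(bad))  # 1
```

Any row returned here is a frame for the VIP that was sent to a single node's MAC rather than to the multicast cluster MAC, which is the symptom described above.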

   
(Attachment: capture.jpg)

Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, commented:
In an NLB cluster, the IP address and MAC address of the cluster are virtual; NLB handles the nodes and distributes traffic to the converged nodes.

I would test your NLB: check that the nodes are converged, then stop NodeA and test, start NodeA, stop NodeB, and so on, and check that traffic is reaching the nodes correctly.

Is this a website, or something else where you can easily observe which node is answering?
Question has a verified solution.
