PaRamirez (United States of America) asked on:
Windows 2008 R2 NLB w\RDS in a VMWare ESX 4.0 Environment

We have been weighing the pros and cons of NLB in an ESX 4.0 environment.

We have had concerns about using the F5s with TS Session Broker, as it had issues in the past, and our F5s do not route. We see that MS recommends unicast and VMware recommends multicast. Both mean doing strange configs, either to ESX or to ARP tables somewhere in the network, unless you are lucky with Cisco, etc. We have tried NLB alone and with Session Broker in ESX and have had issues with both unicast and multicast. We're testing again with the F5s, but are wondering if anyone has been through this? We are using both 2008 R1 and R2 for the same thing: NLB with TS or RDS in an ESX 4.0 environment. [2008 R1\R2]

What works best: Windows NLB alone, NLB with TS Session Broker, or TS Session Broker with an F5? Unicast, as MS prefers, where you disable RARP globally in ESX and make exceptions where needed all over, or mess with vMotion/NIC teaming, etc.? Or multicast, where you may or may not need to create static ARP entries out in networking land, but make no changes to ESX?

NLB, Session Broker and F5's Oh My!

Anyway, there are so many ways to go, and it seems like MS and VMware are at odds, and that VMware did not make it easy for NLB to work. It seems like they need a form of MAC address spoofing rather than the options they present: either reconfigure the network or make global changes to ESX. Yeah, real nice choices... BUT WHICH, and WHY??? Pros and cons?
ASKER CERTIFIED SOLUTION
Andrew Hancock (VMware vExpert PRO / EE Fellow / British Beekeeper), United Kingdom
[Solution text available to Experts Exchange members only.]
PaRamirez (ASKER)
If you use unicast, did you have to configure your ESX box not to use RARP?
SOLUTION
[Solution text available to Experts Exchange members only.]
So you had to disable RARP globally and re-enable it on specific port groups, so as not to interfere with certain functions such as vMotion or NIC teaming in ESX 4.0.

Do you have the kind of switches that can associate the IP and the MAC in a way not normally allowed by most? Are all the NLB VMs on the same host?

So you also had to update the switches and configure ESX to be able to use either multicast or unicast? So now you can use either? Are you using just NLB, or an F5? Also, have you done this with Terminal Services or RDS, along with NLB or with an F5? If so, did you use TS Session Broker?

If TS Session Broker is used, was it with NLB or an F5? Did you go unicast or multicast, or did you get both to work?
SOLUTION
[Solution text available to Experts Exchange members only.]
Hmm? This is a bear! Each section below is its own set of questions about your last response. I have 3 main questions in 6 parts...

1. For TS you don't use NLB, due to the Citrix and 2X load balancers, so what are they, and can they be used for TS also?

2. Also, someone told me that for multicast there are some other need-to-dos. Do you agree with these required steps for a multicast implementation?

2a. "Disable DDNS/WINS. Network Load Balancing does not support dynamic Domain Name System (DNS) resolution, where the name of the cluster is automatically registered by the host when the host starts. This functionality must be disabled on the Network Load Balancing interface for both DNS and Windows Internet Name Service (WINS); otherwise, each host’s computer name will be registered with the cluster IP address. When using Network Load Balancing with DNS, you will need to directly configure the DNS server to register the name."

2b. How do you disable this functionality? Uncheck things under the DNS and WINS tabs?

2c. What does it mean to "directly configure the DNS server to register the name"? How do I do this? What do I do? Associate the cluster IP with what in DNS, or not? Or manually associate the computer name with its real IP, is that what is meant here? Why does it need to be manual?


3. When you say there is no config on ESX other than the host VMs, what do you mean by change MAC addresses to static addresses and forged MAC address transmits? I use ESX all the time and don't see this when I choose Edit Settings. Is this a must?

1. No. We use the software load balancers built into the 2X and Citrix products.

2. Register the Cluster Name in DNS - that's all you need to do. cluster.company.com - 10.10.0.1.
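If it helps, registering the cluster name manually can be scripted with the dnscmd tool on a Windows DNS server. A minimal sketch, where company.com and 10.10.0.1 follow the example above, and dns01 is a placeholder DNS server name:

```
REM Add an A record mapping cluster.company.com to the cluster IP
C:\> dnscmd dns01 /RecordAdd company.com cluster A 10.10.0.1

REM Verify resolution from a client
C:\> nslookup cluster.company.com
```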

3. MAC address changes are required for the VM, so you change the VM's network card settings, either in the VMX or from the menus. Change the virtual switch properties and you will see these options. Yes, it is a must.
Wow,

I have never seen or heard of what step 3 entails... Do you have a link to this tidbit, or an in-depth how-to about it?

I've never seen the notion that MAC address changes are required for the VM within ESX. I know that NLB makes changes to the VM's MAC address, by adding that layer 2 multicast address, but I was unaware you had to make explicit changes to the VM in ESX for this...

Can you do this from Edit Settings in ESX? When you say in the VMX, is that a config in a file? How do I change the virtual switch properties to see these options? It sounds like you're saying you can do this from more than one place, but how exactly? From Edit Settings, from a config of the VMX, or from the virtual switch properties?

P.S. I would really, really appreciate a link that speaks to this idea if you have one... a clean, clear how-to and explanation... It's just that I have never heard of this step...

P.P.S. What did you think of my other possibly required steps for DDNS and WINS? Do you agree with this need?
2a. "Disable DDNS/WINS. Network Load Balancing does not support dynamic Domain Name System (DNS) resolution, where the name of the cluster is automatically registered by the host when the host starts. This functionality must be disabled on the Network Load Balancing interface for both DNS and Windows Internet Name Service (WINS); otherwise, each host’s computer name will be registered with the cluster IP address. When using Network Load Balancing with DNS, you will need to directly configure the DNS server to register the name."

2b. How do you disable this functionality? Uncheck things under the DNS and WINS tabs?

Un-check "Register this connection's addresses in DNS" on the network card.
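On Server 2008/2008 R2 that checkbox lives under Advanced TCP/IP Settings, on the DNS tab, and the same change can be scripted with netsh. A hedged sketch, where the interface name "NLB" and the DNS server address are placeholders for your environment:

```
REM Set a static DNS server on the NLB interface and suppress dynamic registration
netsh interface ipv4 set dnsservers name="NLB" source=static address=10.10.0.5 register=none

REM Remove any WINS server from the NLB interface
netsh interface ipv4 set winsservers name="NLB" source=static address=none
```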
Wow, [you may have missed this one, as I wrote two replies: the same questions as above about the step 3 MAC address changes, where exactly to configure them, a link to a how-to, and the DDNS/WINS steps].
1. Edit the network card settings for the VM, select Manual MAC address rather than Automatic, and enter your MAC address (or the old-fashioned way: edit the *.VMX file!).
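For reference, the "old-fashioned" VMX route amounts to two lines in the VM's .vmx file, edited while the VM is powered off. A sketch; the MAC below is only an example, chosen inside VMware's documented static range (00:50:56:00:00:00 to 00:50:56:3F:FF:FF):

```
ethernet0.addressType = "static"
ethernet0.address = "00:50:56:00:1a:2b"
```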

2. If you want to stop it registering with DDNS/WINS, yes, unregister it. The cluster is always referred to by its DNS name, so it doesn't matter whether the hosts register with DNS/WINS; you are only ever going to refer to it by its DNS name.

This is rather odd, because I cannot find any documentation that states you need to change the MAC addresses, but we've been doing it for years, based on documentation from VMware, and changing vSwitches to allow Forged Transmits. The current docs, though, state that no changes are required on ESX for multicast!



So if you statically allocate MAC addresses on virtual machines, you need to follow this guide to give the MAC addresses the correct format:

http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&docType=kc&externalId=507&sliceId=1&docTypeID=DT_KB_1_1&dialogID=204668653&stateId=0%200%20197712633

And if you change a MAC address, you also need to allow Forged Transmits (because you've created a static MAC address, not an auto-generated one).
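For anyone scripting this rather than clicking through the vSphere Client, the Forged Transmits and MAC Address Changes policies can also be set with PowerCLI. A hedged sketch, assuming PowerCLI is installed and connected; the host and vSwitch names are placeholders:

```powershell
# Connect first, e.g.: Connect-VIServer vcenter.company.com
# Allow forged transmits and MAC changes on the vSwitch carrying the NLB VMs
Get-VirtualSwitch -VMHost esx01.company.com -Name vSwitch1 |
    Get-SecurityPolicy |
    Set-SecurityPolicy -ForgedTransmits $true -MacChanges $true
```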

So it depends on your infrastructure and what works for you. This is how we've been doing it for clients for many years, and it works for them.
When I go to Edit Settings by right-clicking the VM, I see no option in ESX 4.0 to select Manual MAC address rather than Automatic. Also, when you say enter your MAC address, where did you generate it? The setting you are talking about reminds me of something I saw for Hyper-V, where they solve this issue by allowing MAC spoofing...

But where do I actually do the config? Not via Edit Settings after right-clicking the VM...

P.S. That link you gave is blocked for me by Websense; can you email me the body of the text?
paramirez@firstam.com
SOLUTION
[Solution text available to Experts Exchange members only.]
Does the article I am blocked from explain why "There's no real configuration required on ESX, other than host VMs, change MAC addresses to Static Addresses, Forged Mac Address transmits"?

I see that it says you need to HARD CODE MACs and ARP entries, but it says to do so in your Cisco switching infrastructure, whereas you said to create MACs in VMware and that I have to manually edit the VMX file. Do we also have to edit the Cisco switches in any other way, aside from the static ARP associating the cluster IP with the multicast MAC? I assume this is the MAC I generated and hard-coded in VMware via the VMX file. Is that correct?

P.S. You have been a great help BTW...

P.P.S. The vSphere Client is version 4.1.0 build 345043 and VMware vCenter Server is version 4.0.0 build 208111.
We need to set Forged Transmits, because we are creating our own "spoofed" MAC address.

All you need to do is make sure the Cisco switches have the Static ARPs.

Yes, those are the MAC addresses, and the Cluster MAC address.
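To make the static ARP step concrete: for multicast NLB, the cluster MAC is derived from the cluster IP (03-BF followed by the IP in hex, so a cluster IP of 10.10.0.1 gives 03bf.0a0a.0001). A hedged Cisco IOS sketch; the VLAN and interface names are placeholders, and exact syntax varies by platform and IOS version:

```
! Static ARP entry: cluster IP -> multicast cluster MAC
arp 10.10.0.1 03bf.0a0a.0001 ARPA

! Constrain flooding of the cluster MAC to the ports with NLB hosts on them
mac-address-table static 03bf.0a0a.0001 vlan 10 interface GigabitEthernet0/1 GigabitEthernet0/2
```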

Yes, your vCenter is old. Current is 4.1.

I was thinking: would you want to spoof the MAC to something specific rather than just make one up? What's the logic for spoofing anyway? Isn't it so that if an RARP is sent to the VM, it will return the multicast MAC of the cluster? So don't you want to spoof each VM's MAC to be the MAC of the cluster? If not, why not? I'm just trying to really understand this better... I'm also trying to see why this is a best practice, by understanding its major purpose and the specific need for and effect of doing or not doing this. What happens if you forget this step, as in most of the how-to setups I see for this ESX/NLB ball of wax?

Also, after reading what you posted from that article, I did not see where it said to make MAC configs in ESX. It appears to imply to me that the hard-coded MAC is perhaps just part of the ARP entry for the cluster multicast MAC on the Cisco switches.

From this: "In order to use multicast NLB, you need to (*gulp*) hard-code MACs and ARP entries in your Cisco switching infrastructure." Not ESX, right? Where do you find the part about doing MAC configs in ESX, what that buys you, and what happens if you don't do this step?
Because it's non-standard, auto-generated by VMware.

Static ARPs on your Cisco equipment.

If you don't assign static MACs to all your equipment, the cluster will not converge or load balance.
There was a document, from when we first configured multicast NLB on Cisco equipment, that said to use static MAC addresses, not auto-generated ones, and to statically assign all of these on the Cisco equipment, as well as the cluster address.

This is what we have always done, and it works for us. Please feel free to try otherwise.
All I can suggest is to lab it and try; maybe we are doing extra steps that are not required anymore. We've been doing NLB since it was called Microsoft WLB (on ESX 1.0!), so a few years.

The next client we configure NLB for, we may test .
Was it a Cisco recommendation, from a Cisco doc? Or someone else, VMware or Microsoft perhaps?
Someone said this; would you agree? "As far as I know, WNLB requires a heartbeat NIC. Do not put a gateway or DNS on the heartbeat NIC." Sounds like what I was saying before. The reason I ask is that we did this and it did not fail, as far as we know. What are the potential gotchas if we leave it like that?
I don't know what they are referring to with a heartbeat NIC for NLB.

This sounds like HA Clustering for VMware.
Is it OK, in your opinion, to put a gateway or DNS/WINS on an NLB NIC then?
Well, you must have a gateway on the clustered NLB NIC for it to route!

And you must have a DNS entry for it; otherwise clients will not be able to find the cluster!
Hmm, remember that tidbit from above, question 2a? Does this not speak to this? On one NLB cluster I have 4 VMs, each with 2 NICs, but best practice in general is not to have gateways on more than one NIC. But which NIC should not have it? Which NIC should 2a be applied to?

Whats correct?

The NLB NIC >> gateway and DNS/WINS? Or no?

The Prod NIC >> gateway and DNS/WINS? Or no, or also, or what combo?

Which one gets what? From what you said earlier, I would guess the NLB NIC gets the gateway only, but as of now both our Prod and NLB NICs have a gateway, in violation of general server best practice, unless you need that for redundancy on the same subnet... [but having two gateways is usually not done; when a server has multiple NICs, usually only one has a gateway defined]
They can all have the gateway present.

Only clients communicating with the cluster will use the cluster IP address or cluster DNS name, but if you need to do other management, like RDP to a server, and its real NIC does not have a gateway, you will not be able to manage it from another subnet.

Because otherwise you'll not know which server you are RDPing to!
So is this a best practice, to have two gateways when using NLB? It sounds like a good idea, but I have not seen it recommended. That does not mean it's not, though...

Do you do this all the time by default? Multiple default gateways for NLB?
Yes, all the time, otherwise you cannot manage or get access to the servers!
I like you, you deserve more points...

You helped so much I don't know which answer to give credit to. Anyway, I saw some other discussion that also referred to the link you gave that I could not get to. They say Cisco does not recommend NLB, as it does not conform to some RFC; ever heard of this?

From this: http://social.technet.microsoft.com/Forums/en-US/exchange2010/thread/763b9dfa-4f52-4445-9512-7a364aab0061/

"It was recommended to not use Microsoft Load Balancing as there would be;

1)      Unnecessary traffic congestion over the whole server vlan for multicast traffic

2)      Unknown CPU utilization

3)      Cisco did not recommend using it as it doesn’t comply to network RFCs

 

I think we all agreed that the long term goal will be to have a dedicated load balancing appliance."

We have ended up using Cisco IOS SLB.

Also for reference :-

http://bitpushr.wordpress.com/2009/09/08/vmware-esx-microsoft-nlb-and-why-you-should-care/


--------------------------------------------------------------------------------

Cisco wants to sell you a hardware load balancer!
Never had an issue, and when we've given the costs and benefits, advantages and disadvantages to our clients, they would rather spend money on VMware Licensing, Software, SAN, Networking, NICs, iSCSI, or additional host servers than purchase a hardware device.
Can I just mention haproxy...
oh and with keepalived of course
What is haproxy? And how does keepalived relate?
Pertaining to this problem: Saw this...

Microsoft is the party at fault here.
Using the same MAC address on every server (even as a secondary MAC address) completely breaks the fundamental theory behind ethernet switches.

Here is the part I wonder if anyone agrees with:

"Additionally, NLB incurs a massive CPU penalty on each server since every server MUST process every packet, even if they are not the final handler. Since Microsoft networking stack isn’t very efficient, this could waste up to 25% of your CPU in networking processing instead of application processing. You can be throwing away many tens of thousands"
haproxy is a Linux load balancer; keepalived is akin to VRRP for Linux. The two in combination provide an HA load balancer, as used by reddit, Stack Overflow, and many other high-volume sites.

In addition to its HTTP capabilities (including cookie manipulation), it also has several TCP capabilities. I use it to load balance Exchange 2007 and 2010 CAS servers...
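For anyone curious what this pairing looks like, here is a minimal illustrative sketch: keepalived holds a floating virtual IP via VRRP, and haproxy balances TCP connections across the real servers. All IPs, server names, and the eth0 interface are placeholders, and a real deployment needs health-check and timeout tuning:

```
# /etc/keepalived/keepalived.conf : keepalived owns the floating VIP
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    virtual_ipaddress {
        10.10.0.1
    }
}

# /etc/haproxy/haproxy.cfg : haproxy balances TCP (e.g. RDP) across the hosts
defaults
    mode tcp
    timeout connect 5s
    timeout client  1m
    timeout server  1m

frontend rds_in
    bind 10.10.0.1:3389
    default_backend rds_servers

backend rds_servers
    balance leastconn
    server ts1 10.10.0.11:3389 check
    server ts2 10.10.0.12:3389 check
```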
Hancock was a great help and always got back to me.