Solved

Windows 2008 NLB in a VMware environment - can't access NLB node from outside its subnet

Posted on 2013-01-27
Last Modified: 2013-02-02
Hello all-
I've implemented an Exchange 2010 CAS array on VMware and am performing load balancing with Windows NLB. I understand this is not optimal, but unfortunately it's what I must work with. I am in the middle of putting together an Exchange 2007 to 2010 transition.

The NLB appears to function normally and is configured for multicast mode per VMware's white papers. Static ARP entries are in place on all switches (or so I'm told), and I can ping the NLB VIP, but I cannot browse resources (e.g. \\casarray\c$) if I'm on a different subnet than the CAS array members. Essentially I get "Error Code: 0x80070035 network path was not found." The appropriate ports are open on the NLB for MAPI and HTTP/HTTPS traffic.

I did see a VMware article that discusses unicast and how the guest will need to be reconfigured:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1556

Right now, I'm stuck. I can't proceed further in testing as my Outlook clients cannot attach to my CAS array.

So my questions are:
1. It looks as though VMware does support unicast. By going this route and following the article's instructions, could I bypass other potential network issues and keep things moving forward?
2. If I have to stick with multicast and my static ARP entries are in place, where else should I look?

Thanks in advance for your assistance.
Eric
Question by:Cymbaline65
9 Comments
 

Author Comment

by:Cymbaline65
ID: 38824761
Additional comment: I'm told by one source that unless static ARP entries are in place on every switch from the core out, ICMP may work but I still may not be able to hit the NLB. Thoughts?
 
LVL 16

Assisted Solution

by:PaciB
PaciB earned 500 total points
ID: 38825959
Hi,

First of all, to determine whether your issue is related to missing ARP entries or to a VMware "conflict" with NLB multicast, you should (if possible) try this:

1) Shut down one of the NLB members so that only one member is still alive. Shut it down completely; a service stop is not enough.
2) Run all network tests (PING, access to shares, etc.).
3) Restart the member, wait for its services to start, and shut down the other member.
4) Redo all network tests (PING, access to shares, etc.).
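Since PING can succeed while SMB or HTTPS still fails, it may help to test the TCP ports directly at each step. The following is only a minimal sketch: the function names are made up for illustration, and the port list (445 for file shares, 443 for HTTPS, 135 for the RPC endpoint mapper used by MAPI) is an assumption about what matters in this setup.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_ports(host: str, ports=(445, 443, 135)) -> dict:
    """Map each port to True/False depending on whether it accepted a connection."""
    return {port: tcp_port_open(host, port) for port in ports}


# Example (run from a client on a different subnet than the CAS array):
#   print(check_ports("casarray"))
```

Running it once from the same subnet and once from behind the router, for each step above, gives a simple table to compare.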

If you still have trouble accessing resources while only one member is alive, it looks like you have a router that needs a static ARP definition to support your multicast NLB, or maybe a router that does not support multicast NLB at all.
If the router does not allow a non-multicast IP address to be associated with a multicast MAC address, you have a real problem, because there is no solution with multicast.
Many routers support this, but some of them need a static ARP entry declared in their ARP cache because they are not able to discover the MAC address of the multicast NLB. In fact, they ignore the ARP response of the NLB: the MAC address in the response is a multicast one but the IP address in the response is not a multicast IP, so they don't accept the ARP answer.

NOTE: Don't confuse the static ARP entry that has to be created on the router (a static entry in its ARP cache) with the static table entries that have to be created on the switches (strictly speaking, static MAC address table entries, although they are often called static ARP tables). The entries on the switches are needed to tell the switches that having the same MAC address reachable simultaneously through 2 Ethernet ports is normal and that network packets must be sent out of both ports.
The static ARP entry on the router fills the ARP cache with a static entry that associates the IP address of the NLB with the MAC address of the NLB.
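To create either kind of static entry you need the cluster MAC address. By default NLB derives it from the VIP: 02-BF plus the four IP octets in unicast mode, 03-BF plus the four octets in multicast mode, and 01-00-5E-7F plus the last two octets in IGMP multicast mode. A small sketch of that rule follows; the function name and the sample address 10.1.2.50 are hypothetical, and the result should be checked against the MAC shown in the cluster properties in NLB Manager.

```python
import ipaddress

# Default two-byte prefixes NLB puts in front of the four VIP octets.
_PREFIXES = {"unicast": (0x02, 0xBF), "multicast": (0x03, 0xBF)}


def nlb_cluster_mac(vip: str, mode: str = "multicast") -> str:
    """Return the default NLB cluster MAC for a VIP in the given mode."""
    octets = ipaddress.IPv4Address(vip).packed  # four raw bytes of the VIP
    if mode == "igmp":
        # IGMP multicast: 01-00-5E-7F followed by the last two VIP octets.
        mac = bytes((0x01, 0x00, 0x5E, 0x7F)) + octets[2:]
    elif mode in _PREFIXES:
        mac = bytes(_PREFIXES[mode]) + octets
    else:
        raise ValueError(f"unknown NLB mode: {mode!r}")
    return "-".join(f"{b:02X}" for b in mac)


print(nlb_cluster_mac("10.1.2.50"))             # 03-BF-0A-01-02-32
print(nlb_cluster_mac("10.1.2.50", "unicast"))  # 02-BF-0A-01-02-32
```

The multicast value is what would go into the router's static ARP entry for the VIP and into the static entries on the switches.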


If your network issues disappear when only one NLB member is alive then you have an issue with VMware.


Have a good day.
 

Author Comment

by:Cymbaline65
ID: 38826247
Thanks, PaciB.
I will perform this test when I'm on site today and report back the results.
Just to be clear: the fact that I can ping an interface does not mean that all my ARP entries are in place?
I'm willing to bet that I won't be able to hit either system via the VIP.
Of course, I can hit either system via its dedicated IP address without issue.
 
LVL 16

Expert Comment

by:PaciB
ID: 38826330
The ICMP protocol (PING) is different from TCP and UDP... I can't be sure, but it might be processed differently by NLB.
Also, depending on how you configured your NLB cluster, you may have excluded protocols other than HTTPS. Because the NLB is only there for CAS access, some step-by-step articles guide you to an NLB configuration that only handles the HTTPS protocol, and all other protocols are not load balanced. Are you sure that your NLB cluster is configured to handle all protocols?

 

Author Comment

by:Cymbaline65
ID: 38826543
Yes, NLB ports for HTTP, HTTPS and MAPI are opened. Remember, I can access resources if I'm on the same subnet as the NLB hosts. It looks like it's still pointing to ARP.
 
LVL 16

Expert Comment

by:PaciB
ID: 38826576
Oh yes you're right, I forgot this point.

So the problem probably comes from the router refusing the ARP response from the NLB. I have had this problem several times before with my customers.

The first time I encountered this issue I had to make a traffic capture to understand that something was wrong with ARP requests: I could see that the router kept sending ARP requests for the NLB IP address even though the NLB had already answered, which led me to the solution.

If making a traffic capture is not a big deal for you, you can use Microsoft Network Monitor to confirm this before any other operation. Install the tool on both NLB members, create a capture filter that keeps only the ARP protocol, and start the capture.
From a computer behind the router, launch a PING command with the "-t" switch to ping the NLB VIP continuously.

If things are normal with ARP, you should see an ARP request for the NLB VIP coming from the router and an ARP response coming from your NLB members about every 30 seconds, a typical ARP cache expiration time (the exact value depends on the router).
If things are wrong with ARP, you'll see an ARP request for the NLB VIP and an ARP response at least every second! This means that the router is ignoring the ARP response and that you may have to fill its ARP cache statically.
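If the capture is exported as a list of timestamps, the check described above can be roughly automated. This is only a sketch: the function name is made up, the input is assumed to be the capture times (in seconds) of ARP requests for the VIP sent by the router, and the 10-second threshold is an arbitrary cut-off between "every second" and "every 30 seconds or so".

```python
from statistics import median


def classify_arp_requests(timestamps, healthy_min_gap=10.0):
    """Classify router behaviour from the times of its ARP requests for the VIP.

    Returns "cached" when requests are spaced like a normal ARP cache refresh,
    "not cached" when the router keeps re-asking, or "insufficient data".
    """
    if len(timestamps) < 3:
        return "insufficient data"
    ordered = sorted(timestamps)
    # Gaps between consecutive requests; the median ignores a stray burst.
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return "cached" if median(gaps) >= healthy_min_gap else "not cached"


print(classify_arp_requests([0.0, 30.1, 60.0, 90.2]))    # cached
print(classify_arp_requests([0.0, 1.0, 2.0, 3.1, 4.0]))  # not cached
```

A "not cached" result points at the router ignoring the NLB's ARP replies, which is the case where a static ARP entry on the router is worth trying.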

Usually, filling the ARP cache with static ARP information is possible. But I have to say that once, on a customer network, I encountered a router that was not able to accept static ARP entries. I had to change the network architecture for the CAS servers and switch to unicast NLB, and I needed to isolate these servers in a VLAN to avoid flooding the whole network.
 

Accepted Solution

by:Cymbaline65
Cymbaline65 earned 0 total points
ID: 38827687
Looks like I found the problem. I read this article:
http://blogs.technet.com/b/networking/archive/2008/11/20/balancing-act-dual-nic-configuration-with-windows-server-2008-nlb-clusters.aspx
and enabled IP forwarding on my NLB interface and voila! I can now hit the server resources. I need to do some more testing to be sure.
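For anyone landing here later: as far as I understand the linked article, the setting is applied per interface with netsh, using the form netsh interface ipv4 set interface "<name>" forwarding=enabled. The sketch below only builds the command strings so they can be reviewed before being run in an elevated prompt on each node. The helper name and the interface name "NLB" are hypothetical; use the real connection name of the NLB adapter.

```python
def forwarding_commands(interface_names, enable=True):
    """Build netsh commands that set IPv4 forwarding on the named interfaces."""
    state = "enabled" if enable else "disabled"
    return [
        f'netsh interface ipv4 set interface "{name}" forwarding={state}'
        for name in interface_names
    ]


# "NLB" is a placeholder for the adapter's connection name on each node.
for command in forwarding_commands(["NLB"]):
    print(command)
# netsh interface ipv4 set interface "NLB" forwarding=enabled
```

The current state can be checked afterwards with netsh interface ipv4 show interface "<name>".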
 
LVL 36

Expert Comment

by:ArneLovius
ID: 38828290
As you are in a VM environment, I would have suggested using HAProxy on Linux, with unison and keepalived to make it HA. You would have the extra processor overhead, but removing NLB is always a good thing...
 

Author Closing Comment

by:Cymbaline65
ID: 38846400
In many of the articles I researched, IP forwarding was not mentioned. PaciB spent a good deal of effort on his responses, so I'll give him the points.