Windows 2008 NLB in a VMWare environment - can't access NLB node from outside its subnet

Hello all-
I've implemented an Exchange 2010 CAS array on VMWare and am load balancing it with Windows NLB. I understand this is not optimal, but unfortunately it's what I must work with. I am in the middle of putting together an Exchange 2007 to 2010 transition.

The NLB appears to function normally and is configured for multicast mode per VMWare's white papers. Static ARP entries are in place on all switches (or so I'm told) and I can ping the NLB VIP but cannot browse resources (ex. \\casarray\c$) if I'm on a different subnet than the CAS Array members. Essentially I get an "Error Code: 0x80070035 network path was not found." Appropriate ports are open on the NLB for MAPI and http/https traffic.

I did see a VMWare article that discusses Unicast and how the guest will need to be reconfigured:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1556

Right now, I'm stuck. I can't proceed further in testing as my Outlook clients cannot attach to my CAS array.

So my questions are:
1. It looks as though VMWare does support Unicast. By going this route and following the article's instructions, could I bypass other potential network issues and keep things moving forward?
2. If I have to stick with multicast and my static ARP entries are in place, where else should I look?

Thanks in advance for your assistance.
Eric

Cymbaline65

ASKER

Additional comment: I'm told by one source that if all static ARP entries at all switches from the core out are not in place, ICMP may work but I still may not hit the NLB. Thoughts?

SOLUTION

Bruno PACI

This solution is only available to members.

Cymbaline65

ASKER

Thanks, PaciB.
I will perform this test when I'm on site today and report back the results.
Just to be clear: being able to ping an interface does not mean that all my ARP entries are in place?
I'm willing to bet that I won't be able to hit either system via the VIP.
Of course, I can hit either system via its dedicated IP address without issue.

Bruno PACI

The ICMP protocol (PING) is different from TCP and UDP... I can't be sure, but it might be processed differently by NLB.
Also, depending on the way you configured your NLB cluster, you may have excluded protocols other than HTTPS. Because NLB is only there for CAS access, some step-by-step articles guide you to an NLB configuration that only handles the HTTPS protocol, and all other protocols are not load balanced. Are you sure that your NLB cluster is configured to handle all protocols?
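
To separate "ICMP works" from "TCP works", a quick probe from a machine on the other subnet can help. The following is only a minimal Python sketch, not part of any NLB tooling: the function names `tcp_port_open` and `probe` are made up for this illustration, and the VIP and port list you pass in are your own assumptions about what the cluster should answer on.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        # create_connection performs the full TCP handshake, unlike a ping.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


def probe(host: str, ports, timeout: float = 3.0) -> dict:
    """Check several ports on one host and return {port: reachable}."""
    return {port: tcp_port_open(host, port, timeout) for port in ports}
```

For example, `probe("192.0.2.10", [80, 443, 135, 445])` (with a placeholder VIP) checks HTTP, HTTPS, the RPC endpoint mapper and SMB. Running it once against the VIP and once against a node's dedicated IP, from both subnets, shows whether the failure follows the VIP or the protocol.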

Cymbaline65

ASKER

Yes, NLB ports for HTTP, HTTPS and MAPI are opened. Remember, I can access resources if I'm on the same subnet as the NLB hosts. It looks like it's still pointing to ARP.

Bruno PACI

Oh yes you're right, I forgot this point.

So the problem probably comes from the router refusing the ARP response from the NLB. I have had this problem several times before with my customers.

The first time I encountered this issue I had to make a traffic capture to understand that something was wrong with ARP. In the capture I could see that the router kept sending ARP requests for the NLB IP address even though the NLB had already answered, and that led me to the solution.

If making a traffic capture is not a big deal for you, you can use Microsoft Network Monitor to confirm this before any other operation. Install the tool on both NLB members, create a capture filter that keeps only the ARP protocol, and start the capture.
From a computer behind the router, launch a PING command with the "-t" switch to ping the NLB VIP continuously.

If things are normal with ARP, you should see an ARP request for the NLB VIP coming from the router and an ARP response coming from your NLB members every 30 seconds. This is the usual ARP cache expiration time.
If things are wrong with ARP, you'll see an ARP request for the NLB VIP and an ARP response at least every second! This means that the router ignores the ARP response and that you may have to fill its ARP cache statically.
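
If you export the capture, the "every 30 seconds versus every second" check can be done on the timestamps alone. Here is a rough Python sketch under stated assumptions: `request_times` is a list you build yourself from the capture (timestamps, in seconds, of the router's ARP requests for the VIP), the function name is hypothetical, and the 10-second threshold is an arbitrary cut-off I picked between the two cases, not a documented value.

```python
from statistics import median


def arp_cache_looks_healthy(request_times, min_expected_gap=10.0):
    """Guess whether the router is caching the NLB's ARP reply.

    request_times: ascending timestamps (seconds) of ARP requests for the VIP.
    Returns True if requests are spaced like normal cache expiry,
    False if the router keeps re-asking, None if there is too little data.
    """
    if len(request_times) < 2:
        return None  # need at least two requests to measure a gap
    # Time elapsed between each pair of consecutive requests.
    gaps = [later - earlier
            for earlier, later in zip(request_times, request_times[1:])]
    # The median tolerates a few stray retransmissions in the capture.
    return median(gaps) >= min_expected_gap
```

Requests at roughly 0, 30, 60 and 90 seconds would be reported as healthy, while requests about one second apart would not, which points at the router discarding the reply.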

Usually, filling the ARP cache with static ARP information is possible. But I have to say that once, in a customer network, I encountered a router that was not able to take static ARP entries. I had to change the network architecture for the CAS servers and switch to unicast NLB, and I needed to isolate these servers in a VLAN to avoid flooding the whole network.
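
When filling in static ARP entries, the MAC address to map the VIP to is the cluster MAC, which NLB derives from the cluster IP address. The following is a small Python sketch of that derivation as I understand it (prefix 03-BF for multicast mode and 02-BF for unicast mode, followed by the four octets of the VIP). The function name is hypothetical and IGMP multicast mode is deliberately left out; treat the result as a cross-check and always compare it with the cluster MAC shown in NLB Manager.

```python
import ipaddress


def nlb_cluster_mac(vip: str, mode: str = "multicast") -> str:
    """Derive the NLB cluster MAC for an IPv4 VIP (multicast or unicast mode)."""
    prefixes = {
        "multicast": bytes([0x03, 0xBF]),  # plain multicast mode, no IGMP
        "unicast": bytes([0x02, 0xBF]),
    }
    if mode not in prefixes:
        raise ValueError(f"unsupported mode: {mode!r}")
    # .packed gives the four octets of the address as bytes.
    octets = ipaddress.IPv4Address(vip).packed
    return "-".join(f"{b:02X}" for b in prefixes[mode] + octets)
```

With a placeholder VIP of 10.1.2.3, multicast mode gives 03-BF-0A-01-02-03, and that is the value the static ARP entry for 10.1.2.3 should carry on each router or layer 3 switch in the path.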

ASKER CERTIFIED SOLUTION

Cymbaline65

This solution is only available to members.

ArneLovius

As you are in a VM environment, I would have suggested using HAProxy on Linux, with unison and keepalived to make it HA. You would have the extra processor overhead, but removing NLB is always a good thing...

Cymbaline65

ASKER

In many of the articles I researched, IP forwarding was not mentioned. PaciB spent a good deal of effort on his responses, so I'll give him the points.