
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 3247

Windows 2008 NLB in a VMWare environment - can't access NLB node from outside its subnet

Hello all-
I've implemented an Exchange 2010 CAS array on VMWare and am performing load balancing using Windows NLB. I understand this is not optimal, but unfortunately it's what I must work with. I am in the middle of putting together an Exchange 2007 to 2010 transition.

The NLB appears to function normally and is configured for multicast mode per VMWare's white papers. Static ARP entries are in place on all switches (or so I'm told), and I can ping the NLB VIP, but I cannot browse resources (e.g. \\casarray\c$) if I'm on a different subnet than the CAS array members. Essentially I get an "Error Code: 0x80070035 network path was not found." Appropriate ports are open on the NLB for MAPI and HTTP/HTTPS traffic.

I did see a VMWare article that discusses Unicast and how the guest will need to be reconfigured:

Right now, I'm stuck. I can't proceed further in testing as my Outlook clients cannot attach to my CAS array.

So my questions are:
1. It looks as though VMWare does support Unicast. By going this route and following the article's instructions, could I bypass other potential network issues and keep things moving forward?
2. If I have to stick with multicast and my static ARP entries are in place, where else should I look?

Thanks in advance for your assistance.
Cymbaline65 (Author) Commented:
Additional comment: I'm told by one source that unless static ARP entries are in place on every switch from the core out, ICMP may work but I still may not be able to reach the NLB. Thoughts?
Bruno PACI (IT Consultant) Commented:

First of all, to determine whether your issue is related to missing ARP entries or to a VMware "conflict" with NLB multicast, you should (if possible) try this:

1) Shut down one of the NLB members so that only one member is still alive. Shut it down completely; a service stop is not enough.
2) Run all network tests (PING, access to shares, etc.).
3) Restart the member, wait for its services to start, and shut down the other member.
4) Redo all network tests (PING, access to shares, etc.).
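The "network tests" in steps 2 and 4 can go beyond PING by checking the TCP ports the clients actually use. A minimal sketch, assuming Python is available on a test workstation; the address `192.0.2.10` and the port list (135 for the RPC endpoint mapper, 443 for HTTPS, 445 for SMB shares such as \\casarray\c$) are illustrative placeholders, not values from this thread:

```python
import socket


def check_tcp_ports(host, ports, timeout=3.0):
    """Return {port: True/False}, True when a TCP connection to host:port succeeds."""
    results = {}
    for port in ports:
        try:
            # create_connection performs the full TCP handshake, unlike PING (ICMP).
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Covers refused connections, timeouts and unreachable networks.
            results[port] = False
    return results


# Example call (hypothetical VIP), to be run once per test step:
# print(check_tcp_ports("192.0.2.10", [135, 443, 445]))
```

Running the same call from a client on the NLB subnet and from a client behind the router makes the two results directly comparable.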

If you still have trouble accessing resources while only one member is alive, it looks like you have a router that needs a static ARP definition to support your multicast NLB, or maybe a router that does not support multicast NLB.
If the router does not allow a non-multicast IP address to be associated with a multicast MAC address, you have a real problem, because there is no solution with multicast.
Many routers support that, but some of them need a static ARP entry declared in their ARP cache because they cannot discover the MAC address of the multicast NLB. In fact, they ignore the NLB's ARP response because the MAC address in the response is a multicast one but the IP address is not a multicast IP, so they don't accept the ARP answer.

NOTE: Don't confuse the static ARP entry that has to be created on the router (a static entry for the ARP cache) with the static ARP table that has to be created on the switches (strictly speaking, static MAC address table entries). The entries on the switches are needed to tell the switches that having the same MAC address reachable simultaneously through 2 Ethernet ports is normal and that network packets must be sent on both ports.
The static ARP entry on the router fills the ARP cache with a static entry that associates the IP address of the NLB with the MAC address of the NLB.
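For the router's static ARP entry, the cluster MAC address is needed. Windows NLB derives it from the cluster IP: the `02-BF` prefix in unicast mode and `03-BF` in multicast mode are followed by the four octets of the VIP, and IGMP multicast mode uses `01-00-5E-7F` followed by the last two octets. A small sketch of that derivation follows; the function name is made up for illustration, and the result should always be checked against the MAC shown in NLB Manager:

```python
import ipaddress


def nlb_cluster_mac(vip, mode="multicast"):
    """Derive the NLB cluster MAC address from the cluster IPv4 address (VIP)."""
    octets = ipaddress.IPv4Address(vip).packed  # 4 raw bytes of the VIP
    if mode == "unicast":
        mac = bytes([0x02, 0xBF]) + octets
    elif mode == "multicast":
        mac = bytes([0x03, 0xBF]) + octets
    elif mode == "igmp":
        # IGMP multicast keeps only the last two octets of the VIP.
        mac = bytes([0x01, 0x00, 0x5E, 0x7F]) + octets[2:]
    else:
        raise ValueError("mode must be 'unicast', 'multicast' or 'igmp'")
    return "-".join(f"{b:02X}" for b in mac)


# Example with a hypothetical VIP:
# nlb_cluster_mac("10.1.2.50")  ->  "03-BF-0A-01-02-32"
```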

If your network issues disappear when only one NLB member is alive then you have an issue with VMware.

Have a good day.
Cymbaline65 (Author) Commented:
Thanks, PaciB.
I will perform this test when I'm on site today and report back the results.
Just to be clear, just because I can ping an interface does not mean that all my ARP entries are in place?
I'm willing to bet that I won't be able to hit either system via the VIP.
Of course, I can hit either system via its dedicated IP address without issue.

Bruno PACI (IT Consultant) Commented:
The ICMP protocol (PING) is different from TCP and UDP... I can't be sure, but it might be processed differently by NLB.
Also, depending on how you configured your NLB cluster, you may have excluded protocols other than HTTPS. Because NLB is only there for CAS access, some step-by-step articles guide you to an NLB configuration that handles only the HTTPS protocol, with all other protocols not load balanced. Are you sure that your NLB cluster is configured to handle all protocols?
Cymbaline65 (Author) Commented:
Yes, NLB ports for HTTP, HTTPS and MAPI are open. Remember, I can access resources if I'm on the same subnet as the NLB hosts. It looks like it's still pointing to ARP.
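The same-subnet observation matters because a client on the NLB subnet resolves the VIP's MAC address itself, while a client on another subnet depends on the router's ARP cache. A quick sketch to tell which case a given test client falls into; the addresses and prefix length in the example are hypothetical:

```python
import ipaddress


def crosses_router(client_ip, vip, prefix_len):
    """Return True when client_ip is outside the VIP's subnet, so traffic is routed."""
    # strict=False lets us pass a host address and still get its network.
    network = ipaddress.ip_network(f"{vip}/{prefix_len}", strict=False)
    return ipaddress.ip_address(client_ip) not in network


# Example with made-up addresses and a /24 mask:
# crosses_router("10.1.3.77", "10.1.2.50", 24)  ->  True (goes through the router)
```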
Bruno PACI (IT Consultant) Commented:
Oh yes, you're right, I forgot this point.

So the problem probably comes from the router refusing the ARP response from the NLB. I have had this problem several times before with my customers.

The first time I encountered this issue I had to take a traffic capture to understand that something was wrong with the ARP requests: I could see that the router kept sending ARP requests for the NLB IP address even after the NLB had answered, which led me to the solution.

If taking a traffic capture is not a big deal for you, you can use Microsoft Network Monitor to confirm this before any other operation. Install the tool on both NLB members, create a capture filter that keeps only the ARP protocol, and start the capture.
From a computer behind the router, launch a PING command with the "-t" switch to ping the NLB VIP continuously.

If things are normal with ARP, you should see an ARP request for the NLB VIP coming from the router and an ARP response coming from your NLB members roughly every 30 seconds (the exact interval depends on the router's ARP cache expiration time).
If things are wrong with ARP, you'll see an ARP request for the NLB VIP and an ARP response at least every second! This means that the router is ignoring the ARP response and that you may have to fill its ARP cache statically.
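The comparison above comes down to the gap between successive ARP requests for the VIP. As a rough sketch, assuming the request timestamps (in seconds) have been exported from the capture by hand, the gaps can be classified like this; the 2-second threshold is an arbitrary assumption chosen to separate "about every second" from a normal cache expiration interval:

```python
from statistics import median


def classify_arp_requests(timestamps, retry_threshold=2.0):
    """Classify the router's ARP behaviour from request timestamps (seconds)."""
    if len(timestamps) < 2:
        return "insufficient data"
    ordered = sorted(timestamps)
    # Gaps between consecutive ARP requests for the VIP.
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if median(gaps) <= retry_threshold:
        # The router keeps asking: it is not caching the NLB's replies.
        return "router ignoring replies"
    return "normal"


# classify_arp_requests([0, 30, 60, 90])    -> "normal"
# classify_arp_requests([0, 1, 2, 3, 4])    -> "router ignoring replies"
```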

Usually, filling the ARP cache with a static ARP entry is possible. But I have to say that once, on a customer network, I encountered a router that could not accept static ARP entries. I had to change the network architecture for the CAS servers and switch to unicast NLB, and I needed to isolate these servers in a VLAN to avoid flooding the whole network.
Cymbaline65 (Author) Commented:
Looks like I found the problem. I read this article:
and enabled IP forwarding on my NLB interface and voila! I can now hit the server resources. I need to do some more testing to be sure.
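The link to the article did not survive, so the exact steps used are unknown. On Windows Server 2008, one common way to enable IP forwarding on a single interface is the `netsh interface ipv4 set interface` command with `forwarding=enabled`. The sketch below only wraps that command; the interface name "NLB" is a placeholder, and it must be run from an elevated prompt on the NLB node:

```python
import subprocess


def forwarding_command(interface_name):
    """Build the netsh argument list that enables IPv4 forwarding on one interface."""
    return [
        "netsh", "interface", "ipv4", "set", "interface",
        interface_name, "forwarding=enabled",
    ]


def enable_forwarding(interface_name):
    """Run the command (Windows only, elevated); raises CalledProcessError on failure."""
    subprocess.run(forwarding_command(interface_name), check=True)


# Example (hypothetical interface name):
# enable_forwarding("NLB")
```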
As you are in a VM environment, I would have suggested using HAProxy on Linux, with unison and keepalived to make it HA. You would have the extra processor overhead, but removing NLB is always a good thing...
Cymbaline65 (Author) Commented:
In many of the articles I researched, IP forwarding was not mentioned. PaciB spent a good deal of effort on his responses, so I'll give him the points.
Question has a verified solution.
