Windows 2008 NLB in a VMware environment - can't access NLB node from outside its subnet

Posted on 2013-01-27
Last Modified: 2013-02-02
Hello all,
I've implemented an Exchange 2010 CAS array on VMware and am load balancing it with Windows NLB. I understand this is not optimal, but unfortunately it's what I must work with. I am in the middle of an Exchange 2007 to 2010 transition.

The NLB appears to function normally and is configured for multicast mode per VMware's white papers. Static ARP entries are in place on all switches (or so I'm told), and I can ping the NLB VIP, but I cannot browse resources (e.g. \\casarray\c$) if I'm on a different subnet from the CAS array members. I get "Error Code: 0x80070035 The network path was not found." The appropriate ports are open on the NLB for MAPI and HTTP/HTTPS traffic.

I did see a VMware article that discusses unicast and how the guest will need to be reconfigured:

Right now, I'm stuck. I can't proceed further in testing because my Outlook clients cannot connect to my CAS array.

So my questions are:
1. It looks as though VMware does support unicast. If I go this route and follow the article's instructions, could I bypass other potential network issues and keep things moving forward?
2. If I have to stick with multicast and my static ARP entries are in place, where else should I look?

Thanks in advance for your assistance.
Question by: Cymbaline65

Author Comment

ID: 38824761
Additional comment: One source tells me that if static ARP entries are not in place on every switch from the core out, ICMP may work but I still may not reach the NLB. Thoughts?

Assisted Solution

PaciB earned 500 total points
ID: 38825959

First, to verify whether your issue is caused by missing ARP entries or by a VMware "conflict" with NLB multicast, try the following (if possible):

1) Shut down one of the NLB members so that only one member is still alive. Shut it down completely; stopping the service is not enough.
2) Run all network tests (ping, access to shares, etc.).
3) Restart that member, wait for its services to start, and then shut down the other member.
4) Redo all network tests (ping, access to shares, etc.).
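Because ping can succeed while TCP fails, it helps to test the actual service ports in steps 2 and 4. Below is a minimal Python sketch of a TCP probe to run from a client on another subnet. The port-to-service mapping (445 for the \\casarray\c$ share, 135 for the RPC endpoint mapper used by MAPI, 80/443 for HTTP/HTTPS) is an assumption about what you want to test, and the helper names are hypothetical.

```python
import socket

# Assumed mapping of ports to the services tested in this thread.
PORTS = {445: "SMB", 135: "RPC/MAPI", 80: "HTTP", 443: "HTTPS"}


def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP handshake to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


def probe(host: str, ports: dict[int, str]) -> dict[int, bool]:
    """Try each port once and return {port: reachable}."""
    return {port: tcp_reachable(host, port) for port in ports}
```

For example, `probe("192.0.2.10", PORTS)` (192.0.2.10 standing in for your NLB VIP) returns a `{port: reachable}` dict; run it once with each member shut down and compare the results with a run against each member's dedicated IP.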

If you still have trouble accessing resources while only one member is alive, you probably have a router that needs a static ARP entry to support your multicast NLB, or a router that does not support multicast NLB at all.
If the router does not allow a non-multicast (unicast) IP address to be associated with a multicast MAC address, you have a real problem, because there is no solution with multicast.
Many routers support it, but some need a static ARP entry declared in their ARP cache because they cannot learn the MAC address of the multicast NLB: they ignore the NLB's ARP reply because the MAC address in the reply is a multicast one while the IP address in the reply is not a multicast IP, so they don't accept the ARP answer.
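The IP/MAC mismatch described above can be made concrete. In multicast mode, Windows NLB uses a cluster MAC of 03-BF followed by the four octets of the VIP (unicast mode uses 02-BF; IGMP multicast mode uses 01-00-5E-7F plus the last two octets). The lowest bit of the first octet (the I/G bit) is set in 0x03, which is what marks the address as multicast. This Python sketch, with hypothetical helper names and example VIP, shows why a strict router sees the NLB's ARP reply as inconsistent.

```python
import ipaddress


def nlb_cluster_mac(vip: str, mode: str = "multicast") -> str:
    """Derive the cluster MAC that Windows NLB uses for an IPv4 VIP."""
    octets = ipaddress.IPv4Address(vip).packed  # the 4 bytes of the VIP
    if mode == "unicast":
        mac = bytes([0x02, 0xBF]) + octets
    elif mode == "multicast":
        mac = bytes([0x03, 0xBF]) + octets
    elif mode == "igmp":
        mac = bytes([0x01, 0x00, 0x5E, 0x7F]) + octets[2:]
    else:
        raise ValueError(f"unknown NLB mode: {mode}")
    return "-".join(f"{b:02X}" for b in mac)


def is_group_mac(mac: str) -> bool:
    """True if the I/G bit (lowest bit of the first octet) marks a multicast MAC."""
    return (int(mac.split("-")[0], 16) & 0x01) == 1


def arp_reply_is_suspicious(vip: str, mac: str) -> bool:
    """A unicast IP answered with a multicast MAC: the pairing strict routers drop."""
    return (not ipaddress.IPv4Address(vip).is_multicast) and is_group_mac(mac)
```

With a VIP of 10.0.0.100, multicast mode gives 03-BF-0A-00-00-64, which `arp_reply_is_suspicious` flags, while the unicast-mode MAC 02-BF-0A-00-00-64 is not flagged.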

NOTE: Don't confuse the static ARP entry that has to be created on the router (an entry in its ARP cache) with the static MAC address table entries that have to be created on the switches. The switch entries tell the switches that it is normal for the same MAC address to be reachable simultaneously through two Ethernet ports, and that packets for it must be sent out both ports.
The static ARP entry on the router fills its ARP cache with an entry that associates the IP address of the NLB with the MAC address of the NLB.
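To illustrate the difference between the two kinds of entry, this Python sketch generates both from the VIP. It assumes Cisco IOS-style syntax (`arp <ip> <mac> arpa` on the router, `mac address-table static <mac> vlan <id> interface <ports>` on the switch); exact syntax varies by platform and software version (some older switches use `mac-address-table`). The helper names, VLAN number and interface names are hypothetical placeholders.

```python
import ipaddress


def cisco_mac(vip: str) -> str:
    """Multicast-mode NLB MAC (03-BF + VIP octets) in Cisco dotted notation."""
    raw = bytes([0x03, 0xBF]) + ipaddress.IPv4Address(vip).packed
    digits = raw.hex()  # 12 lowercase hex digits
    return ".".join(digits[i:i + 4] for i in range(0, 12, 4))


def router_static_arp(vip: str) -> str:
    """Router side: pin VIP -> cluster MAC in the ARP cache (assumed IOS syntax)."""
    return f"arp {vip} {cisco_mac(vip)} arpa"


def switch_static_mac(vip: str, vlan: int, ports: list[str]) -> str:
    """Switch side: send the cluster MAC out every port leading to an NLB host."""
    return (f"mac address-table static {cisco_mac(vip)} "
            f"vlan {vlan} interface {' '.join(ports)}")
```

For a VIP of 10.0.0.100, `router_static_arp` yields `arp 10.0.0.100 03bf.0a00.0064 arpa`: one line on the router, versus a switch entry that lists every port facing an NLB member.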

If your network issues disappear when only one NLB member is alive, then you have an issue with VMware.

Have a good day.

Author Comment

ID: 38826247
Thanks, PaciB.
I will perform this test when I'm on site today and report back the results.
Just to be clear: being able to ping an interface does not mean that all my ARP entries are in place?
I'm willing to bet that I won't be able to reach either system via the VIP.
Of course, I can reach either system via its dedicated IP address without issue.


Expert Comment

ID: 38826330
The ICMP protocol (ping) is different from TCP and UDP... I can't be sure, but it might be processed differently by NLB.
Also, depending on how you configured your NLB cluster, you may have excluded protocols other than HTTPS (because NLB is only used for CAS access, some step-by-step articles lead you to an NLB configuration that handles only the HTTPS protocol; all other protocols are not load balanced). Are you sure your NLB cluster is configured to handle all protocols?

Author Comment

ID: 38826543
Yes, NLB ports for HTTP, HTTPS and MAPI are open. Remember, I can access resources if I'm on the same subnet as the NLB hosts. It still looks like it points to ARP.

Expert Comment

ID: 38826576
Oh yes, you're right, I forgot this point.

So the problem probably comes from the router refusing the ARP response from the NLB. I have run into this problem several times with my customers.

The first time I encountered this issue, I had to make a traffic capture to understand that something was wrong with ARP: I could see that the router kept sending ARP requests for the NLB IP address even though the NLB had already answered, which led me to the solution.

If making a traffic capture is not a big deal for you, you can use Microsoft Network Monitor to confirm this before doing anything else. Install the tool on both NLB members, create a capture filter that keeps only the ARP protocol, and start capturing.
From a computer behind the router, run the PING command with the "-t" switch to ping the NLB VIP continuously.

If ARP is working normally, you should see an ARP request for the NLB VIP coming from the router, and an ARP response from your NLB members, only each time the router's ARP cache entry expires (about every 30 seconds in my experience; the exact timeout depends on the router).
If ARP is not working, you'll see an ARP request for the NLB VIP and an ARP response at least every second! This means that the router is ignoring the ARP response and that you may have to fill its ARP cache statically.
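The check described above boils down to measuring the spacing between the router's ARP requests for the VIP. This Python sketch assumes you have exported the ARP frames from the capture as hypothetical `(time_in_seconds, opcode, target_ip)` tuples (opcode 1 = request, 2 = reply); the record layout and the 2-second threshold are illustrative assumptions, not Network Monitor features.

```python
from statistics import median

# (time in seconds, ARP opcode, target IP); opcode 1 = request, 2 = reply.
ArpRecord = tuple[float, int, str]


def request_intervals(records: list[ArpRecord], vip: str) -> list[float]:
    """Seconds between consecutive ARP requests that ask for the VIP."""
    times = sorted(t for t, op, target in records if op == 1 and target == vip)
    return [b - a for a, b in zip(times, times[1:])]


def diagnose(records: list[ArpRecord], vip: str, threshold: float = 2.0) -> str:
    """Classify the capture by the median gap between ARP requests for the VIP."""
    gaps = request_intervals(records, vip)
    if not gaps:
        return "not enough ARP requests for the VIP to judge"
    if median(gaps) <= threshold:
        return "router keeps re-ARPing: it is probably ignoring the NLB replies"
    return "ARP requests are spaced out: the router appears to cache the reply"
```

Using the median rather than the minimum keeps one or two back-to-back retries in an otherwise healthy capture from triggering a false alarm.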

Filling the ARP cache with static ARP information is usually possible. But I must say that once, in a customer's network, I encountered a router that could not accept static ARP entries. I had to change the network architecture for the CAS servers and switch to unicast NLB, isolating those servers in their own VLAN to avoid flooding the whole network.

Accepted Solution

Cymbaline65 earned 0 total points
ID: 38827687
Looks like I found the problem. I read this article:
and enabled IP forwarding on my NLB interface, and voila! I can now reach the server resources. I need to do some more testing to be sure.
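The article linked above is not preserved, so the exact steps it gave are unknown. One commonly documented way to enable IPv4 forwarding on a single interface in Windows Server 2008 is `netsh interface ipv4 set interface "<name>" forwarding=enabled`, run from an elevated prompt. The Python sketch below only builds and optionally runs that command; the helper names and the connection name "NLB" are hypothetical.

```python
import subprocess
import sys


def forwarding_command(interface: str, enable: bool = True) -> list[str]:
    """Build the netsh command that toggles IPv4 forwarding on one interface."""
    state = "enabled" if enable else "disabled"
    return ["netsh", "interface", "ipv4", "set", "interface",
            interface, f"forwarding={state}"]


def apply_forwarding(interface: str) -> None:
    """Run the command (Windows only; requires an elevated prompt)."""
    if sys.platform != "win32":
        raise RuntimeError("netsh is only available on Windows")
    subprocess.run(forwarding_command(interface), check=True)


if __name__ == "__main__":
    # "NLB" is a placeholder; use the connection name of your NLB adapter.
    print(" ".join(forwarding_command("NLB")))
```

Passing the arguments as a list lets `subprocess.run` handle connection names that contain spaces; the printed string is for reading only and is not quoted.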

Expert Comment

ID: 38828290
As you are in a VM environment, I would have suggested using HAProxy on Linux, with unison and keepalived to make it highly available. You would have the extra processor overhead, but removing NLB is always a good thing...

Author Closing Comment

ID: 38846400
In many of the articles I researched, IP forwarding was not mentioned. PaciB spent a good deal of effort on his responses, so I'll give him the points.
