Losing connectivity to NLB cluster on virtual servers when a server reboots

Posted on 2007-10-16
Last Modified: 2008-01-09
We have three physical servers that we use to host virtual servers. The physical servers run 64-bit Windows Server 2003 SP2 with 64-bit Virtual Server 2005 R2 SP1. All of our virtual servers run 32-bit Windows Server 2003 SP2, and each is hosted by one of the three physical servers.

We have a three-node Microsoft Network Load Balancing cluster consisting of three virtual servers, each on a different one of the three physical servers. The three virtual servers are clones, and we ran NewSID on them after cloning.

Each of the physical servers has a single physical NIC. All three physical servers connect to the same subnet.

Each of the NLB node virtual servers is configured with two virtual NICs. The NLB cluster is configured in unicast mode using one of the two NICs on each virtual server.

Here's the problem: everything works OK until one of the virtual servers is rebooted (which one doesn't matter). After the rebooted server comes back, connectivity to the NLB NICs on the other two virtual servers is lost for about 10 minutes.


1)      NLB Query from outside cluster reports nodes 1, 2, and 3 as converged.
2)      Reboot node 2.
3)      While node 2 is down, NLB Query from outside cluster reports nodes 1 and 3 converged, as should be the case.
4)      When node 2 comes back, NLB Query from outside cluster momentarily reports nodes 1, 2, and 3 as converged.
5)      Within a few seconds, NLB Query from outside the cluster reports only node 2 as converged. It gets no reply from nodes 1 and 3. Pings to the NLB NICs on nodes 1 and 3 get no response. Outside connectivity to the NLB cluster on nodes 1 and 3 is lost.
6)      After about 10 minutes, connectivity to the NLB NICs on nodes 1 and 3 is restored. NLB Query from outside the cluster reports nodes 1, 2, and 3 as converged. Everything is fine again.

However, during the 10 minutes when connectivity to the NLB NICs on the two nodes is lost, an NLB Query command executed directly on one of the cluster nodes reports that the cluster is converged with nodes 1, 2, and 3.

This seems to indicate that the nodes can communicate with each other during the time that outside connectivity is lost and raises the question of whether the problem is in the NLB networking layer or in the virtual server networking layer.
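For anyone trying to reproduce this, the outage in steps 5 and 6 can be timed from a machine outside the cluster. Below is a minimal Python sketch, not something we ran at the time. The function names are made up for illustration, and it assumes the NLB NICs answer ICMP echo when healthy and that the system `ping` exit code reflects reachability (on Windows this is not always reliable).

```python
import platform
import subprocess
import time


def ping_once(host, timeout_s=1):
    """Send one ICMP echo via the system ping; True if the command reports success."""
    if platform.system() == "Windows":
        cmd = ["ping", "-n", "1", "-w", str(timeout_s * 1000), host]
    else:
        cmd = ["ping", "-c", "1", "-W", str(timeout_s), host]
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def outage_windows(samples):
    """Collapse (timestamp, reachable) samples into (start, end) outage intervals.

    An outage starts at the first failed sample and ends at the next
    successful one. An outage still open at the last sample has end=None.
    """
    windows = []
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts
        elif ok and start is not None:
            windows.append((start, ts))
            start = None
    if start is not None:
        windows.append((start, None))
    return windows


def monitor(hosts, interval_s=5, duration_s=900):
    """Ping each host every interval_s seconds and return outages per host."""
    history = {h: [] for h in hosts}
    deadline = time.time() + duration_s
    while time.time() < deadline:
        now = time.time()
        for h in hosts:
            history[h].append((now, ping_once(h)))
        time.sleep(interval_s)
    return {h: outage_windows(s) for h, s in history.items()}
```

Calling `monitor([...])` with the dedicated IP addresses of the NLB NICs on nodes 1 and 3 just before rebooting node 2 should show one window of roughly 10 minutes per node if the behaviour matches the steps above. It only measures reachability from outside; it says nothing about whether the nodes can still see each other.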

Any ideas what the problem is?
Question by: psyche6

    Accepted Solution

    Just a thought: here we have to add static ARP entries for the NLB 'MAC' address on our switches. Have you done this?

    If they aren't there, you should be seeing problems when all 3 servers are up, not just when 1 is rebooted, but it might be worth looking at.
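For reference, the NLB 'MAC' address that would go into such a static entry can be worked out from the cluster IP. In unicast mode NLB uses 02-BF followed by the four octets of the cluster IP in hex; in multicast mode the prefix is 03-BF. The Python sketch below shows the arithmetic. The function name and the example address are invented for illustration, so check the result against what NLB Manager or `wlbs ip2mac` reports for your cluster before putting it on a switch.

```python
import ipaddress


def nlb_cluster_mac(cluster_ip, mode="unicast"):
    """Derive the NLB cluster MAC address from the cluster's IPv4 address.

    Unicast:   02-BF-w-x-y-z
    Multicast: 03-BF-w-x-y-z
    where w.x.y.z is the cluster IP, each octet written in hex.
    """
    prefixes = {"unicast": (0x02, 0xBF), "multicast": (0x03, 0xBF)}
    if mode not in prefixes:
        raise ValueError("mode must be 'unicast' or 'multicast'")
    octets = ipaddress.IPv4Address(cluster_ip).packed  # 4 raw bytes
    return "-".join(f"{b:02X}" for b in (*prefixes[mode], *octets))


# Hypothetical cluster IP, not taken from the question.
print(nlb_cluster_mac("10.0.0.50"))
```

Note that this is the shared cluster address only. In unicast mode each node normally masks its source MAC on outgoing frames, so the switch never learns the cluster MAC on a port by itself.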

    Author Comment

    We found the problem. This is a little like saying "the butler did it", but the problem turned out to be the virus protection software. We use Trend Micro across our network. While working on this problem, one of us noticed the Trend Micro driver in the TCP/IP properties of the NICs. This jogged our memory that the only difference between the production cluster and the test cluster was that Trend Micro hadn't been loaded on the test cluster. We unloaded Trend Micro from the production cluster: problem solved. We loaded Trend Micro onto the test cluster: problem reproduced.

    We are working with Trend Micro support to document this bug.

    Question closed.

    PS: in working on this problem, we discovered that updates in Windows Server 2003 SP1 and SP2 allow NLB nodes to be set up in unicast mode on servers with one NIC, rather than requiring dual-NIC servers for unicast. See Microsoft KB898867.
