NLB Documentation - Unicast w/ 2 NICs on same subnet w/ 2 default gateways

Hello Everyone,
I have a client who I suspect has NLB Unicast port flooding going on, and it is proving quite tricky to prove.  My SQL queries are timing out immediately and I am getting NetLib errors (VB losing the network flow, from what I can tell).  Anyway, this client has 9 NLB servers plugged into an old CatOS Cisco core switch.  There are 3 clusters, all using Unicast with 2 NICs.  One NIC has the cluster VIP and a dedicated IP.  The other NIC has a dedicated service console IP on the same subnet.  Both NICs have default gateways.  
From my experience, I have always been told that you don't put 2 DGs on the same subnet.  You will get bad results when both try to register in DNS, when Client for MS Networks advertises the server, and when File and Print Sharing for MS advertises itself.  
I need solid documentation explaining that this is bad and why.  I have found this: http://download.microsoft.com/download/1/7/0/170690b4-87d2-402c-8ec9-6b76c5db4bdf/nlbbp.doc
and this:
http://www.eggheadcafe.com/software/aspnet/31078234/nlb-cluster-adapter-wit.aspx

But the instructions lack a degree of clarity.  In the first doc, it tells you not to put 2 DGs on NICs of the same subnet but it is under the Windows 2000 heading, not the Server 2003 heading.  There is room for interpretation here and I am looking for a rock-solid document.  
The second link is a blog post and is not something I will walk into a client's site with and say "hah, I told you it wasn't set up properly".  

I am hung up on a few things here:
-I think Unicast NLB is causing port flooding but cannot "show" this to them since they don't have the ability to monitor Unicast traffic on their core...I also cannot point out what levels of Unicast flooding would cause issues.  
-I am pretty sure they should not be using a second NIC on the same subnet...and if they choose to do so, they should CERTAINLY not have 2 default gateways on the same subnet.  
-I know they don't have any special configuration in their ARP routes or CAM tables but the document that tells me they need this is specific to Multicast NLB.  This leads me to believe that Unicast NLB does not require this...
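
To make the second point concrete for the client, here is a minimal sketch (not taken from any of the linked documents) that flags NICs on the same IPv4 subnet that each carry a default gateway. The NIC names and all addresses are hypothetical; substitute the real values from each server.

```python
import ipaddress
from collections import defaultdict


def find_same_subnet_gateways(nics):
    """Return {subnet: [nic names]} for subnets where more than one NIC has a default gateway."""
    by_subnet = defaultdict(list)
    for nic in nics:
        # ip_interface accepts "address/netmask" and exposes the containing network.
        iface = ipaddress.ip_interface(f"{nic['ip']}/{nic['mask']}")
        if nic.get("gateway"):
            by_subnet[iface.network].append(nic["name"])
    return {str(net): names for net, names in by_subnet.items() if len(names) > 1}


# Hypothetical addressing that mirrors the layout described above:
# one NLB NIC and one service console NIC, same subnet, both with a gateway.
nics = [
    {"name": "NLB", "ip": "10.0.0.11", "mask": "255.255.255.0", "gateway": "10.0.0.1"},
    {"name": "Console", "ip": "10.0.0.21", "mask": "255.255.255.0", "gateway": "10.0.0.1"},
]

print(find_same_subnet_gateways(nics))
```

An empty result means no subnet has more than one gateway-bearing NIC. The check only reports the configuration; it does not by itself prove that the configuration causes the timeouts.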

Can someone please point me to some definitive answers I can show this client?  I have ~200 angry users and some frustrated techs who don't think they have a networking problem...I have ~500 other client sites using this software (minus the Unicast NLB) who are having no issues.  
The only other clients I have who load balance TS (they don't like Citrix...) are using either hardware load balancing or Multicast IGMP NLB.  
BLipman asked:
TheCleaner commented:
I don't know a specific way to check for unicast flooding on an old CatOS switch. Maybe you can set up port mirroring, put a laptop running Wireshark (formerly Ethereal) on the mirror port, and see for sure.  I'm sure you are getting flooded, but to what extent is really the question.
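
As a rough way to put a number on "to what extent", the sketch below classifies the destination MAC addresses seen in a capture. It assumes you have exported one destination MAC per line from the capture tool (for example with `tshark -r capture.pcap -T fields -e eth.dst`, where `capture.pcap` is a placeholder file name), and that you know the capturing laptop's own MAC. Unicast frames addressed to some other host should normally never reach the laptop's port, so a large "foreign_unicast" share suggests flooding. All MAC addresses in the sample are made up.

```python
from collections import Counter

BROADCAST = "ff:ff:ff:ff:ff:ff"


def normalize(mac):
    """Lower-case a MAC address and use ':' as the separator."""
    return mac.strip().lower().replace("-", ":")


def classify(dst_mac, own_macs):
    """Classify one destination MAC relative to the capturing host."""
    mac = normalize(dst_mac)
    if mac == BROADCAST:
        return "broadcast"
    # The least significant bit of the first octet marks a group (multicast) address.
    if int(mac.split(":")[0], 16) & 1:
        return "multicast"
    if mac in own_macs:
        return "own_unicast"
    return "foreign_unicast"


def summarize(dst_macs, own_macs):
    """Count frames per class; blank lines in the export are skipped."""
    own = {normalize(m) for m in own_macs}
    return Counter(classify(m, own) for m in dst_macs if m.strip())


# Made-up sample: two frames to a cluster-style MAC, one broadcast,
# one multicast, and one frame actually addressed to the laptop.
sample = [
    "02:bf:0a:00:00:64",
    "02-BF-0A-00-00-64",
    "ff:ff:ff:ff:ff:ff",
    "01:00:5e:00:00:fb",
    "00:11:22:33:44:55",
]
counts = summarize(sample, own_macs=["00:11:22:33:44:55"])
print(dict(counts))
```

Run against a capture taken during a slow period and again under load, the ratio of foreign unicast frames to total frames gives a figure you can show the client. What ratio is "too much" depends on the link speeds involved, so treat the number as evidence rather than a threshold.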

You could try this: http://support.microsoft.com/kb/247297

Another option would be to connect the NLB NICs to a separate hub or small L2 switch and then give it a single uplink back to the main switch they were on.  Not exactly PHY redundancy, but it should still work for load balancing.
ChiefIT commented:
The errors I have seen with NICs pertain to the following settings on switches, servers, and routers: spanning tree, PortFast, multicast/unicast mode, the mode of operation (speed/duplex) for switches and routers, and a faulty service pack (2003 Server SP1). If these are not in the correct combination, any of them can cause NIC flooding and intermittent communications with 2003 Server services (DHCP, DNS, WSUS, etc.).
__________________________________________________________________
Putting NLB over a switched network into perspective: (Foundation of a topology)
http://www.experts-exchange.com/OS/Microsoft_Operating_Systems/Server/2003_Server/Q_23037760.html

Preventing NIC flooding caused by NLB:
http://technet2.microsoft.com/windowsserver/en/library/bf3a1c95-f960-4ed3-b154-3586631fb0061033.mspx?mfr=true
_________________________________________________________________
A little explanation of spanning tree and PortFast:
http://itt.theintegrity.net/pmwiki.php?n=ITT.Spanning-TreeAndPortfast
(NOTE: Portfast is necessary for XP clients. XP clients will time out otherwise.)

An Event error usually associated with a Spanning tree portfast problem:
Event ID 5719, spanning tree portfast:
http://support.microsoft.com/kb/247922
____________________________________________________________________
The differences between Unicast and Multicast modes:
http://support.microsoft.com/kb/291786
(The server requires unicast mode to work with dual NICS)
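
For reference when reading captures or CAM tables, here is a small sketch of how the cluster MAC is commonly described as being derived from the cluster IP in each NLB mode: 02-BF plus the four IP octets for unicast, 03-BF plus the four IP octets for multicast, and 01-00-5E-7F plus the last two IP octets for IGMP multicast. Treat these patterns as my understanding of the Microsoft documentation and verify them against the actual cluster properties; the IP address used is hypothetical.

```python
import ipaddress


def nlb_cluster_mac(cluster_ip, mode="unicast"):
    """Derive the expected NLB cluster MAC from the cluster IPv4 address.

    Assumed patterns (verify against the cluster's own properties):
      unicast   -> 02-BF-w-x-y-z
      multicast -> 03-BF-w-x-y-z
      igmp      -> 01-00-5E-7F-y-z
    where w.x.y.z is the cluster IP.
    """
    octets = ipaddress.IPv4Address(cluster_ip).packed
    if mode == "unicast":
        raw = bytes([0x02, 0xBF]) + octets
    elif mode == "multicast":
        raw = bytes([0x03, 0xBF]) + octets
    elif mode == "igmp":
        raw = bytes([0x01, 0x00, 0x5E, 0x7F]) + octets[2:]
    else:
        raise ValueError(f"unknown NLB mode: {mode}")
    return "-".join(f"{b:02x}" for b in raw)


# Hypothetical cluster VIP.
for mode in ("unicast", "multicast", "igmp"):
    print(mode, nlb_cluster_mac("10.0.0.100", mode))
```

In unicast mode the hosts normally mask their source MAC on outgoing frames, so the switch never learns which port owns the 02-BF address and floods frames sent to it. That is why the unicast cluster MAC is the one to look for on ports that have nothing to do with the cluster.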
______________________________________________________________________
2003 Server Service Pack 1 has a problem that can cause even a single NIC to be flooded.
http://support.microsoft.com/default.aspx?scid=kb;en-us;898060

The Service Pack 1 problem sometimes also affects DHCP (if applicable), and is sometimes associated with Event ID 333.
http://www.experts-exchange.com/OS/Microsoft_Operating_Systems/Server/2003_Server/Q_23008324.html
___________________________________________________________________________

Microsoft's fix: (2000 server)
http://support.microsoft.com/kb/261957
___________________________________________________________________________
Mode of operation:
Cisco also has a quirk. Cisco routers and switches need to be set to the exact same mode of operation as one another. Example: if you have a Cisco switch that is hard-set to 100 Mb full duplex and a router that is on Auto, you would think these two would communicate cleanly with one another. They will not: the Auto side cannot negotiate with a hard-set partner and typically falls back to half duplex, which gives you a duplex mismatch. Cisco switches and routers have to be set the same as one another. No apparent errors in DCdiag or event reports will allow you to see this error in action.
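
The rule of thumb can be written down as a small sketch. It assumes 10/100 Ethernet where a hard-set port does not take part in autonegotiation, so the Auto side can detect the speed but assumes half duplex. Gigabit links and individual platforms can behave differently, so this is a model for checking port settings, not a description of any specific Cisco device.

```python
def link_outcome(side_a, side_b):
    """Predict the result of pairing two port settings.

    Each side is either the string "auto" or a (speed_mbps, duplex) tuple,
    where duplex is "full" or "half". Returns "ok", "duplex_mismatch",
    or "no_link".
    """
    if side_a == "auto" and side_b == "auto":
        return "ok"  # both sides negotiate the best common mode
    if side_a != "auto" and side_b != "auto":
        if side_a[0] != side_b[0]:
            return "no_link"  # forced speeds differ
        return "ok" if side_a[1] == side_b[1] else "duplex_mismatch"
    # Exactly one side is forced: the auto side senses the speed
    # but (assumption stated above) falls back to half duplex.
    forced = side_a if side_a != "auto" else side_b
    return "duplex_mismatch" if forced[1] == "full" else "ok"


# The example from the post: switch forced to 100/full, router on Auto.
print(link_outcome((100, "full"), "auto"))
```

A duplex mismatch still passes traffic, which is why it does not show up as an obvious error. It tends to appear as late collisions, CRC errors, and poor throughput under load, so the port error counters on both ends are the place to confirm it.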
________________________________________________________________________________

You say that about 200 people see intermittent comms. If this is acting up within one of your three clusters, I would be willing to bet the problem lies in the mode of operation, not the unicast setting. No apparent errors with the mode will be visible in Event Viewer or DCdiag reports. However, you will see intermittent comms, especially on DNS. If the problems occur on XP boxes and 2003 servers, I would look into PortFast.

TheCleaner suggested an L2 setup: take the two NICs and put them on a second switch. So did the article that provides a foundation for NLB on a switched network. I too think it would be a good idea to explore that second layer.

I hope this helps.  
BLipman (author) commented:
Thanks guys, I ended up putting in a hardware load balancer "for them to try..."; everything is good as of noon on Friday.  I will see this week with full load if my errors are totally gone.  I will explore the other links as well.  I know they have some NIC to switch port speed/duplex issues.  