tau161

Windows 2003 w/ NIC Teaming Can't Send/Receive Pings

Greetings all. Here are the basics:
Dell PowerEdge 2900
Windows Storage Server 2003 x64 SP2
Dual Processor Dual-Core Xeon 2.66GHz
16GB RAM
(2) Broadcom NetXtreme II BCM5708C NICs (supports 802.3ad Link Aggregation)
Dell PowerConnect 2716 GbE 16-port switch (supports 802.3ad Link Aggregation)
Multiple RAID arrays, multiple shares.
Heavy multimedia file-sharing/editing. So, we would like to make use of Link Aggregation.

I am seeing some odd behavior. The closest EE question I've found is:
https://www.experts-exchange.com/questions/21851057/Windows-2003-cant-ping-windows-2003.html

Here is the setup. I set up both NetXtreme II NICs as a LAG Team using BACS (Broadcom Advanced Control Suite). I set the team type to 802.3ad Link Aggregation using LACP.

On the 2716 switch I selected two ports and set them as a Link Aggregated Group and linked up those ports with the NIC team on the server.

Here is the behavior.
NOTE: NIC Team has the MAC address of NIC#1
- The NIC Team doesn't automatically get itself an IP from the DHCP server at startup. It either says the DHCP server cannot be reached, _or_ it gives itself the good ol' favorite 169.254.xx.xx address, which is equally useless.

__ But here are the defining symptoms (or so I believe): __
- If I disable NIC#1 in Network Connections, and ipconfig /renew, then the NIC Team finally gets an IP.
- This is the same for both dynamic IPs and static IPs based on the NIC Team's advertised MAC.
- After I re-enable NIC#1, pings continue to be successful in both directions.
- If I then disable NIC#2 in Network Connections, pings still continue successfully.
- If I then re-enable NIC#2, pings start failing.
- After that, any combination of disable/enable + release/renew is met with "unable to contact your DHCP server" or 169.254.xx.xx.
- In the absence of the NIC Team, either GbE port alone functions fine.
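The 169.254.xx.xx address in the symptoms above is an IPv4 link-local (automatic private) address, which Windows assigns itself when it cannot obtain a DHCP lease. As a minimal sketch, assuming the team's address has been copied by hand (for example from ipconfig output), the check below flags that case. The function name `is_apipa` and the sample addresses are hypothetical, chosen only for illustration.

```python
import ipaddress


def is_apipa(addr: str) -> bool:
    """Return True if addr is an IPv4 link-local address (169.254.0.0/16)."""
    ip = ipaddress.ip_address(addr)
    # is_link_local is also True for IPv6 fe80::/10, so restrict to IPv4 here.
    return ip.version == 4 and ip.is_link_local


if __name__ == "__main__":
    # Made-up sample addresses, not taken from the server in question.
    for sample in ("169.254.10.20", "192.168.1.50"):
        print(sample, "APIPA" if is_apipa(sample) else "not APIPA")
```

Running a check like this after each enable/disable step gives a quick yes/no on whether the team actually got a lease.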


I tried replacing DHCP server (in case it was going bad). Same behavior.
I tried selecting SLB (Smart Load Balancing) as the Team type. Same behavior: successful after the first change, but it dies after that.
I tried using a different LAG group on the switch with different ports. Same behavior.
Made sure to test different cables for both NICs.

Also (just an annoyance, really), each time I add or delete a NIC team, network shares on the server become inaccessible and must be re-shared to be usable.

Can anyone shed some light on this?
65td

Network monitor may aid in seeing what's going on:
http://www.microsoft.com/DOWNLOADS/details.aspx?FamilyID=f4db40af-1e08-4a21-a26b-ec2f4dc4190d&displaylang=en#filelist

Dell Broadcom doc:

http://support.dell.com/support/edocs/network/r35278/broadcom%20nic%20teaming_1.1_final.doc

I had an issue with HP teaming (fail on fault) when Receive-Side Scaling (RSS) was enabled: after failing the team over and then failing back, pings stopped. With RSS unchecked, it always works.
Dell RSS PDF:
http://www.dell.com/downloads/global/products/pwcnt/en/nic_broadcom_57710_refsheet.pdf

Other interesting web site:
http://msmvps.com/blogs/thenakedmvp/archive/2007/03/12/broadcom-toe-dell-isa-2004-and-my-headache.aspx
tau161

ASKER

Hello 65td,
  Thank you for your reply. I'll try it when I get back in the office. I removed the team and reverted the server to its default state before leaving for the evening. I'll be back in the office on 3-11-09.

However, I have an update. New behavior and a possible solution. I don't know if there are risks. If it is a solution, I would still like to find an underlying explanation as to why it wasn't working before.

- I tried unplugging the Teamed NICs from the LAGed ports on the PowerConnect 2716 switch, and plugging them into regular non-LAGed ports, and suddenly everything worked.
- Disabling/Enabling either NIC didn't interrupt service.
- It worked with both SLB and 802.3ad Link Aggregation - (I think SLB doesn't require switch support, but the other _does_ ).
- Now that I have a configuration that actually seems to work, I have noticed that the NIC Team doesn't necessarily keep the MAC address from NIC#1 from reboot to reboot.
- I will still try the RSS fix when I get back.
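To back up a claim like "disabling/enabling either NIC didn't interrupt service", it can help to log ping results while toggling the NICs and then look at where reachability changed. A minimal sketch, assuming the results have already been collected as (timestamp, success) pairs by some other means; the function name `find_transitions` and the sample data are hypothetical.

```python
from typing import List, Tuple


def find_transitions(results: List[Tuple[str, bool]]) -> List[Tuple[str, str]]:
    """Given (timestamp, ping_ok) samples in time order, return the
    timestamps at which reachability changed, labeled 'UP' or 'DOWN'."""
    transitions = []
    previous = None
    for stamp, ok in results:
        # Record a transition only when the state differs from the prior sample.
        if previous is not None and ok != previous:
            transitions.append((stamp, "UP" if ok else "DOWN"))
        previous = ok
    return transitions


if __name__ == "__main__":
    # Made-up samples for illustration only.
    samples = [
        ("10:00:01", True),
        ("10:00:02", True),
        ("10:00:03", False),
        ("10:00:04", False),
        ("10:00:05", True),
    ]
    for stamp, state in find_transitions(samples):
        print(stamp, state)
```

Lining the reported timestamps up against the moments each NIC was disabled or re-enabled shows which step, if any, breaks connectivity.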

Your thoughts?
tau161

ASKER

Ok. I checked, and Receive-Side Scaling was already off during this behavior, so it didn't play a role.
Also, TOE (TCP Offload Engine) is not enabled, so that wasn't it either.

So why is the NIC Team not working with LAGed ports on the switch? The BACS setup indicated that LAGed ports were necessary for the NIC Team to function properly. Could it be Dell's implementation of LAGs on the PowerConnect? Would I see different behavior with some other switch that supports 802.3ad?

I like the fact that I have it working now. However, I strongly dislike solutions that I can't give an explanation for.

Any insights are welcome!
ASKER CERTIFIED SOLUTION
tau161