Solved

HP network teaming fails after server reboot

Posted on 2010-08-29
Last Modified: 2012-05-10
Hi.

I am having an issue with network teaming on an HP ProLiant DL380 G3 server (running Windows Server 2003 R2 Standard), connected to a Cisco C2950 switch.

Teaming works fine until the HP server is rebooted.  When the server is rebooted, the network team fails to initialize.  If I log on to the server at the console, open network connections and look at the teaming connection, I either see a message saying that the cable is unplugged, or the acquiring IP address message (despite the server having a static IP).  

If I get the first message (network cable is unplugged) and I leave the network connections window long enough, the second message will appear.  The only way I can restore the teaming interface is to disable it through the network connections window, and then re-enable it.  The interface comes up instantly and all is fine until the next reboot.

In all cases, the individual interfaces (i.e. the team members) display "connected" in the network connections window.  The only item that is selected in the interface properties window of the team members is the HP Network Configuration Utility.

Before I started this morning, I was getting the following error in the Windows System log:

Event Type:      Warning
Event Source:      CPQTeamMP
Event ID:      434
LBSRV03: PROBLEM: A non-Primary Network Link is not receiving. Receive-path validation has been enabled for this Team by selecting the Enable receive-path validation Heartbeat Setting.  ACTION: Please check your cabling to the link partner. Check the switch port status, including verifying that the switch port is not configured as a Switch-assist Channel. Generate Broadcast traffic on the network to test whether these are being received. Also make sure all teamed NICs are on the same broadcast domain. Run diagnostics to test card. Drop the NIC from the team, determine whether it is receiving broadcast traffic in that configuration.

However, with my current configuration, I am now getting the following error in the System log:

Event Type:      Warning
Event Source:      CPQTeamMP
Event Category:      None
Event ID:      461
Description:
Team ID: 0
Aggregation ID: 1
Team Member ID: 1
 PROBLEM: 802.3ad link aggregation (LACP) has failed. ACTION: Ensure all ports are connected to LACP-aware devices.

I have attached a file showing the output of a "show run" and "show etherchannel 3 detail" command on the CISCO switch.  Etherchannels 1 and 2 are working.  These are connected to two other servers.  The extra config on the Etherchannel 3 and FE0/5 and FE0/6 interfaces was added this morning to try and cure the issue.  

All Etherchannels and interfaces are in the native VLAN on the switch.  I tried setting the VLAN ID in the HP config utility to VLAN1 but this didn't seem to make any difference.

I believe I have installed the latest drivers and firmware for the network cards from the page below:

http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareIndex.jsp?lang=en&cc=us&prodNameId=3288130&prodTypeId=15351&prodSeriesId=316529&swLang=8&taskId=135&swEnvOID=1005#11395

I installed the 14.0.0.7 HP NC-Series Broadcom driver and the 2.1.5.7 Broadcom firmware update.

The version of the HP Network Configuration Utility is 10.00.0.12.

I have been searching the net all morning and despite trying several things, the issue remains.  I don't really know what else to try and if anyone has any suggestions, they would be greatly appreciated.

Let me know if you need any other info.

Thanks, Shaun


HP Network Configuration Utility settings:

 hp01.tif

 hp02.tif

 hp03.tif

(All offload settings disabled)

 hp04.tif

 hp05.tif

 hp06.tif

 hp07.tif
ciscooutput.txt
Question by:buck57005
13 Comments
 
LVL 27

Expert Comment

by:Steve
ID: 33554520
Are you trying to set teaming up on both the server and the switch?
You should only use aggregation on the server OR the switch.
 
LVL 55

Expert Comment

by:andyalder
ID: 33557390
Have you tried taking out the "switchport mode access" and "spanning-tree portfast" from channel-group 3 to make them similar to the two working channel-groups?
 
LVL 7

Expert Comment

by:D_Vante
ID: 33559123
What version of the HP Network teaming software are you using?
 
 
LVL 1

Author Comment

by:buck57005
ID: 33561974
Hi.  Thanks for your comments.


Totallytonto, I am configuring both the switch and server so as to load balance across the two Ethernet ports.  As far as I understand it, you need to have a managed switch and software on the server configured to load balance across a connection.  I might be wrong but I think this is the reason that you cannot configure teaming on a cheap £5 jobbie switch.  Certainly the way I am trying to configure teaming on this server is the same (as far as I can see) as on the working server.  That said, any alternative suggestions are always welcome.


D_Vante, the only software I am using is the HP Network Configuration Utility, which is version 10.00.0.12.  I noticed when I click on About in the config utility that the Network Teaming Intermediate Driver (NTID) is version 10.00.00.0.


Andyalder, initially, the Etherchannel 3 config and the FE0/5 and FE0/6 interfaces matched the working interfaces identically.  However, I still got the same issue.

Just to satisfy my curiosity, I just removed the following commands from the FE0/5 and FE0/6 interfaces:

switchport mode access
spanning-tree portfast

I also removed the following command from the Etherchannel 3 interface:

switchport mode access

I rebooted the server and got the same behaviour.  However, I have an extra event in the System log on the server:

Event Type:      Information
Event Source:      CPQTeamMP
Event Category:      None
Event ID:      439
Description:
LBSRV03: PROBLEM: A non-Primary Network Link is being Closed. This is typically because of a PnP action, possibly it was reconfigured through Network-Properties or through HP Network Configuration Utility? Possibly it was Disabled? Possibly it is being dropped from a Team or the Team is being Dissolved? ACTION: No action is required if the described behavior is expected. Otherwise, investigate the PnP reason, possibly re-enable the miniport.

This is in addition to the 461 events that I listed above.

I also just noticed that there are two 462 events in the System log:

Event Type:      Information
Event Source:      CPQTeamMP
Event Category:      None
Event ID:      462
Description:
Team ID: 0
Aggregation ID: 0
Team Member ID: 0
 802.3ad link aggregation(LACP) has been restored.

The other event is identical except that it references team member 1.  I checked back in the System log: these events occur each time the server is booted, and no negative teaming events are logged once they have been generated.

It sounds like LACP is unsuccessful initially, and then succeeds, but the teaming connection in Windows never recovers.  
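
To make that reading of the log concrete, here is a minimal sketch that replays CPQTeamMP events in time order and reports the last LACP state seen for each team member. The event IDs 461 (LACP failed) and 462 (LACP restored) come from the entries quoted in this thread; the `(event_id, team_member_id)` tuple layout and the sample sequence are simplifications assumed for illustration, not an export format of the Windows event log.

```python
# Event IDs as quoted from the System log in this thread.
LACP_FAILED = 461
LACP_RESTORED = 462


def final_lacp_state(events):
    """Return {team_member_id: "failed" | "restored"} for time-ordered events.

    `events` is an iterable of (event_id, team_member_id) tuples. This layout
    is a hypothetical simplification of the real event log records.
    """
    state = {}
    for event_id, member in events:
        if event_id == LACP_FAILED:
            state[member] = "failed"
        elif event_id == LACP_RESTORED:
            state[member] = "restored"
        # Any other event ID (for example 439) does not change the LACP state.
    return state


# Assumed boot sequence: member 1 fails first, then both members are restored.
boot_events = [(461, 1), (439, 1), (462, 0), (462, 1)]
print(final_lacp_state(boot_events))
```

If every member ends up "restored" while the team connection in Windows is still down, that would support the idea that the switch-side negotiation is fine and the problem sits in the teaming driver.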

Does anyone know if it's possible to script the disabling and enabling (or possibly a repair) of a network connection using VBScript?  I'm wondering if that may be a possible workaround: publish a computer startup script with a delay of 2 or 3 minutes to allow the LACP process to succeed, then disable and re-enable the team connection.

Thanks, Shaun
 
LVL 27

Expert Comment

by:Steve
ID: 33562596
You can use the netsh command from the command line to perform most functions on network cards.
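
As a rough sketch of the delayed disable/re-enable workaround asked about above, the snippet below builds the two `netsh interface set interface` invocations and runs them after a wait. It is written in Python rather than VBScript, and several things are assumptions: the connection name "Team 1" is hypothetical, the 180 second delay is taken from the "2 or 3 minutes" mentioned in the question, and whether netsh will change the admin state of a LAN or team connection on Windows Server 2003 has not been verified here (if it refuses, another tool would be needed).

```python
import subprocess
import time


def bounce_commands(connection_name):
    """Build the netsh invocations that disable, then re-enable, a connection."""
    base = ["netsh", "interface", "set", "interface", connection_name]
    return [base + ["admin=disabled"], base + ["admin=enabled"]]


def bounce_after_delay(connection_name, delay_seconds=180,
                       run=subprocess.check_call, sleep=time.sleep):
    """Wait for LACP to settle, then bounce the connection.

    `run` and `sleep` are injectable so the sequence can be dry-run
    without touching the network stack.
    """
    sleep(delay_seconds)
    for command in bounce_commands(connection_name):
        run(command)


# Example call for a startup script (connection name is hypothetical):
# bounce_after_delay("Team 1")
```

Passing the arguments as a list lets Python quote a connection name that contains spaces, so no manual quoting is needed.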

Also, you don't normally set teaming up on the switch AND the server. Just one or the other.
If you set both, you'll find that the server and the switch fight to control the traffic and the team will fail.
 
LVL 1

Author Comment

by:buck57005
ID: 33562777
Hi Totallytonto.

Thanks for your message.  

I probably need to clarify: I am enabling teaming on the server, and the majority of the config is done there.  However, surely the switch needs to be LACP aware, otherwise how would it know how to load balance the traffic?

Cheers
 
LVL 27

Expert Comment

by:Steve
ID: 33562970
What kind of teaming are you setting up on the server?
-Transmit load balance with fault tolerance
-network fault tolerance
-switch assisted loadbalancing with fault tolerance?
 
LVL 55

Expert Comment

by:andyalder
ID: 33563056
He's using 802.3ad = LACP, it's a form of switch assisted load balancing. It's initiated from the switch since that's active and the server is set to automatic (passive). Could swap them around so that the switch is passive and the server active in the team setup.
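
The active/passive point can be illustrated with a small sketch of the LACP rule that at least one end of the link must be active for an aggregate to form. The function name is made up for illustration, and treating the HP "automatic" setting as passive follows the description in this comment rather than HP documentation.

```python
def lacp_channel_forms(switch_mode, server_mode):
    """Return True if the two LACP port modes can negotiate an aggregate.

    "active" ports send LACP packets unprompted; "passive" ports only
    respond to them. Two passive ends never start the negotiation.
    """
    valid = {"active", "passive"}
    if switch_mode not in valid or server_mode not in valid:
        raise ValueError("mode must be 'active' or 'passive'")
    return "active" in (switch_mode, server_mode)


print(lacp_channel_forms("active", "passive"))   # True: the setup described here
print(lacp_channel_forms("passive", "active"))   # True: the suggested swap
print(lacp_channel_forms("passive", "passive"))  # False: nobody initiates
```

So swapping the roles is safe only if the server really is switched to active; leaving both ends passive would stop the channel from forming at all.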
 
LVL 1

Author Comment

by:buck57005
ID: 33566090
Would that be by using the following command for the FE0/5 and FE0/6 interfaces?

channel-group 3 mode passive

Cheers
 
LVL 55

Expert Comment

by:andyalder
ID: 33566256
Yes, that's right, with 'channel-group 3 mode passive' the switch sits there waiting for the server to initiate the LACP channel group.
 
LVL 1

Author Comment

by:buck57005
ID: 33566303
I will certainly give that a try tonight and will let you know how it goes.

Cheers
 
LVL 55

Expert Comment

by:andyalder
ID: 33567162
Don't forget to change the server setting to active in the dropdown box shown in hp02.tif.
 
LVL 1

Accepted Solution

by:buck57005
ID: 33603238
Andyalder, I tried your suggestion of setting the switch to passive and then altering the settings in the HP config tool.  Unfortunately, I got the same issue.

Then it suddenly dawned on me that I hadn’t compared driver versions between this server and the working one.  What do you know, the versions were very different.

I downgraded the problematic server to version 9.52 of the NIC driver and version 8.40.0.24 of the HP network configuration tool.  Both of these were available as links on the HP site.  I had to go through the motions of downloading the latest driver, and then on the final screen there is a tab labelled Revision History.  That tab gives links to all of the old versions.
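
The version comparison that uncovered the problem could be sketched like this. The version strings are the ones quoted in this thread; treating the downgrade targets (9.52 and 8.40.0.24) as the working server's versions is an assumption, and the component keys and function names are made up for illustration.

```python
def parse_version(text):
    """Turn a dotted version such as "10.00.0.12" into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def version_mismatches(reference, candidate):
    """List (component, reference, candidate, direction) where versions differ."""
    mismatches = []
    for component in sorted(reference):
        cand = candidate.get(component)
        if cand is None:
            continue  # component not installed on the candidate server
        ref = reference[component]
        ref_v, cand_v = parse_version(ref), parse_version(cand)
        if ref_v != cand_v:
            direction = "newer" if cand_v > ref_v else "older"
            mismatches.append((component, ref, cand, direction))
    return mismatches


# Assumed inventory: working server versions equal the downgrade targets.
working = {"nic_driver": "9.52", "ncu": "8.40.0.24"}
problem = {"nic_driver": "14.0.0.7", "ncu": "10.00.0.12"}

for row in version_mismatches(working, problem):
    print(row)
```

Comparing parsed tuples instead of raw strings avoids mistakes such as ranking "9.52" above "14.0.0.7" alphabetically.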

I also removed all of the extra config for interfaces FE0/5 and FE0/6 and port-channel 3 on the switch (i.e. mirroring the working setup).  A combination of this and the downgrade seems to have cured the issue.

This is very odd, as I only installed the other server a few months back and I am surprised I didn’t use the latest drivers then.

If anyone else experiences this issue, I would recommend experimenting with different driver versions, as there are many releases newer than the ones I am now using.  Now that I know the fix, I may experiment with later versions myself.

Thanks guys for your assistance.

Shaun