
bonding on different NICs

I have an unexpected (at least to me) issue with Linux bonding.
I have a couple of Linux boxes. Each of them has the following NICs:
Ethernet controller: Intel Corporation 82541GI/PI Gigabit Ethernet Controller [e1000]
Ethernet controller: Broadcom Corporation NetXtreme BCM5701 Gigabit Ethernet [tg3]

I configured bonding over them. If only one NIC is plugged in, everything is fine. If I plug in the second, 99% of packets are lost.
I'm sure bonding is configured correctly: the same config works on other machines, but those have two identical NICs.
The system is Debian with a custom-built kernel. I also tried the newest kernel from RHEL4 Update 3, with no difference.

Has anybody heard of any issues with Linux bonding on different NICs?
Asked by: ravenpl
 
pablouruguay commented:
Change the NIC. Maybe the driver is the problem.
 
ravenpl (Author) commented:
I know I can change NICs, but that's not the question.
Each driver works fine on its own, so maybe they just don't cooperate.
The question is: has anybody encountered this? Does anybody have documentation that mentions it?
 
Gabriel Orozco (Solution Architect) commented:
I think this is a kernel driver issue. The Intel module has been updated very often due to minor problems with it.

Why not report the error on the kernel mailing list?
http://www.kernel.org/pub/linux/docs/lkml/reporting-bugs.html

 
Gabriel Orozco (Solution Architect) commented:
97.) What is the Linux tg3 driver?

To better support users, Broadcom has been actively supporting, maintaining, and testing the in-kernel tg3 driver for well over a year. Broadcom has officially released the tg3 driver as a package, and the tg3 driver is now the Linux driver that Broadcom will support for the NetXtreme product line. Accordingly, Broadcom will discontinue support for the bcm5700 driver and no longer provide updates.

The tg3 driver package released by Broadcom is based on the latest in-kernel tg3 driver with some added compatibility code to make it backwards compatible with most 2.6 kernels and some 2.4 kernels (generally newer than 2.4.24). If you are using the latest upstream kernel from www.kernel.org, you generally do not need to download the tg3 driver package from Broadcom as the latest upstream kernel has the latest tg3 patches.

There are a few minor differences to be aware of if you are migrating from the bcm5700 driver to the tg3 driver. The tg3 driver does not support the Broadcom proprietary load balancing software module known as BASP. The Linux bonding driver and 8021q driver provide similar functionalities and can be used with tg3. BASP will also be discontinued. The tg3 driver also does not support module parameters to configure the device (line speed, flow control, ring sizes, etc) but relies on standard Linux utilities such as ethtool. Other than these differences, the two drivers are very similar in terms of hardware support, robustness, and performance.

http://www.broadcom.com/support/ethernet_nic/faq_drivers.php
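The FAQ says tg3 takes no module parameters for line speed, flow control or ring sizes and relies on ethtool instead. A minimal sketch of what that looks like, assuming an interface named eth1 and an RX ring size of 256 picked purely for illustration; `tune_tg3_nic` and the `ETHTOOL` override are hypothetical helpers, not part of any package:

```sh
# Hypothetical helper: apply per-NIC settings with ethtool, since tg3 has no
# module parameters for them. The ring size is an illustrative value only.
# Set ETHTOOL=echo to print the commands instead of running them.
ETHTOOL=${ETHTOOL:-ethtool}

tune_tg3_nic() {
    ifc=$1
    rx_ring=$2
    # Line speed/duplex: (re)enable autonegotiation.
    "$ETHTOOL" -s "$ifc" autoneg on
    # Flow control: enable pause frames in both directions.
    "$ETHTOOL" -A "$ifc" rx on tx on
    # Ring size: must not exceed the maximum reported by `ethtool -g`.
    "$ETHTOOL" -G "$ifc" rx "$rx_ring"
}
```

Setting `ETHTOOL=echo` before calling `tune_tg3_nic eth1 256` prints the three `ethtool` invocations without touching the NIC; `ethtool -g eth1` shows the ring limits the driver actually supports.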

Another source says:
--------------------------------------------------------------------------------------------------------------
Start by trying the HP-supplied 'bcm5700' driver instead of the Red Hat-provided 'tg3' driver (http://h18000.www1.hp.com/support/files/server/us/download/22318.html).

If that still fails, also look at the 'HP tested bonding driver' (http://h18000.www1.hp.com/support/files/server/us/download/22271.html).

Also, what sort of switch do you have the other end of the network cables plugged in to? Have you set up (or does the switch auto-detect) trunking for the two ports?
 
ravenpl (Author) commented:
I still don't think it's the e1000 driver: I have a configuration with two Intels that works flawlessly. Same for two tg3s.
But I'll try the bcm5700. I should be back in a few days.
 
wnross commented:
Are you trunking or doing failover?
 
ravenpl (Author) commented:
Load balancing (trunking): balance-rr mode.
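In balance-rr mode the bonding driver transmits each outgoing frame on the next slave in turn. The sketch below only models that ordering and sends nothing; `rr_assign` is a hypothetical helper that takes a frame count followed by the slave names:

```sh
# Hypothetical model of the balance-rr transmit order: each frame goes out on
# the next slave in the list, wrapping around. Nothing is actually sent.
rr_assign() {
    n=$1
    shift
    [ "$#" -gt 0 ] || return 1      # need at least one slave
    out=
    while [ "$n" -gt 0 ]; do
        out="$out${out:+ }$1"       # current frame uses the slave at the front
        first=$1                    # rotate: move that slave to the back
        shift
        set -- "$@" "$first"
        n=$((n - 1))
    done
    printf '%s\n' "$out"
}
```

For example, `rr_assign 5 eth0 eth1` prints `eth0 eth1 eth0 eth1 eth0`. This policy governs transmission only; how incoming traffic is spread over the slaves depends on the switch's trunk setup, and the kernel bonding documentation notes that striping a single TCP connection this way can deliver segments out of order.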
 
wnross commented:
I'm just double-checking some things, but can you check which MAC addresses you have set up for bond0?

Even better, can you just paste /etc/network/interfaces into this question, with your IP addresses masked out?

Also, do you have jumbo packets enabled on either of the slave interfaces?
 
ravenpl (Author) commented:
It's a normal bond configuration. None of the ethX interfaces has an IP assigned; the bond takes its MAC from the first slave (eth0).

> Also, do you have jumbo packets enabled on either of the slave interfaces?
What is that?
 
wnross commented:
Packets sent with an MTU larger than the standard 1500 bytes (MTU > 1500).
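On a 2.6 kernel, each interface's MTU can be read from sysfs to answer that question. A small sketch, assuming the usual `/sys/class/net/<iface>/mtu` layout; `check_jumbo` and the `SYSFS_NET` override (there only so the function can be tried against a fake directory) are hypothetical:

```sh
# Hypothetical helper: report each interface's MTU and flag jumbo frames.
# SYSFS_NET defaults to /sys/class/net; overriding it is only for testing.
check_jumbo() {
    root=${SYSFS_NET:-/sys/class/net}
    for ifc in "$@"; do
        mtu=$(cat "$root/$ifc/mtu")
        # Anything above the standard Ethernet MTU of 1500 counts as jumbo.
        if [ "$mtu" -gt 1500 ]; then kind=jumbo; else kind=standard; fi
        printf '%s %s %s\n' "$ifc" "$mtu" "$kind"
    done
}
```

Running `check_jumbo eth0 eth1` on the bonded box prints one line per slave; a mismatch between the two slaves would be worth ruling out.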
 
wnross commented:
While you're grabbing the interfaces file, can you post the output of dmesg and /var/log/messages here?
Obviously, trim them down to the last 10-30 lines relating to bringing up bond0, eth0 and eth1.
 
ravenpl (Author) commented:
Unfortunately I have no time right now. All I can say is that the MTU is set to 1500 and there's nothing strange in dmesg/messages (nothing beyond what appears on the box where bonding works fine).
I'll get back here later...
 
wnross commented:
OK.
 
ravenpl (Author) commented:
Once again Gentoo turned out to rock. After changing the kernel to the Gentoo release, it started working.
However, there's a strange thing: no speedup. I can see frames being sent and received via both interfaces, but no speedup.
Again, on the same LAN segment there are servers with the same NICs where the speed almost doubled.
Any thoughts?
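One way to see how traffic is actually split is to compare per-slave packet counters before and after a large transfer. A sketch that reads the standard sysfs statistics files; `slave_counters` and the `SYSFS_NET` override are hypothetical names:

```sh
# Hypothetical helper: print TX/RX packet counters for each slave.
# SYSFS_NET defaults to /sys/class/net; overriding it is only for testing.
slave_counters() {
    root=${SYSFS_NET:-/sys/class/net}
    for ifc in "$@"; do
        tx=$(cat "$root/$ifc/statistics/tx_packets")
        rx=$(cat "$root/$ifc/statistics/rx_packets")
        printf '%s tx=%s rx=%s\n' "$ifc" "$tx" "$rx"
    done
}
```

If `tx` grows on both slaves during a transfer but `rx` grows on only one, incoming traffic is not being spread across the trunk, which would be one possible reason for the missing speedup in that direction.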
 
wnross commented:
Sounds like the cards are set for failover and not trunking... Oh, any chance this box is using a different switch than your other machines where bonding is working?

Also, what do you have in
- /etc/modules.autoload.d/kernel-2.6
- /etc/conf.d/net

Cheers,
-Bill
 
ravenpl (Author) commented:
It's a Debian system, not Gentoo ;)

:~$ cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v2.6.5 (November 4, 2005)

Bonding Mode: load balancing (round-robin)
MII Status: up
MII Polling Interval (ms): 1000
Up Delay (ms): 0
Down Delay (ms): 0
# (next come the two eth slave interfaces, both up)

:~$ cat /etc/network/interfaces
auto lo bond0
iface lo inet loopback
iface bond0 inet static
address 192.168.1.105
netmask 255.255.255.0
gateway 192.168.1.1
up ifenslave bond0 eth0 eth1
down ifenslave -d bond0 eth0 eth1
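The per-slave part of `/proc/net/bonding/bond0` is elided in the output above. A sketch that summarizes the mode and each slave's link state, assuming the per-slave sections use `Slave Interface:` and `MII Status:` lines (the sample used to exercise it is illustrative, not the poster's output); `bond_summary` is a hypothetical helper:

```sh
# Hypothetical helper: summarize bonding mode and per-slave link state.
# Reads /proc/net/bonding/bond0 unless another file is given.
bond_summary() {
    # Split each "Label: value" line on ": ".
    # The bond-level "MII Status" line comes before any "Slave Interface"
    # line, so it is skipped; each slave's own status line is reported.
    awk -F': ' '
        /^Bonding Mode:/              { print "mode: " $2 }
        /^Slave Interface:/           { slave = $2 }
        /^MII Status:/ && slave != "" { print slave ": " $2; slave = "" }
    ' "${1:-/proc/net/bonding/bond0}"
}
```

With both slaves healthy, `bond_summary` should print the round-robin mode line followed by `eth0: up` and `eth1: up`.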
 
wnross commented:
My mistake :) I interpreted your last message as having switched to Gentoo.

OK, try setting miimon to off; there could be a conflict in the MII status reporting back to the bonding driver:

/etc/modprobe.d/arch/i386
alias bond0 bonding
options bonding mode=0 miimon=0 downdelay=0 updelay=0

Also, did you check your switch to see whether it is configured for trunking?

Cheers,
-Bill
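The `mode=` option takes a number, and the numbers are easy to mix up (mode 1 is failover, not load balancing). A sketch of the lookup, based on the mode list in the kernel's bonding documentation; `bond_mode_name` and `mode_from_options` are hypothetical helpers:

```sh
# Hypothetical helpers: translate the numeric bonding mode= option into the
# mode name listed in the kernel bonding documentation.
bond_mode_name() {
    case $1 in
        0) echo balance-rr ;;      # round-robin load balancing
        1) echo active-backup ;;   # failover: one slave active at a time
        2) echo balance-xor ;;
        3) echo broadcast ;;
        4) echo 802.3ad ;;         # IEEE 802.3ad dynamic link aggregation
        5) echo balance-tlb ;;
        6) echo balance-alb ;;
        *) echo unknown; return 1 ;;
    esac
}

# Find mode= among module options such as "mode=0 miimon=100".
mode_from_options() {
    for opt in "$@"; do
        case $opt in
            mode=*) bond_mode_name "${opt#mode=}"; return ;;
        esac
    done
    bond_mode_name 0   # bonding defaults to balance-rr when mode= is absent
}
```

For example, `mode_from_options mode=0 miimon=0` prints `balance-rr`, which is a quick way to check that an options line keeps the load-balancing mode the poster wants.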
 
ravenpl (Author) commented:
The switch is OK; as I said before, there are other servers that work fine with it.
Let me ask first: miimon is a way of detecting whether a particular ethX is up or down. Since both eth interfaces are up and no link failures were detected, changing it to 0 is pointless. What's more, I'm not sure use_carrier would work as desired. But I can give it a shot on Monday...
 
ravenpl (Author) commented:
Nothing changed after switching from miimon to use_carrier. I'm giving up here and splitting some points.
Thanks for trying to help ;)
 
wnross commented:
No problem. You might want to raise this question again in the kernel development forums, or try building a modified kernel.

Oh, here's a final thought: you tried several different kernels, so doing a diff on their ".config" files might give some insight.

Cheers,
-Bill
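Following that suggestion, the diff can be narrowed to the options most likely to matter here. A sketch, assuming the relevant symbols are `CONFIG_BONDING`, `CONFIG_E1000` and `CONFIG_TIGON3` (plus any sub-options sharing those prefixes); `cfg_subset` and `diff_net_config` are hypothetical helpers:

```sh
# Hypothetical helpers: diff only the bonding and NIC-driver options of two
# kernel .config files. The symbol list is an assumption:
# CONFIG_BONDING (bonding), CONFIG_E1000 (Intel e1000), CONFIG_TIGON3 (tg3),
# plus any sub-options that share those prefixes.
cfg_subset() {
    # Match both "CONFIG_X=..." and "# CONFIG_X is not set" lines.
    grep -E '^(# )?CONFIG_(BONDING|E1000|TIGON3)[_= ]' "$1" | sort
}

diff_net_config() {
    tmp_a=$(mktemp) || return 2
    tmp_b=$(mktemp) || { rm -f "$tmp_a"; return 2; }
    cfg_subset "$1" > "$tmp_a"
    cfg_subset "$2" > "$tmp_b"
    # diff exits 0 when the subsets are identical, 1 when they differ.
    diff "$tmp_a" "$tmp_b" && rc=0 || rc=$?
    rm -f "$tmp_a" "$tmp_b"
    return "$rc"
}
```

Usage would be along the lines of `diff_net_config config-working config-broken`, where both file names are placeholders for the .config files of the kernels that did and did not work.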