CentOS Linux NIC bonding for load sharing

sunhux

I followed the instructions in the link below & used mode=5
  http://wiki.centos.org/TipsAndTricks/BondingInterfaces

Q1:
Is mode=5 the right mode to use for link load sharing + redundancy?

Q2:
After configuring the server's ports, "ifconfig -a" showed bond0
& eth0 as UP, but eth1 is not showing as UP.  When I connect only
eth0 to a link-aggregated Cisco 2960 switch port, I can ping elsewhere,
but if I connect only eth1, it cannot ping (& "ifconfig eth1" shows
eth1 is not UP).  What went wrong?

Q3:
What settings (kindly provide exact commands) are needed on the Cisco 2960
switch ports that the 2 NIC ports on my server connect to, in order to obtain
load sharing & redundancy?

President, IT4SOHO, LLC
Commented:
OK, first of all, you've chosen to configure your "bond0" to use transmit load balancing (TLB)... that means that if you're trying to ping the same location, it'll use the same interface (eth0 or eth1).

Please read http://www.linuxhorizon.ro/bonding.html to learn more about the bonding modes.

You were probably looking for mode 6 - ALB (adaptive load balancing), but several of the modes are dependent upon being able to make adjustments to the NIC -- and some NICs just aren't compatible. Look in the boot logs for when the network is started -- if the NICs can't reset their MAC addresses to the same value, then there are some bonding modes -- including TLB and ALB -- that just won't work.
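
For quick reference, the mode numbers map to names as in the sketch below. The names follow the Linux kernel bonding documentation; the "switch side" column is a simplified summary added here as an assumption, so check it against your own switch before relying on it.

```python
# Linux bonding modes, keyed by the number used in "mode=N".
# Names follow the kernel bonding documentation; the second field is a
# simplified summary of what the switch side is expected to provide.
BONDING_MODES = {
    0: ("balance-rr", "static port group (EtherChannel)"),
    1: ("active-backup", "none"),
    2: ("balance-xor", "static port group (EtherChannel)"),
    3: ("broadcast", "static port group (EtherChannel)"),
    4: ("802.3ad", "LACP"),
    5: ("balance-tlb", "none"),
    6: ("balance-alb", "none"),
}


def describe_mode(mode: int) -> str:
    """Return a one-line description for a bonding mode number."""
    name, switch_side = BONDING_MODES[mode]
    return f"mode={mode} ({name}): switch-side requirement: {switch_side}"


if __name__ == "__main__":
    # The two modes discussed in this thread.
    for m in (5, 6):
        print(describe_mode(m))
```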

The fact that your system doesn't work when eth1 is the only connection leads me to believe either:
 a) eth1 was somehow not included in the bonding configuration (look in /etc/sysconfig/network-scripts/ifcfg-eth1 -- it should have these 4 lines):

DEVICE=eth1
MASTER=bond0
SLAVE=yes
ONBOOT=yes

 b) Your NICs aren't capable of being bonded, so the driver simply "chose" the first interface and didn't ever really "bind" them at all... again, look in your system logs during the boot phase -- specifically when the network is being brought up and the bonding module is being loaded by the kernel.
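
Check (a) can be scripted. The following is a minimal sketch, assuming the file is a plain KEY=value list like the four lines above; the function names `parse_ifcfg` and `missing_slave_settings` are made up for this illustration and are not part of any CentOS tool.

```python
# The four settings a bonded slave is expected to carry (from the list above).
REQUIRED_SLAVE_SETTINGS = {
    "DEVICE": "eth1",
    "MASTER": "bond0",
    "SLAVE": "yes",
    "ONBOOT": "yes",
}


def parse_ifcfg(text):
    """Parse KEY=value lines, skipping blanks and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().strip('"')
    return settings


def missing_slave_settings(text, required=REQUIRED_SLAVE_SETTINGS):
    """Return the required settings that are absent or have another value."""
    settings = parse_ifcfg(text)
    return {k: v for k, v in required.items() if settings.get(k) != v}


if __name__ == "__main__":
    sample = "DEVICE=eth1\nMASTER=bond0\nSLAVE=yes\nONBOOT=yes\n"
    print(missing_slave_settings(sample))  # an empty dict means nothing is missing
```

On a real system you would pass in the contents of /etc/sysconfig/network-scripts/ifcfg-eth1 instead of the inline sample.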

Finally, you shouldn't have to make any changes to the switch -- it should recognize the same MAC address on two separate ports and adjust accordingly without input from you...

I hope this helps! I use bonding on about 80% of the servers I manage...

For anyone reading this with a WINDOWS background, you may think of this technology as "trunking"... just so you know what we're talking about...

Dan
IT4SOHO

Steve Jennings, Sr Manager Cloud Networking Ops

Commented:
Thanks Dan

Author

Commented:

/etc/sysconfig/network-scripts/ifcfg-eth1 has been present all this while
& it contains those 4 lines plus others, as below:

DEVICE=eth1
BOOTPROTO=none
HWADDR=00:14:C2:5C:9B:E3
ONBOOT=yes
MASTER=bond0
SLAVE=yes
USERCTL=no
TYPE=Ethernet

Do I need to remove any of the above lines?

When doing "service network restart" got the following logs on the console:

Bringing up interface bond0: Device eth1 has different MAC address than
expected, ignoring.

So what can I do about the message I'm seeing above?
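
The network scripts print this message when the HWADDR value in the ifcfg file does not match the MAC address the device currently reports. A rough way to compare the two is sketched below. It assumes the kernel exposes the current address under /sys/class/net/<device>/address; the helper names are hypothetical.

```python
def normalize_mac(mac):
    """Lower-case a MAC address and use ':' separators for comparison."""
    return mac.strip().lower().replace("-", ":")


def hwaddr_matches(configured, actual):
    """True when the ifcfg HWADDR and the reported MAC are the same address."""
    return normalize_mac(configured) == normalize_mac(actual)


def current_mac(device):
    """Read the MAC the kernel currently reports for a device (Linux sysfs)."""
    with open(f"/sys/class/net/{device}/address") as fh:
        return fh.read()


if __name__ == "__main__":
    configured = "00:14:C2:5C:9B:E3"  # HWADDR from ifcfg-eth1 in this thread
    try:
        print(hwaddr_matches(configured, current_mac("eth1")))
    except OSError:
        print("eth1 is not present on this machine")
```

If the comparison is false while the NIC is enslaved, one possible explanation (an assumption, not confirmed in this thread) is that the slave has taken on the bond's MAC address.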

Author

Commented:

Now if I comment out (by preceding with # ) the line
HWADDR in ifcfg-eth1, I don't get the above message
during service network restart but got a different message
this time, indicated below:

Bringing up interface bond0: tg3 device eth1 does not seem to be present, delaying initialization

What do I do next ?

Author

Commented:

Both eth0 & eth1 are physically present (seen from behind the ProLiant G4 server)
& the lspci output below shows the 2 Broadcom NICs:


00:00.0 Host bridge: Intel Corporation E7520 Memory Controller Hub (rev 0c)
00:02.0 PCI bridge: Intel Corporation E7525/E7520/E7320 PCI Express Port A (rev 0c)
00:06.0 PCI bridge: Intel Corporation E7520 PCI Express Port C (rev 0c)
00:1d.0 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #3 (rev 02)
00:1d.3 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #4 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB2 EHCI Controller (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev c2)
00:1f.0 ISA bridge: Intel Corporation 82801EB/ER (ICH5/ICH5R) LPC Interface Bridge (rev 02)
00:1f.1 IDE interface: Intel Corporation 82801EB/ER (ICH5/ICH5R) IDE Controller (rev 02)
01:03.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
01:04.0 System peripheral: Compaq Computer Corporation Integrated Lights Out Controller (rev 01)
01:04.2 System peripheral: Compaq Computer Corporation Integrated Lights Out  Processor (rev 01)
02:00.0 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI Bridge A (rev 09)
02:00.2 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI Bridge B (rev 09)
03:01.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 10)
03:01.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 10)
04:03.0 RAID bus controller: Compaq Computer Corporation Smart Array 64xx (rev 01)
05:00.0 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI Bridge A (rev 09)
05:00.2 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI Bridge B (rev 09)

Author

Commented:


the command "ifenslave -d bond0 eth0" is new to me
Top Expert 2015

Commented:
mode=6 requires fairly recent network adapters which can change their MAC address while UP.
You get an immediate note in dmesg if that is not the case, and then you need to fall back to mode=1.

RLB will need support from ethernet switch to balance between two ports.
Daniel McAllister, President, IT4SOHO, LLC
Commented:
OK, it looks then like your installation of eth1 is screwed up.

There's a kernel/network module that's loaded at boot-time that loads a specific NIC driver for your ethx interfaces. Look in the file /etc/modprobe.conf and remove the entry for eth1, then use rmmod to unload the NIC driver.

If you KNOW the driver for that card (e.g.: it's the same make and model as eth0), you can manually change the file contents so that eth1 is the same as eth0 -- but unless you know Linux kernel behavior well, your best solution is to remove the eth1 configuration and reboot -- the kudzu configuration wizard should set up eth1 for you during the reboot. (Yes, you can do it without a reboot -- it's just lazy and easy to reboot and have it done automagically).

Once eth1's drivers are loaded properly, you should start to get a decent status from the NIC -- and THEN it can participate in the bonding...

Good Luck

Dan
IT4SOHO
Daniel McAllister, President, IT4SOHO, LLC
Commented:
Ok... addendum to above -- it appears that I had more information available than I first thought...

So here's what the operative config files should contain:


< /etc/sysconfig/network >
NETWORKING=yes
HOSTNAME=your_hostname
GATEWAY=your_gateway_IP

< /etc/modprobe.conf >
alias bond0 bonding
options bonding miimon=100 mode=6
alias eth0 tg3
alias eth1 tg3


< /etc/sysconfig/network-scripts/ifcfg-eth0 >
DEVICE=eth0
BOOTPROTO=static
TYPE=Ethernet
ONBOOT=yes
SLAVE=yes
MASTER=bond0

< /etc/sysconfig/network-scripts/ifcfg-eth1 >
DEVICE=eth1
BOOTPROTO=static
TYPE=Ethernet
ONBOOT=yes
SLAVE=yes
MASTER=bond0

< /etc/sysconfig/network-scripts/ifcfg-bond0 >
DEVICE=bond0
ONBOOT=yes
IPADDR=your_IP_address
NETMASK=255.255.255.0
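
The two slave files above differ only in the DEVICE line, so they can be produced from one template. Here is a minimal sketch; `slave_ifcfg` is a made-up helper for illustration, and it only builds the text rather than writing anything under /etc.

```python
def slave_ifcfg(device, master="bond0"):
    """Build the ifcfg text for one bonded slave, matching the listings above."""
    lines = [
        f"DEVICE={device}",
        "BOOTPROTO=static",
        "TYPE=Ethernet",
        "ONBOOT=yes",
        "SLAVE=yes",
        f"MASTER={master}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    for dev in ("eth0", "eth1"):
        print(f"< /etc/sysconfig/network-scripts/ifcfg-{dev} >")
        print(slave_ifcfg(dev))
```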

I hope this helps!

Dan
IT4SOHO

Author

Commented:

Thanks guys.  I rebooted the server & eth1 came up fine.  Now connecting
to eth1 or eth0 alone works & allows the server to ping to another addr.

Looks like reboot reloads the driver while "service network restart" doesn't.

Connected eth0 & eth1 to my Cisco 2960 switch's ports 15 & 16, which have
the following configurations:
interface g0/15
  channel-protocol lacp  ! Specify LACP instead of PAgP
  channel-group 1 mode active ! create the channel using active mode
  no shut
  exit
interface g0/16
  channel-protocol lacp  ! Specify LACP instead of PAgP
  channel-group 1 mode active ! create the channel using active mode
  no shut
  end
copy run start


While copying files from the server to a NAS, I spotted that port 15
has high bandwidth outgoing (with minimal incoming) while port 16
has high bandwidth incoming (with minimal outgoing) as seen when
I did "show int g0/15" & "show int g0/16"

Shouldn't mode=6 enable equal load sharing between both ports
ie I'm expecting incoming & outgoing to be comparable?


Top Expert 2015
Commented:
mode=6 has no influence on INCOMING traffic, it will round-robin outgoing if no better idea...
Daniel McAllister, President, IT4SOHO, LLC
Commented:
Incoming traffic cannot be under the control of the Linux system. (The telepathy, prescience and telekinesis capabilities of Linux were removed due to security concerns). :-)

Load balancing inbound traffic has to be a function of some other kind of load balancing (switch, DNS, etc.).

Furthermore, you won't usually find the kind of load balancing you seem to be looking for... If I'm moving files from one system to another and EACH has a bonded pair of 10/100 NICs, then whether I get 100 or 200 Mbps throughput will depend on the method/protocol chosen... If everything is being sent on one TCP connection, you'll only get 100 Mbps of throughput -- but if you use multiple TCP connections, you can get the full 200.... It's usually not a good idea to try to balance a single TCP connection across multiple interfaces (at least not without some kind of encapsulation -- but that's a whole different can of worms.)
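
To make the single-connection point concrete, here is a toy model of per-flow link selection. It is an illustration only and NOT the hash any Linux bonding mode or Cisco switch actually uses; the addresses, ports and the CRC-based choice are all assumptions made for the example.

```python
import zlib

LINK_MBPS = 100  # each NIC in the 10/100 example above
LINKS = ["eth0", "eth1"]


def pick_link(flow, links):
    """Map a flow (src IP, src port, dst IP, dst port) to one link.

    The same flow always maps to the same link, so its packets stay in order.
    """
    key = "|".join(str(part) for part in flow).encode()
    return links[zlib.crc32(key) % len(links)]


def links_in_use(flows, links):
    """Return the set of links that at least one flow was mapped to."""
    return {pick_link(flow, links) for flow in flows}


def ceiling_mbps(flows, links):
    """Upper bound on aggregate throughput for this set of flows."""
    return LINK_MBPS * len(links_in_use(flows, links))


if __name__ == "__main__":
    single = [("192.0.2.10", 40000, "192.0.2.20", 445)]
    several = [("192.0.2.10", 40000 + i, "192.0.2.20", 445) for i in range(8)]
    print("one connection:", ceiling_mbps(single, LINKS), "Mbps ceiling")
    print("eight connections:", ceiling_mbps(several, LINKS), "Mbps ceiling")
```

A single connection can never exceed one link's speed in this model, while several connections may or may not spread over both links depending on how they hash.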

I hope this helps to explain the behavior you're seeing....

Dan
IT4SOHO

Top Expert 2015
Commented:
mode=6 is appropriate for Cisco link-aggregated ports.
It is more functional and auto-sensing than mode=5; namely, it will survive your one-cable-lost exercise.

Author

Commented:
Wonderful, Thanks
