Internal routing on SLES 12 SP2 machine with multiple network interfaces ... packets go to the wrong interface

Frank Helk
I'm experiencing a Linux routing problem.

Environment is SLES 12 SP2, running on some HP server machine with 8 physical, used network interfaces, running in a non-internet local network.

Most physical network interfaces (eth0 ... eth3 and eth5 ... eth7) have (local unique) static IP addresses in non-overlapping networks, and the routing table looks ok. The interface eth4 is on DHCP.

The problem is that packets sometimes seem to be sent over the wrong interface: a packet that is expected to leave through eth6 is sent out on eth0 instead. This happens erratically and causes the application software (which manages measurement data) to lose the data stream after ~15 minutes at most.

As far as I can see,

  • the exit interface of the wrong-routed packets is always eth0
  • packets belonging to at least 2 interfaces are routed wrongly
  • the configuration of eth0 (viewed in YaST and by inspecting /etc/sysconfig/network/ifcfg-eth0) shows no IP addresses from the other interfaces' networks

If I take down eth0, the application runs smoothly (but that's only acceptable for testing purposes).

If I record the network traffic for the network addresses of eth2 (tcpdump), I find, for example, suspicious ARP requests originating from the address on eth6 but carrying the source MAC address of eth0.
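By default, Linux answers ARP requests for any local IP address on whichever interface receives them, and may use any local address as the source of its own ARP requests. The per-interface settings arp_ignore, arp_announce, arp_filter and rp_filter control this behaviour. A minimal sketch for listing them, assuming the standard sysctl tree under /proc/sys/net/ipv4/conf; the function name show_arp_settings is made up for illustration, and nothing in this thread confirms that these settings were the cause:

```shell
#!/bin/sh
# Print ARP- and reverse-path-related kernel settings for every interface.
# The optional first argument overrides the sysctl directory (assumption:
# the default location is /proc/sys/net/ipv4/conf, as on stock Linux kernels).
show_arp_settings() {
    conf_dir="${1:-/proc/sys/net/ipv4/conf}"
    for dir in "$conf_dir"/*/; do
        [ -d "$dir" ] || continue
        ifname=$(basename "$dir")
        for key in arp_ignore arp_announce arp_filter rp_filter; do
            # Skip settings that this kernel does not expose.
            if [ -r "$dir$key" ]; then
                printf '%s %s=%s\n' "$ifname" "$key" "$(cat "$dir$key")"
            fi
        done
    done
}

# Usage (read-only, changes nothing on the system):
#   show_arp_settings
```

This only reads values, so it should be compatible with a "no changes" policy; whether changing any of them is appropriate would have to be decided separately.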

Any idea what is happening here?
Any idea how to fix it?

P.S.: Due to policy requirements, I can't do any driver or similar updates on the system. The same applies to ideas like "do DHCP on all interfaces" ... I can't change that.
David Favor, Fractional CTO
Distinguished Expert 2018

Commented:
Routing works by metric analysis.

So... if packet flow over eth0 is faster than eth6, then when eth0 is faster, packets will flow over eth0.

If your goal is fastest throughput, you will simply let your routing system handle this for you.

One fix is to place manual/host routes between eth6 + your internal network.

In other words, you'll make eth6 (rather than eth0) the gateway for your internal network(s), so packets will flow over eth6 independent of how slow eth6 packet flow might be at any given moment.
Hmmm - I'm somewhat puzzled.

In this system each network interface connects to a separate, static network segment. There are no alternate paths ... the respective misdirected packets will never reach their destination. The eth6 interface is immediately (no routers) connected to the target network segment ... no hops.

To spit the packets out on eth0 is not an unwanted alternative, it's simply wrong ... as far as I understand routing tables, the eth0 route would simply not match, because the destination lies outside eth0's network (address and netmask).
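Which interface the kernel's routing table actually selects for a given destination can be checked with `ip route get`. A small sketch, assuming iproute2 is installed; the helper names route_dev and check_route and the example host 172.25.22.10 on eth6 are hypothetical and not taken from the system described here:

```shell
#!/bin/sh
# Extract the outgoing interface name from `ip route get` output on stdin
# (the token that follows the keyword "dev").
route_dev() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Compare the kernel's routing decision with the interface you expect.
# Usage: check_route DEST EXPECTED_IF
check_route() {
    actual=$(ip route get "$1" | route_dev)
    if [ "$actual" = "$2" ]; then
        echo "OK: $1 leaves via $actual"
    else
        echo "MISMATCH: $1 leaves via ${actual:-?}, expected $2"
    fi
}

# Hypothetical example: a host assumed to be reachable through eth6.
#   check_route 172.25.22.10 eth6
```

If this reports the expected interface while tcpdump still shows the traffic on eth0, the routing table itself is probably not the place to look.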
Duncan Roe, Software Developer

Commented:
When you have multiple ethernet cards, it's a good idea to configure by MAC address rather than by ethx. This guards against cards initialising in a different order from one reboot to the next.
I use an inferior solution on my Linux router for reasons that I won't detail here, so I can't give you further detail

Hmmm ... some time ago I played around a bit with network interfaces and MAC addresses ... to me it looks like

/etc/udev/rules.d/70-persistent-net.rules

ensures a fixed binding between MAC address and eth... as long as I don't exchange hardware items, doesn't it? And even if I change the hardware, the remaining hardware would still be bound correctly.

In my case, the routing changes on the running system, with definitely no configuration changes going on ...
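To verify the binding between interface names and MAC addresses on the running system, the current values can be listed and compared with the udev rules file and with the MAC addresses that `tcpdump -e` prints. A minimal sketch, assuming the usual sysfs layout under /sys/class/net; the function name list_macs is made up for illustration:

```shell
#!/bin/sh
# Print "<interface> <MAC address>" for every network device.
# The optional first argument overrides the sysfs directory (assumption:
# the default location is /sys/class/net).
list_macs() {
    net_dir="${1:-/sys/class/net}"
    for dev in "$net_dir"/*; do
        # Entries without a readable address file are skipped.
        [ -r "$dev/address" ] || continue
        printf '%s %s\n' "$(basename "$dev")" "$(cat "$dev/address")"
    done
}

# Usage:
#   list_macs
```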
Looks like I'm a bit closer to the problem, but it still hides behind a corner, I fear ...

I'm recording part of the network traffic on the respective IP networks, basically with
tcpdump -iany -G 600 -n -e -w /data/tcpdumps/tcpdump.%F_%H%M%S.pcap net 172.25.22.0/24

(more options applied, e.g. to zip the dumps afterwards ...)

I've found repeatedly occurring sequences like these (Wireshark screenshot):

[Wireshark screenshot]

These occur on two interfaces of this server, both in the 172.25.22.0/24 range, and the unreachable address is the respective address of the server itself on that interface.

The network load on these interfaces is low, and the "outage" is very short (in this case about 0.015 ms), so most of these "outages" might pass harmlessly, but I don't know what happens if an outbound packet hits this time span ... I suspect it would then be routed to another interface?

P.S.: The outages occur on 2 interfaces, but not on two interfaces at the same time.
Additional observation:

The "Destination unreachable" message seems to occur on ALL network interfaces, including the loopback interface 127.0.0.1.
I'm not really sure what caused the problem, but I solved it anyhow :-)

Looks like it was not the best idea to shove 4 logical networks over one physical switch without logically segmenting the switch. It should work from my point of view, but it didn't work well in reality. And it seems it was no Linux routing problem at all ...

After I did that traffic segmentation, the connection losses of the application software were gone.

The "Destination unreachable" packets remain, but I've learned that they're unrelated to my problem. I'll dig into that on another day.
