Asked by fuats

Ubuntu routing + VLAN + iptables = hair-loss

Inherited a network that has been built up by several different people over the last 10 years.  It's pretty messy, and I'm trying to clean it up.  The current priority project is getting a VLAN functioning.  The switches know about the VLAN, and can handle the tagging part.  Everything is routed through an Ubuntu firewall/router with several NICs.  Some items documented as VLANs turn out not to be VLANs at all, just addresses in the class B range with varying third octets.

Kernel is 2.6.27

eth0 - External IP 1
eth0:5 - External IP 2
eth0:6 - External IP 3
...
eth1 - 10.10.1.1 (10.10.0.0/16)
eth2 - External IP 7
eth3 - External IP 8
eth4 - 10.10.200.1 (VLAN Trunk)
vlan172 - 172.16.172.1 (172.16.172.0/24, bound to eth4)
vlan173 - 172.16.173.1 (172.16.173.0/24, bound to eth4)
vlan109 - 10.10.109.1 (10.10.109.0/24, bound to eth4)
---------------------------------------ifaces-----------------------------------
iface eth4 inet static
      address 10.10.200.1
      netmask 255.255.255.0
      vlan_raw_device eth4
iface vlan109 inet static
      address 10.10.109.1
      netmask 255.255.255.0
      vlan_raw_device eth4
iface vlan172 inet static
      address 172.16.172.1
      netmask 255.255.255.0
      vlan_raw_device eth4
iface vlan173 inet static
      address 172.16.173.1
      netmask 255.255.255.0
      vlan_raw_device eth4

-------------------Abbreviated Firewall Script -----------------------------
#!/bin/sh

# I've removed all comments, and extraneous garbage that I don't feel is pertinent.

IPTABLES=/sbin/iptables
ROUTE=/sbin/route

WANIFACE="eth0"
LANIFACE="eth1"
VTRUNK="eth4"

VLAN109="vlan109"

$IPTABLES -F
$IPTABLES -F -t nat
$IPTABLES -X
$IPTABLES -P INPUT ACCEPT
$IPTABLES -F INPUT
$IPTABLES -P FORWARD DROP
$IPTABLES -F FORWARD
$IPTABLES -P OUTPUT ACCEPT
$IPTABLES -F OUTPUT

$IPTABLES -t nat -I POSTROUTING -o $WANIFACE -s 10.10.0.0/16 -j SNAT --to <external>

$IPTABLES -t nat -I POSTROUTING -o $WANIFACE -s 172.16.173.0/24 -j SNAT --to <external>
$IPTABLES -t nat -I POSTROUTING -o $WANIFACE -s 10.10.202.0/24 -j SNAT --to <external>

$IPTABLES -t nat -I POSTROUTING -o $WANIFACE -s 172.16.172.0/24 -j SNAT --to <external>
$IPTABLES -t nat -I POSTROUTING -o $WANIFACE -s 10.10.203.0/24 -j SNAT --to <external>

$IPTABLES -A FORWARD -p gre -j ACCEPT

$IPTABLES -A FORWARD -i vlan109 -o $LANIFACE -j ACCEPT
$IPTABLES -A FORWARD -i $LANIFACE -o vlan109 -j ACCEPT

$IPTABLES -A FORWARD -i vlan172 -o $WANIFACE -j ACCEPT
$IPTABLES -A FORWARD -i $WANIFACE -o vlan172 -j ACCEPT

$IPTABLES -A FORWARD -i vlan109 -o $WANIFACE -j ACCEPT
$IPTABLES -A FORWARD -i $WANIFACE -o vlan109 -j ACCEPT

$IPTABLES -A FORWARD -i vlan173 -o $WANIFACE -j ACCEPT
$IPTABLES -A FORWARD -i $WANIFACE -o vlan173 -j ACCEPT

---------------------------------------------------------------------------

From a node (10.10.109.109), I can ping other nodes on that same switch within the 10.10.109.x range.  I can also ping from 10.10.109.109 (test laptop) to the gateway (10.10.109.1) on eth4, where vlan109 is bound.  This goes through two other switches to get to the Ubuntu box, so I know the switches have their tagging act together.  I can also ping 10.10.200.1 (still eth4) from 10.10.109.109.  The traffic will not leave the router though.

From 10.10.25.100, etc. I can ping pretty much any address on the class B subnet, and hit eth4 (10.10.109.1, 10.10.200.1) with no problem.   I cannot ping through from any 10.10.x.x address to 10.10.109.2-254.

SSH'd into the router, and I can ping all nodes on the 10.10.109.x VLAN.

I've zeroed out rp_filter for vlan109, then eth4, then eth0, and finally for all.  I tried it incrementally because this is a production environment that pretty much has no downtime, and I didn't want to break anything.
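When rp_filter is changed one interface at a time like this, it helps to read the values back and confirm what is actually set. A minimal sketch, assuming the usual Linux procfs location /proc/sys/net/ipv4/conf; show_rp_filter is a made-up helper name, and its optional argument only exists so the function can be pointed at another directory.

```shell
#!/bin/sh
# Print "<interface> <rp_filter value>" for every entry under a conf
# directory. Defaults to the standard procfs path (an assumption about
# the system, not something taken from the original post).
show_rp_filter() {
    conf_dir=${1:-/proc/sys/net/ipv4/conf}
    for f in "$conf_dir"/*/rp_filter; do
        # Skip the unexpanded glob or unreadable entries.
        [ -r "$f" ] || continue
        iface=${f%/rp_filter}   # strip the trailing file name
        iface=${iface##*/}      # keep only the interface directory name
        printf '%s %s\n' "$iface" "$(cat "$f")"
    done
}

show_rp_filter
```

The output includes the special "all" and "default" entries alongside real interfaces such as eth4 and vlan109.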

I've made all sorts of changes to the iptables script and reloaded, and I still get the same behavior.

iptables -nvL shows eth4 and eth0 passing traffic, and eth0 and vlan172 / vlan173 throwing packets happily, but vlan109 and eth4 are no-go.  eth4 and eth1 are chattering away fine as well.
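One way to pick the dead rules out of the counter listing is to filter for a zero packet count. This is only a sketch: zero_hit_rules is a hypothetical helper, and it assumes the usual `iptables -nvL` column order (pkts, bytes, target, prot, opt, in, out, source, destination) with a target present on every rule.

```shell
#!/bin/sh
# Read `iptables -nvL <chain>` output on stdin and print "in -> out"
# for every rule whose packet counter is exactly 0.
# Header lines are skipped because their first field is not "0".
zero_hit_rules() {
    awk 'NF >= 9 && $1 == "0" { print $6 " -> " $7 }'
}

# Usage (needs root, so it is not run here):
#   iptables -nvL FORWARD | zero_hit_rules
```

A rule that never matches shows up here even when the interfaces on either side are otherwise busy, which narrows the search to routing rather than filtering.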

I'm still trying different things, but have noticed I'm starting to do some of the same things I've already tried.  When it gets circular, it's time to ask for help.

Is anything jumping out at anyone out there as a cause for the problem?
ASKER CERTIFIED SOLUTION
noci

This solution is only available to members.

ASKER

Thanks for the response.  I'll take a look at what tcpdump is showing.  Last time I looked, I think it was stopping at the vlan109 gateway.  The 109 needs to be on its own, as it's apparently a very chatty system (PBX and associated voicemail and controllers) that needs isolation but still has to be remotely accessible for administration.  Using iptables, I am going to make it so the administrative machines can access it, but Joe User can't wander into it.
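The admin-only access could look roughly like the following. This is a sketch, not the final ruleset: the ADMIN_HOSTS addresses are placeholders rather than real machines, emit_admin_rules is a made-up helper, and it relies on the FORWARD policy being DROP as in the script above. It prints the rules instead of applying them so they can be reviewed first.

```shell
#!/bin/sh
# Print (not execute) FORWARD rules that let only the listed admin hosts
# open connections from the LAN into the PBX VLAN.
ADMIN_HOSTS="10.10.1.50 10.10.1.51"   # placeholder addresses (assumption)
LANIFACE="eth1"
PBXVLAN="vlan109"

emit_admin_rules() {
    # Allow replies to connections that were already permitted.
    echo "iptables -A FORWARD -i $PBXVLAN -o $LANIFACE -m state --state ESTABLISHED,RELATED -j ACCEPT"
    # Allow each admin host to initiate traffic toward the PBX VLAN.
    for host in $ADMIN_HOSTS; do
        echo "iptables -A FORWARD -i $LANIFACE -o $PBXVLAN -s $host -j ACCEPT"
    done
}

emit_admin_rules
```

Anything from eth1 that is not in the list falls through to the DROP policy, which is what keeps Joe User out. These rules would replace the blanket eth1 <--> vlan109 ACCEPT pair rather than sit next to it.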

eth4 has all the VLANs, eth1 is the primary LAN (the class B monstrosity), and ultimately I want to make it a class C with all the current subnets VLAN'd out, but that has to be done when it's not going to affect normal operations.  Changing the network range seems like a good alternative, because I can see some confusion arising from a VLAN whose range falls inside the larger primary LAN.

ASKER

After trying a million different iterations of my iptables script, I finally threw together a test vlan on a 192.168.109.x network.  Put a simple forwarding (eth1 <--> testVlan) statement in, and reran the script.  Ping ran right through, and outside systems connected.

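The main network is class B (10.10.0.0/16), and that range already contains the class C I had carved out for the VLAN (10.10.109.0/24).  That meant it would not route data between the two, either because of spoofing prevention, or because hosts on the /16 treat 10.10.109.x as part of their own logical network and never hand the traffic to the router, or both.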
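The overlap is easy to check with a little shell arithmetic. A minimal sketch, assuming a POSIX shell with 64-bit arithmetic; ip_to_int and in_subnet are helper names made up for illustration, and octets are expected without leading zeros.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Print "yes" if ADDR falls inside NETWORK/PREFIX, otherwise "no".
in_subnet() {
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    prefix=${2#*/}
    mask=$(( (0xFFFFFFFF << (32 - $prefix)) & 0xFFFFFFFF ))
    if [ $(( $addr & $mask )) -eq $(( $net & $mask )) ]; then
        echo yes
    else
        echo no
    fi
}

in_subnet 10.10.109.109 10.10.0.0/16     # VLAN host vs. the main LAN range
in_subnet 192.168.109.10 10.10.0.0/16    # test VLAN host vs. the main LAN range
```

The first call prints yes and the second prints no: a host on eth1 with a /16 mask considers 10.10.109.109 directly reachable on its own segment, while a 192.168.109.x address has to go through the router.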

I can't explain how happy I am to have this off my plate.  (Now to go play catch-up on all the stuff that piled in while I was fighting this problem.)  Thanks!