Creden Wigal

Cross-Network Traffic

For starters, I am new to this organization and inherited a mess of a network. We have a WatchGuard Firebox M500 for routing/firewall and seven main switches. The organization started out with a /24 network and outgrew the available IP addresses a while back, so they created a new subnet, and both are still in use today. We are slowly phasing out the /24 network...but it will take a while.

Four switches are on the original network (network 1) and three switches are on network 2. They are connected to the pertinent port on the WatchGuard to route traffic between them. VLANs are not used, just the different subnets. I have looked at the logs on the WatchGuard and I can see it passing traffic between the two networks.

I found a cable connecting one switch on network 1 to a switch on network 2...and these are on different subnets. I disconnected that cable and users instantly lost connection with their servers (servers are almost all on network 1 and users' PCs/Macs are on network 2). Equipment is all set up with the correct gateway, DNS, etc.

This is a 24/7 business...so unfortunately I don't have very long to take the connection down and do any testing...and I have looked at everything I could think of as to why they lose connection when I disconnect this cable and come up with nothing.

I was hoping someone might be able to help me figure out how I can disconnect that cable and have the network stable going through the WatchGuard, like it should be.

Let me know what you need to see configuration-wise and I'll do my best to give as much info as you need.

Thank you!
Steve Knight

It would have been a lot simpler if they'd just changed the subnet mask on the /24 network or created VLANs rather than a routed subnet. Are you sure there are no VLANs defined on those switches?

If you stick a laptop or other device on both ends of that cable do you get a DHCP address from the relevant subnet?

How are they accessing this server they lose touch with, and do they lose anything else - can they ping the router still, what do they have as their gateway, is it the router etc?

So I'd start with ipconfig /all, route print on a PC on both subnets, look at what each end of that cable gives you and look at config. on the router.

Does a server do your DHCP and DNS, or does the firewall device?

Steve
Do you have 2 cables coming from the WatchGuard and going to the same switch? It sounds like VLANs might be implemented on the switches (even if only a subset), not necessarily on the WatchGuard itself.
Both subnets exist on VLAN 1, the default VLAN.  If these are dumb (un-managed) switches, then they will not have any VLAN.

VLANs are logical ways to segregate broadcast domains without using separate physical switches.

When you patch together two dumb switches, or two networks with the same VLAN, then all ports exist in the same broadcast domain.

You can place multiple IP subnets in the same broadcast domain.  It's not wise, but people do it.  The two subnets don't route, because ARP and MAC address tables do the work.

If you are 1.1.1.1 and you need 2.2.2.2, you will broadcast "where is 2.2.2.2?" into your broadcast domain.  Only if no answer comes, then you will follow the next hop according to your OS's routing table.  But, your switches see the direct link, and answer "here is 2.2.2.2!  Just go to PortXx."


You might have a specific route that says 2.2.2.2/32 or 2.2.2.0/24 or 2.2.0.0/16 hops to 1.1.1.254 (a router).

You might have a default route that says 0.0.0.0 hops to 1.1.1.254 (a router).

But this route will not be followed so long as the Layer 2 switch thinks it has a direct connection to 2.2.2.2.

TEST:
1. From one workstation 1.1.1.1, enter a static route for 2.2.2.0/24 to Router1 (1.1.1.254).
2. Disconnect patch cable between switches.
3. Reboot switches and workstation.

This is a known way to clear caches. If you have managed switches, you can clear them much faster with a console command. Rebooting is the second slowest method. The slowest method is waiting 3-6 hours for the TTL values to expire.

The switches stop "knowing" the path to 2.2.2.2, because the patch cable is gone and the cache of the links is gone.

Your broadcast "where is 2.2.2.2?" will not get an answer from its switch.

Then, your OS will look for a route.  The next broadcast would be "where is 1.1.1.254?"

When you find the router, you will send the request for 2.2.2.2 to Port1 of the router.  The router will search into its routing tables for 2.2.2.2/32.  The result should say "send traffic to Port2".

But if you plug that patch cable into both physical networks again, you will re-create the Layer 2 link. It will take over, and your equipment will start ignoring the Layer 3 route again.

You need to publish a maintenance window. Publish the start and stop times, giving yourself time for recovery (and/or reboot).

I've been in 24/7 manufacturing shops and now work in 24/7 hospitals. Nothing stays running 100% of the time. You need to fix what's broken in a planned manner, or it may become catastrophic later, in an unplanned manner.
Fastest way:
Talk to an MSP or vendor and get at least some lower-end managed switches, and shadow their guy doing the setup to learn. I'd suggest at least a stack of Cisco SG500s, but I'd need to know more about your users/servers/traffic to give a better recommendation.
I agree with many of the suggestions above but feel some may be a little complicated at this point, considering your short testing windows (guessing you'd get shot if you knocked the systems off for too long).

I'd start simple for now:
How have you established that no VLANs are in use? Have you checked the config on all switches?
Although you have advised that the firewall has a port connected to each switch, have you confirmed that these ports are both taking traffic? Could it all be flowing via one port? (You could confirm this in the logs or by port usage stats.)
Are the two switches with the 'unidentified' connection managed or unmanaged (dumb)? If they have a port mirror facility, you could Wireshark the link to see exactly what is flowing over it.
For our information, could you provide the subnets/masks and gateway IPs please?
Creden Wigal

ASKER

Thank you all for the answers.  

Sorry, I should have specified that the switches are Cisco Catalyst 3560G's.  I have my Network+ cert and I am close to having my CCNA completed so I am far from an expert...but I know at least a little about the problems.  


Switches 1, 2, 4, and 6 are configured for subnet 1. Switches 3, 5, and 7 are configured for subnet 2. Switch 4 has the connection to the WatchGuard for subnet 1. Switch 5 has the connection to the WatchGuard for subnet 2. I can see the WatchGuard passing traffic between the subnets, and tracerts from each network to the other show the WatchGuard as the first hop between the two networks.

We have a domain controller on network 1 and one on network 2. Each of them handles DHCP for its respective network...although I have seen equipment connected to network 1 switches pull an IP for network 2 when the /24 DHCP scope is full...yet another thing that shouldn't be happening.

Network 2 switches are set up with their ports in VLAN 2, so technically they are configured for VLANs...but they are connected through two different interfaces on the WatchGuard.

Switch 6 and 7 are connected via a patch cable and this is the one that I disconnected and everyone lost connection to the servers they use.  Due to it being in the middle of the production day, I plugged it back in immediately and they were fine...I'll have to wait till this weekend to do some troubleshooting when I can actually unplug it and affect the least amount of users.

Switch 6 - Network 1 - Excerpt of Configuration:

...
interface GigabitEthernet0/7
!
interface GigabitEthernet0/8 (Connected to switch 7 port 19)
!
interface GigabitEthernet0/9
...
interface Vlan1
 ip address 192.168.0.205 255.255.255.0
 no ip route-cache
!
ip default-gateway 192.168.0.43
ip classless



Switch 7 - Network 2 - excerpt of configuration :

...
interface GigabitEthernet0/18
 switchport access vlan 2
!
interface GigabitEthernet0/19 (Connected to switch 6 port 8)
 switchport access vlan 2
!
interface GigabitEthernet0/20
 switchport access vlan 2

...

interface Vlan1
 no ip address
!
interface Vlan2
 ip address 192.168.32.3 255.255.224.0
!
ip default-gateway 192.168.32.1
ip classless
...

Most of our servers reside on the first network (192.168.0.0/24). Most PCs reside on the second network (192.168.32.0/19).
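As a rough illustration (not part of the original post), the two address ranges can be derived from the switch management addresses and masks shown in the excerpts above. This is only a sketch using Python's standard `ipaddress` module; the `describe` helper is a made-up name for this example.

```python
import ipaddress

# Interface addresses and masks copied from the Vlan1/Vlan2 lines in the excerpts.
net1 = ipaddress.ip_interface("192.168.0.205/255.255.255.0").network
net2 = ipaddress.ip_interface("192.168.32.3/255.255.224.0").network


def describe(net):
    """Return (CIDR, first usable host, last usable host, usable host count)."""
    usable = net.num_addresses - 2  # exclude the network and broadcast addresses
    first = net.network_address + 1
    last = net.broadcast_address - 1
    return (str(net), str(first), str(last), usable)


for net in (net1, net2):
    print(describe(net))

# The two ranges are disjoint, so traffic between them has to be routed
# (or leak across a shared Layer 2 segment, as with the rogue cable).
print("overlap:", net1.overlaps(net2))
```

The /19 covers 192.168.32.1 through 192.168.63.254 (8190 usable addresses) against 254 for the /24, and the two ranges do not overlap.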


We have huge plans ahead of running our network through the Switch SuperNAP and a complete overhaul of the network, including phasing out network 1...but moving servers that our users, customers, and vendors use is not something we can do quickly. I've already fixed quite a few problems that were causing broadcast storms and other issues. Users already feel the network is much faster than it used to be.

I am going to see if I can troubleshoot some over the weekend with the cable unplugged, working from the PCs that have problems connecting when it is unplugged. May have more info on Monday.

Thanks again all!!
Sorry, thought I read you had unmanaged switches, ignore me and my incorrect comments ;)
Aaron, my fault.  I originally didn't specify if they were managed or unmanaged.  Thanks for trying, though!
Steve,

They didn't want to just change the subnet mask before because they had their network set up as 192.168.0.0/24, which causes problems when people try to VPN from home and their home network is also 192.168.0.0/24 (which a lot of home routers use as a default network).
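The VPN concern above comes down to address overlap, which can be checked quickly. The sketch below is illustrative only: the list of home LANs and the "widened" 192.168.0.0/19 network are assumptions made up for the example, not values from this thread.

```python
import ipaddress

office_old = ipaddress.ip_network("192.168.0.0/24")
office_new = ipaddress.ip_network("192.168.32.0/19")
# Hypothetical result of simply widening the mask on the old network.
office_widened = ipaddress.ip_network("192.168.0.0/19")

# Hypothetical home LANs; the first two are common consumer-router defaults.
home_lans = [
    ipaddress.ip_network(n)
    for n in ("192.168.0.0/24", "192.168.1.0/24", "10.0.0.0/24")
]


def conflicts(office, homes):
    """Return the home networks whose address space overlaps the office network."""
    return [str(h) for h in homes if office.overlaps(h)]


for label, net in (("old", office_old), ("new", office_new), ("widened", office_widened)):
    print(label, net, conflicts(net, home_lans))
```

With these sample values the old /24 collides with one home network, the widened /19 with two, and 192.168.32.0/19 with none, which is consistent with the reasoning given for creating a separate subnet.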
ASKER CERTIFIED SOLUTION
Steve Knight
This solution is only available to members.
I see what you are saying, Steve. Very good point. I have already arranged to come in and do some troubleshooting at 4 AM Monday morning, when it will affect the fewest users...but I think you might have hit the nail on the head. Because there are more switches on network 1, many users are still connected to those switches and have probably pulled an IP from the second network through this rogue cable...and as soon as I pull that cable, they can't communicate anymore. I'll let you know Monday if that is the case and mark the solution if that is it. Thanks so much!!
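One way to find the affected machines ahead of the maintenance window is to compare each host's current IP with the subnet its switch is meant to serve. The sketch below assumes you can export an inventory of (hostname, switch, IP) rows, for example from an IP scan plus the switch MAC tables; the inventory rows here are invented for illustration.

```python
import ipaddress

NET1 = ipaddress.ip_network("192.168.0.0/24")
NET2 = ipaddress.ip_network("192.168.32.0/19")

# Which subnet each switch is meant to serve, as described in the thread.
SWITCH_NET = {1: NET1, 2: NET1, 4: NET1, 6: NET1, 3: NET2, 5: NET2, 7: NET2}

# Hypothetical inventory rows: (hostname, switch number, current IP).
inventory = [
    ("pc-a", 6, "192.168.40.17"),
    ("pc-b", 7, "192.168.33.20"),
    ("srv-1", 4, "192.168.0.10"),
    ("pc-c", 2, "192.168.32.99"),
]


def misplaced(rows):
    """Hosts whose IP lies outside the subnet of the switch they plug into."""
    return [
        name
        for name, switch, ip in rows
        if ipaddress.ip_address(ip) not in SWITCH_NET[switch]
    ]


print(misplaced(inventory))
```

Hosts flagged this way are the ones likely to lose their gateway when the cross-network cable is pulled, until they pick up an address from the correct scope.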
No problem, good luck with it!
Steve
SOLUTION
This solution is only available to members.
Subnets don't stop network traffic.
VLANs would, but only if they are defined in the same way over trunks.

If you have VLAN1 port plugged into VLAN2 port, they are both in the same broadcast domain or network segment.  It's called the "native" VLAN, or "untagged" VLAN.

That's why your DHCP server is handing out addresses to the wrong computers. They're asking any DHCP server that can hear the broadcast to give them an IP address.
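The effect described above can be sketched as a small graph problem. In this toy model each node is a (switch, VLAN) pair, and an untagged access link joins whatever access VLANs sit at its two ends. Only the switch 6 to switch 7 cable and its port VLANs come from the posted configs; the other links and the DHCP server attachment points are assumptions for illustration.

```python
from collections import defaultdict

links = [
    (("sw4", 1), ("sw6", 1)),  # hypothetical uplink inside network 1
    (("sw5", 2), ("sw7", 2)),  # hypothetical uplink inside network 2
    (("sw6", 1), ("sw7", 2)),  # rogue cable: Gi0/8 (VLAN 1) to Gi0/19 (access VLAN 2)
]

# Assumed attachment points for the two DHCP servers (domain controllers).
dhcp_servers = {"dhcp-net1": ("sw4", 1), "dhcp-net2": ("sw5", 2)}


def broadcast_domain(start, links):
    """Return every (switch, vlan) node reachable at Layer 2 from start."""
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in graph[node] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return seen


def servers_heard(client_port, links):
    """DHCP servers that would receive a broadcast sent from client_port."""
    domain = broadcast_domain(client_port, links)
    return sorted(name for name, port in dhcp_servers.items() if port in domain)


print("with cable:   ", servers_heard(("sw6", 1), links))
print("without cable:", servers_heard(("sw6", 1), links[:-1]))
```

With the cable in place, a client on switch 6 is heard by both DHCP servers; with it removed, only the network 1 server is in its broadcast domain. The firewall is left out of the model on purpose, since a router does not forward broadcasts between its interfaces unless a relay is configured.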
I was actually already working on a way to stop DHCP on the old network. I have identified what has reservations and ran an IP scan to identify everything on the 0 network right now. I am just keeping printers and servers on the old network and giving them all static IPs. There will probably be some growing pains on Monday...but I think in the end the two networks will be much more stable with their communication. I can't believe this network even communicated at all before I arrived. I found switches plugged into themselves and multiple cross-network connections...and I think this is the last one, and that's why it's being a pain.

Overall, some great comments and advice.  Thanks to everyone that has submitted an answer.  Once I get this fixed, I'll be marking best solution, etc.  I truly appreciate it!!
From what you've advised, I'm not convinced the traffic is flowing correctly. The fact that you are picking up DHCP from both networks suggests the router isn't actually routing the traffic correctly (DHCP broadcast traffic isn't normally routed unless you have specifically configured it to be).
This means that both switches/VLANs can see both VLANs freely without the router being involved. Clients will pick up one of the two subnets depending on which DHCP server answers, and will route traffic to the appropriate firewall/router interface via the same cable regardless. (Many firewalls can allow cross-interface traffic, so it may not actually matter which physical interface you hit.)

I recommend checking your logs/stats again to confirm that traffic is reaching the Router interface for each subnet/vlan by the right switch connections. I suspect it isn't, and all traffic is reaching the firewall on the same port via ONE of the switches.

This would explain a lot of the behaviour you are experiencing.

Potential reasons:
STP - your switches could have identified there are multiple routes to both networks (via links to the firewall/router AND via the link between switches.)
This switch link probably has a faster response time than the firewall/router one, so it may have been deemed the preferred route.
Incorrect VLAN/port config - one of the links to the firewalls may be incorrect, so that route simply doesn't allow traffic to flow and everything uses the link between switches instead.
Did you get anywhere with this, was it just effectively one big network running two dhcp servers issuing two IP ranges randomly to clients with them communicating directly if they picked the same IP range or via the router to the "other" IP range devices?
Sorry...last weekend ended up not being a good weekend to do it. The plan is now to pull the cable this Sunday and deal with any static IPs that I missed. I still think it was just that equipment with an IP for one network, plugged into a switch for the other network, wasn't able to reach its gateway the last time I pulled the cable. I have the DHCP lease set to one hour for each of the networks...so after I pull the cable, everything should have the correct IP and be able to communicate within the hour. Again...thanks for the help everyone. More info incoming this Monday after I pull the cable.
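For a sense of the timing, RFC 2131 gives default renewal (T1) and rebinding (T2) times of 0.5 and 0.875 of the lease duration. The sketch below just does that arithmetic for a one-hour lease. It assumes the server does not override T1/T2 and that clients follow the defaults; the `dhcp_timers` helper is a name invented for this example.

```python
from datetime import timedelta


def dhcp_timers(lease, t1_factor=0.5, t2_factor=0.875):
    """Return default renewal, rebinding, and expiry offsets for a lease."""
    return {
        "renew_T1": lease * t1_factor,   # client unicasts a renewal to its server
        "rebind_T2": lease * t2_factor,  # client broadcasts to any DHCP server
        "expiry": lease,                 # client must give up the address
    }


timers = dhcp_timers(timedelta(hours=1))
for name, offset in timers.items():
    print(f"{name}: {offset}")
```

Measured from when a client last obtained or renewed its lease, that gives 30 minutes, 52.5 minutes, and 60 minutes, so a stranded client should sort itself out within roughly one lease period at worst. A manual release/renew or a link flap usually shortens that considerably.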
Let's see if the author comes back on this; otherwise I will post suggested answers...
Thanks, Steve...your email reminded me that I still need to come back to this. So it was just an IP address problem once the cable was removed. Once we renewed the IP address on the PCs that disconnected when the cable between subnets was removed, they were able to function normally, as they picked up an IP address from the correct network. DHCP had 99% of it fixed after a few minutes...but we still had to run around and fix any static IPs we missed, etc. Your help was invaluable, Steve. One of your answers was spot on. Thank you, everyone!!
Thanks.  Looks like you made the question with a different account?  You just need to log back in with that and choose answers.

Steve
For some reason it wasn't logging me in correctly...then suddenly it worked.  Thanks again, Steve!!