northfieldwifi

Random Poor Performing Network

Hello,

We are a WISP. At one of our sites we are seeing random, very poor routing performance. It's not on the wireless side; it's on the wired side. We have a firewall, and between that firewall and the customers is a layer 2 switch, a ProCurve 2520. When the issue happens, routing becomes very sluggish, with very high latency and packet loss.

During the issue we see nothing in the logs that would indicate an attack, DoS or otherwise. Our firewall CPU is not pegged; it's at its normal level. The switch, which is on our management VLAN so customers cannot launch an attack at it, isn't running out of spec or anything like that. We have checked for a possible duplex mismatch, and from what I am seeing there is none. We limit multicast traffic to 10 pps at the customer premises and to 50 pps at our radio towers, so the total that could hit our switching would be 300 pps, which I would assume our core network should easily be able to handle.
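For a rough sense of scale, the 300 pps ceiling can be converted into bandwidth. This is only a back-of-the-envelope sketch: the 64-byte and 1518-byte frame sizes are the standard Ethernet minimum and maximum rather than anything measured on this network, and the function name is illustrative.

```python
def pps_to_mbps(pps: int, frame_bytes: int) -> float:
    """Convert a packet rate to megabits per second for a given frame size."""
    return pps * frame_bytes * 8 / 1_000_000


MULTICAST_CEILING_PPS = 300  # aggregate multicast limit quoted in the post

# Assumed frame sizes: standard Ethernet minimum (64) and maximum (1518) bytes.
best_case = pps_to_mbps(MULTICAST_CEILING_PPS, 64)
worst_case = pps_to_mbps(MULTICAST_CEILING_PPS, 1518)

print(f"{best_case:.2f} to {worst_case:.2f} Mbps")
```

Even at maximum frame size this works out to under 4 Mbps, so the rate-limited multicast alone would not be expected to saturate a switch port.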

It's really kind of a weird issue. It comes on very quickly and leaves even quicker. I was at this site when I saw Excessive Broadcasts logged against two of our ports in the switch logs, but I unplugged those ports and the issue still seemed to occur. Even now, the issue just happened; I shut down the two ports in question and it still continued. We have since resolved this issue. It was a CRC alignment issue due to a faulty cable.

What am I overlooking here? We thought it might be a backplane issue on the switch, so we swapped it out last week for a brand new one. The network will be fine all day long; then, anywhere from later in the afternoon to later at night, the issue appears. When it does happen, it only lasts 30 to 40 seconds, then it is gone.

Thoughts on what we might be overlooking?  This feels like an attack of some kind, but there is nothing in the logs to indicate that is the actual issue.

Our core router is an Allied Telesis AR770S, which is a gigabit firewall. The switch between the customers and the firewall is a ProCurve 2520. For our wireless gear we run Alvarion VL 5.8 GHz, and we have WiMAX networks as well. Since this issue is affecting our core site, it also affects 2 other wireless sites that backhaul to our main site.

Thoughts?
steveoskh

Since no one has responded: we had wireless Internet at a number of our sites, and we had constant problems at just one of them. The WISP said that it was excessive data on our side. Long story short, the traffic they were seeing was just on their side of the wireless connection and not coming from us. It was actually some of their wireless equipment talking to each other. The way they were monitoring the traffic, though, it appeared to come from us.
I point this out to suggest that your tools may not be looking at the correct side of the issue.
I would suggest putting a sniffer on the link side where you are having the problem. You should be able to tell where the slowdown or bottleneck is.
Your ProCurve switch will also give you information if you have it set to high sensitivity.
northfieldwifi

ASKER

We do monitor the wireless side; it's not coming from the client site. We have multiple sites, and when this happens, all sites are affected.
Have you connected a packet sniffer to see exactly what is (or is not) going across the network when this happens? We have found all kinds of things going on that we were not aware of or forgot about. Some of the things that packet sniffing has uncovered for us over the years:
- Machines set to send SNMP trap information to machines that were not trap receivers.
- Cash registers set to broadcast every item scanned to a coupon service that we did not use. The vendor configured the system wrong and was blaming performance issues on our network.
- Invalid DNS entries.
- Software that was scanning the network every day. A tech installed the software to test it and forgot about it.
- A user with Media Center Edition activated some type of Internet TV that downloads shows in the background. The user never used the program and forgot he had it, but it was still downloading 3 gigs of shows a day.

Obviously most of these do not apply to your situation; however, knowing exactly what packets are going across the network does.
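As a starting point for reading a capture, here is a minimal sketch that tallies frames by destination type. It assumes the destination MAC addresses have already been exported from the sniffer as text; the function names and the sample addresses are made up for illustration and are not from any real capture.

```python
from collections import Counter


def classify_dst_mac(mac: str) -> str:
    """Classify an Ethernet destination address as broadcast, multicast or unicast."""
    octets = [int(part, 16) for part in mac.replace("-", ":").split(":")]
    if len(octets) != 6:
        raise ValueError(f"expected 6 octets, got {len(octets)}: {mac!r}")
    if all(octet == 0xFF for octet in octets):
        return "broadcast"
    # The least significant bit of the first octet is the group (multicast) bit.
    if octets[0] & 0x01:
        return "multicast"
    return "unicast"


def summarize(dst_macs):
    """Count frames per destination class."""
    return Counter(classify_dst_mac(mac) for mac in dst_macs)


# Hypothetical sample addresses, for illustration only.
sample = [
    "ff:ff:ff:ff:ff:ff",
    "01:00:5e:00:00:fb",
    "00:1b:21:3c:4d:5e",
    "ff:ff:ff:ff:ff:ff",
]
counts = summarize(sample)
print(dict(counts))
```

Comparing these counts between a quiet period and one of the 30 to 40 second episodes would show whether broadcast or multicast frames actually spike when the slowdown hits.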
I think what we have narrowed it down to is a switch in our core where every port is a member of every VLAN, so all broadcast traffic is sent to all ports on the switch, possibly causing the switch to get stormed during peak traffic times. We are going to rework this switch so that each port is part of only its own VLAN, to see if this resolves the mini hiccups.

We came to this conclusion while sniffing packets and seeing broadcasts from all VLANs. Thoughts?
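To illustrate why VLAN membership matters here, the sketch below models how many ports receive a single broadcast frame. It is a simplified model, not the 2520's actual forwarding logic: the 24-port count, the VLAN IDs 10/20/30, and the port groupings are all hypothetical.

```python
def flood_ports(vlan_members, vlan, ingress_port):
    """Ports that receive a broadcast arriving on ingress_port in the given VLAN."""
    return vlan_members.get(vlan, set()) - {ingress_port}


ALL_PORTS = set(range(1, 25))  # hypothetical 24-port switch
VLANS = (10, 20, 30)           # hypothetical VLAN IDs

# Layout described above: every port is a member of every VLAN.
every_port_in_every_vlan = {vlan: set(ALL_PORTS) for vlan in VLANS}

# Assumed reworked layout: each port belongs to exactly one VLAN.
segmented = {
    10: set(range(1, 9)),
    20: set(range(9, 17)),
    30: set(range(17, 25)),
}

before = len(flood_ports(every_port_in_every_vlan, 10, ingress_port=1))
after = len(flood_ports(segmented, 10, ingress_port=1))
print(before, after)
```

In this toy layout a broadcast on VLAN 10 is copied to 23 ports before the rework and 7 after, and the same multiplication applies to every broadcast on every VLAN.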
I think you are on the right track. You may have found the cause. The next question I would ask is: why the excessive broadcasts? Is it normal traffic, or is a switch/router address table getting corrupted, reset, or full and causing the broadcasts?
The 2520 MAC address table holds 8,000 records.
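The reason a full or reset address table leads to flooding can be shown with a toy model. This sketch assumes a table that simply refuses new entries once it is full; how the 2520 actually ages out or evicts entries is not covered here, and the class and the short MAC strings are illustrative only.

```python
class MacTable:
    """Toy model of a switch address table with a fixed capacity."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}

    def learn(self, mac, port):
        """Record mac -> port; return False if the table is full (assumed behavior)."""
        if mac in self.entries or len(self.entries) < self.capacity:
            self.entries[mac] = port
            return True
        return False

    def lookup(self, mac):
        """Return the egress port, or None when the frame would have to be flooded."""
        return self.entries.get(mac)


table = MacTable(capacity=2)  # tiny capacity so the effect is visible
learned = [table.learn(mac, port) for mac, port in [("aa", 1), ("bb", 2), ("cc", 3)]]
print(learned, table.lookup("aa"), table.lookup("cc"))
```

Any destination the table cannot resolve is sent out every port in the VLAN, which on the wire looks much like the broadcast load described above.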
I think this is stemming from normal traffic. We are not sure this is the reason the switch has been acting the way it has, but right now it's our best guess. I doubt the switch is hitting its 8,000 records; my guess is that multicast traffic during peak network usage is flooding the router, since all ports are a part of all VLANs.

Does this logic make sense?
When I say flooding, I am guessing this is more CPU/memory related than anything...
ASKER CERTIFIED SOLUTION
steveoskh
