Random Poor Performing Network


We are a WISP.  At one of our sites we are seeing random, very poor routing performance.  It's not on the wireless side, it's on the wired side.  We have a firewall, and between that firewall and the customers is a Layer 2 switch, a ProCurve 2520.  When this network issue happens, routing becomes very sluggish, with very high latency and packet loss.

During the issue we see nothing in the logs that would indicate an attack, DoS or otherwise.  Our firewall CPU is not pegged; it's at its normal level.  The switch, which is on our management VLAN so that no customer can launch an attack at it, isn't running out of spec or anything like that.  We have checked for a possible duplex mismatch, and from what I am seeing we have none.  We limit multicast traffic at the customer premises to 10 pps and at our radio towers to 50 pps, so the total that could hit our switching would be 300 pps, which I would assume our core network should easily be able to handle.
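As a rough sanity check on that 300 pps figure, here is a minimal Python sketch of the arithmetic. The tower count of 6 is an assumption inferred from 300 / 50 and is not stated in the post; the 1518-byte frame size is the standard untagged Ethernet maximum, used here as a worst case. The function names are made up for illustration.

```python
MAX_FRAME_BYTES = 1518  # standard untagged Ethernet maximum frame size


def aggregate_pps(per_tower_pps: int, tower_count: int) -> int:
    """Worst-case multicast packets per second reaching the core switch."""
    return per_tower_pps * tower_count


def worst_case_mbps(pps: int, frame_bytes: int = MAX_FRAME_BYTES) -> float:
    """Bandwidth in Mbit/s if every packet were a maximum-size frame."""
    return pps * frame_bytes * 8 / 1_000_000


if __name__ == "__main__":
    # tower_count=6 is an assumption (300 pps / 50 pps per tower).
    total = aggregate_pps(per_tower_pps=50, tower_count=6)
    print(total, "pps,", round(worst_case_mbps(total), 2), "Mbit/s")
```

Under those assumptions the rate-limited multicast works out to roughly 3.6 Mbit/s even with full-size frames, which is small next to gigabit switching. That suggests that if multicast is involved, it is either bypassing the limits or being multiplied somewhere inside the core.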

It's really kind of a weird issue.  It comes on very quickly and leaves even quicker.  I was at this site when I saw Excessive Broadcasts in the switch logs hitting two of our ports, but I unplugged those ports and the issue still seemed to occur.  Even now, when the issue just happened, I shut down the two ports in question and the issue still continued.  We have since resolved this issue.  It was a CRC alignment issue due to a faulty cable.

What am I overlooking here?  We thought it might be a backplane issue on the switch, so we swapped it out last week for a brand new switch.  The network will be fine all day long, then, let's say from later in the afternoon to later at night, the issue will appear.  When it does happen, it only lasts 30 to 40 seconds, then it is gone.

Thoughts on what we might be overlooking?  This feels like an attack of some kind, but there is nothing in the logs to indicate that is the actual issue.

Our core router is an Allied Telesis AR770S, which is a Gigabit firewall.  The switch between the customers and the firewall is a ProCurve 2520.  For our wireless gear we run Alvarion VL 5.8GHz, and we have WiMAX networks as well.  Since this issue is affecting our core site, it's also affecting 2 other wireless sites that backhaul to our main site.

Since no one has responded: we had wireless Internet at a number of our sites, and we had constant problems at just one of them.  The WISP said that it was excessive data on our side.  Long story short, the traffic they were seeing was just on their side of the wireless connection and not coming from us.  It was actually some of their wireless equipment talking to each other.  However they were monitoring the traffic, it appeared to come from us.
I point this out to suggest that your tools may not be looking at the correct side of the issue.
I would suggest putting a sniffer on the link side where you are having the problem.  You should be able to tell where the slowdown or bottleneck is.
Your ProCurve switch will also give you information if you have it set to high sensitivity.
northfieldwifi (Author) Commented:
We do monitor the wireless side, and it's not coming from the client site.  We have multiple sites; when this happens, all sites are affected.
Have you connected a packet sniffer to see exactly what is (or is not) going across the network when this happens?  We have found all kinds of things going on that we were not aware of or had forgotten about.  Some of the things that packet sniffing has uncovered for us over the years:
- Machines set to send SNMP trap information to machines that were not trap receivers.
- Cash registers set to broadcast every item scanned to a coupon service that we did not use.  The vendor configured the system wrong and was blaming performance issues on our network.
- Invalid DNS entries.
- Software that was scanning the network every day.  A tech installed the software to test it and forgot about it.
- A user with Media Center edition who had activated some type of Internet TV that downloads shows in the background.  The user never used the program and forgot he had it, but it was still downloading 3 gigs of shows a day.

Obviously most of these do not apply to your situation; however, knowing exactly what packets are going across the network does.
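To make the sniffing suggestion a bit more concrete, here is a minimal Python sketch of how captured frames could be bucketed into broadcast, multicast, and unicast counts per second. It assumes the capture has already been exported as (timestamp, destination MAC) pairs by whatever sniffer is in use; the function names and the input format are hypothetical, not part of any particular tool.

```python
from collections import Counter, defaultdict


def classify_dst(mac: str) -> str:
    """Return 'broadcast', 'multicast', or 'unicast' for a destination MAC."""
    octets = [int(part, 16) for part in mac.replace("-", ":").split(":")]
    if len(octets) != 6:
        raise ValueError(f"not a 6-octet MAC address: {mac!r}")
    if all(octet == 0xFF for octet in octets):
        return "broadcast"
    # The least significant bit of the first octet is the group (multicast) bit.
    if octets[0] & 0x01:
        return "multicast"
    return "unicast"


def rates_per_second(frames):
    """Count frame types per whole second.

    frames: iterable of (timestamp_seconds, dst_mac) pairs (assumed export format).
    Returns a dict mapping each second to a Counter of frame types.
    """
    buckets = defaultdict(Counter)
    for timestamp, dst_mac in frames:
        buckets[int(timestamp)][classify_dst(dst_mac)] += 1
    return dict(buckets)


if __name__ == "__main__":
    # Made-up sample data, purely for illustration.
    sample = [
        (0.10, "ff:ff:ff:ff:ff:ff"),
        (0.90, "01:00:5e:00:00:fb"),
        (1.20, "00:1b:21:3c:4d:5e"),
    ]
    for second, counts in sorted(rates_per_second(sample).items()):
        print(second, dict(counts))
```

Comparing the per-second counts during one of the 30 to 40 second events against a quiet period should show whether broadcast or multicast volume actually spikes when the slowdown hits.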
northfieldwifi (Author) Commented:
I think what we have narrowed it down to is a switch in our core on which every port is a member of every VLAN, so all broadcast traffic is sent to all ports on the switch, possibly causing the switch to get stormed during peak traffic times.  We are going to rework this switch so that each port is a part of its own VLAN, to see if this resolves the mini hiccups.

We came to this conclusion while sniffing packets and seeing broadcasts from all VLANs.  Thoughts?
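The effect described above can be illustrated with a small Python model of how a switch floods a broadcast: the frame goes out every port that is a member of the frame's VLAN, except the port it arrived on. The 8-port layout, the VLAN IDs, and the choice of port 8 as a tagged uplink are all hypothetical and only there to show the difference between the two designs.

```python
def flood_ports(membership, vlan, ingress_port):
    """Ports that receive a broadcast arriving on ingress_port in the given VLAN."""
    return sorted(
        port
        for port, vlans in membership.items()
        if vlan in vlans and port != ingress_port
    )


PORTS = range(1, 9)          # hypothetical 8-port switch
VLANS = {10, 20, 30, 40}     # hypothetical VLAN IDs

# Layout described in the thread: every port is a member of every VLAN.
all_in_all = {port: set(VLANS) for port in PORTS}

# Reworked layout (assumed): access ports in one VLAN each, port 8 carries all VLANs.
segmented = {
    1: {10}, 2: {10},
    3: {20}, 4: {20},
    5: {30}, 6: {30},
    7: {40},
    8: set(VLANS),
}

if __name__ == "__main__":
    print(flood_ports(all_in_all, vlan=10, ingress_port=1))  # [2, 3, 4, 5, 6, 7, 8]
    print(flood_ports(segmented, vlan=10, ingress_port=1))   # [2, 8]
```

In the first layout a broadcast in any VLAN is copied to every other port, so every link sees the combined broadcast load of all VLANs; in the second, it only reaches the ports that actually belong to that VLAN.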
I think you are on the right track.  You may have found the cause.  The next question I would ask is: why the excessive broadcasts?  Is it normal traffic, or is a switch/router address table getting corrupted, reset, or filled up and causing the broadcasts?
The 2520 MAC address table holds 8,000 records.
northfieldwifi (Author) Commented:
I think this is stemming from normal traffic.  We are not sure this is the reason the switch has been acting the way it has, but right now it's our best guess.  I doubt the switch is hitting its 8,000 records.  My guess is that multicast traffic during peak network usage is flooding the router, since all ports are a part of all VLANs.

Does this logic make sense?
northfieldwifi (Author) Commented:
When I say flooding, I am guessing this is more CPU/memory related than anything...
Yes, your logic is sound.
