We are a WISP. At one of our sites we are seeing random bouts of very poor routing performance. It's not on the wireless side, it's on the wired side. We have a firewall, and between that firewall and the customers is a layer 2 switch, an HP ProCurve 2520. When this network issue happens, routing becomes very sluggish, with very high latency and packet loss.
During the issue we see nothing in the logs that would indicate an attack, DoS or the like. Our firewall CPU is not pegged; it's at its normal level. The switch, which is on our management VLAN so no customers can launch an attack at it, isn't running out of spec or anything like that. We have checked for a possible duplex mismatch, and from what I am seeing we have none. We limit multicast traffic to 10 pps at the customer premises and to 50 pps at our radio towers, so the total that could hit our switching across the board would be 300 pps, which I would assume our core network can easily handle.
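To sanity-check the 300 pps figure: the worst case is the per-tower cap summed over every tower feeding the site, and a tower with only a few customers can't even reach its own cap. The tower and CPE counts aren't given in the post; six towers is only inferred from 300 / 50, and the per-tower CPE count below is hypothetical.

```python
# Sketch: worst-case aggregate multicast reaching the core switch.
# Assumption: the 50 pps tower cap covers everything behind that tower, and the
# tower/CPE counts used below are hypothetical (6 towers inferred from 300 / 50).

CPE_LIMIT_PPS = 10    # per-customer multicast limit at the CPE
TOWER_LIMIT_PPS = 50  # per-tower multicast limit


def tower_ceiling(cpe_count: int) -> int:
    """A tower passes no more than its own cap or the sum of its CPE caps."""
    return min(cpe_count * CPE_LIMIT_PPS, TOWER_LIMIT_PPS)


def site_ceiling(cpes_per_tower: list[int]) -> int:
    """Worst-case multicast pps arriving at the core from all towers."""
    return sum(tower_ceiling(n) for n in cpes_per_tower)


if __name__ == "__main__":
    towers = [20] * 6  # hypothetical: six towers with 20 CPEs each
    print(site_ceiling(towers))  # 300
```

The `min()` is the design point: adding customers behind a tower never raises the site total past towers × 50 pps.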
It's really kind of a weird issue. It comes on very quickly and leaves even quicker. While I was at this site I saw "Excessive Broadcasts" in the switch logs on two of our ports, but I unplugged those ports and the issue still seemed to occur. Even just now, when the issue happened, I shut down those two ports and the issue still continued.
EDIT: We have since resolved this issue. It was CRC/alignment errors caused by a faulty cable.
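Since the culprit turned out to be CRC/alignment errors from a bad cable, one way to catch that sooner is to compare per-port error counters between two polls and flag any port whose count is climbing. This is a minimal sketch, assuming the counters are already being collected somehow (SNMP, CLI scrape, etc.) into plain dicts of port name to cumulative error count; the function name and threshold are hypothetical, not part of any switch API.

```python
# Sketch: flag switch ports whose CRC/alignment error counters are climbing.
# Assumption: counters are gathered elsewhere into dicts of
# port name -> cumulative error count; all names here are hypothetical.

from typing import Dict, List


def rising_error_ports(
    before: Dict[str, int],
    after: Dict[str, int],
    threshold: int = 1,
) -> List[str]:
    """Return ports whose error count grew by at least `threshold` between polls."""
    flagged = []
    for port, now in after.items():
        prev = before.get(port, now)  # port missing from first poll: no change
        delta = now - prev
        if delta < 0:
            # Counter was cleared or wrapped; count from zero instead.
            delta = now
        if delta >= threshold:
            flagged.append(port)
    return flagged


if __name__ == "__main__":
    t0 = {"1": 0, "2": 5, "3": 10}
    t1 = {"1": 0, "2": 9, "3": 10}
    print(rising_error_ports(t0, t1))  # ['2']
```

With events lasting only 30 to 40 seconds, the polls would need to be close together; a long polling interval can spread a short burst of errors across a window where it is easy to miss.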
What am I overlooking here? We thought it might be a backplane issue on the switch, so last week we swapped it out for a brand new one. The network will be fine all day long, then, say, later in the afternoon or into the night, the problem appears. When it does happen, it lasts only 30 to 40 seconds, then it's gone.
Any thoughts on what we might be overlooking? This feels like an attack of some kind, but there is nothing in the logs to indicate that is actually the cause.
Our core router is an Allied Telesis AR770S, which is a Gigabit firewall. The switch between the customers and the firewall is a ProCurve 2520. For our wireless gear we run Alvarion VL 5.8 GHz, and we have WiMAX networks as well. Since this issue is affecting our core site, it's also affecting two other wireless sites that backhaul to our main site.