Troubleshooting Broadcast Storm
Posted on 2004-10-18
We have a small network of roughly 100 clients (almost all XP, 1-2 Win98) with a W2K domain controller, a couple of W2K servers, a couple of Server 2003 servers, and a Linux gateway supplied by our ISP.
The network is a single subnet with 4 Netgear FSM750S switches. Two of the switches are in the rack with the servers; the other two are in different areas of the building. The 2 remote switches are connected to the main server room switch through the gigabit ports, one fiber (GBIC) and one copper. (Side note: although we have never had trouble with the main switch until now, we spent weeks trying to get the second GBIC to work with the fiber run we have in place and were not able to get it going. This was despite having the line tested for loss and going through multiple GBIC adapters, hence the one copper backbone.)
There are only a few other switches or hubs, and just a single WAP that we turn on occasionally to provide wireless connectivity to a conference room.
Everything worked fine until recently, when we added the 4th FSM750S to the rack in the server room. This was done to pick up some extra runs and accommodate new clients. (There was previously a small 'dumb' Netgear switch connected to the main switch through a patch cable, and the new switch replaced that one.)
We first tried to add the new switch to the main switch as a slave, through the rear stacking port. This left us unable to access the main switch from the web control panel, and we were also unable to ping the switch on the network, although it appeared to be working normally. We connected through the console and the IP was the same as it had been previously (192.168.1.30, a static address in the range we set aside for network equipment). Although we could manage the main switch through the console, we could not see the slave switch there.
After trying different IPs and various other solutions, we disconnected the stacking cable and simply connected the new switch to the main switch with a patch cable as a temporary workaround. Everything seemed to be working again, and we could once more access the main switch from the web control panel.
After about 24-48 hours, we had a broadcast storm and had to reset the switch. The next day it happened again. We began to troubleshoot and covered all the obvious areas, eliminating what we could. About once a day, usually overnight, the switch becomes unusable due to a broadcast storm.
Things we have tried:
resetting to factory defaults
removing the second switch
turning off port mirroring (used for our internet filtering software, Ifilter)
double checking our anti virus software
looking for kids plugging patch cables into multiple jacks (it is a school)
looking for unauthorized machines
At this point I have begun to use some packet sniffing software to try to see what is actually going on, but the problem seems very intermittent. I have noticed a lot of traffic on ports 137, 138 and 139 inside the network, but I don't have historical data to compare against to see if this is normal. The traffic does not seem to originate from a single source, and I haven't even been able to find out which part of the network it is coming from.
In the past we never had problems with the network. It was always very solid (with the exception of the fiber, which we never figured out).
We have recently taken down a Norton AntiVirus server and switched to AVG. Norton hasn't been removed from all the clients, but I can't believe this is the problem.
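To make the capture question concrete, here is a minimal sketch of the kind of tally that could stand in for the missing baseline. It assumes the sniffer can export each frame as a (timestamp in seconds, raw Ethernet bytes) pair; that export format and the names tally_broadcasts and bucket_seconds are hypothetical, not features of any particular sniffing tool. It only counts frames whose destination MAC is ff:ff:ff:ff:ff:ff, grouped by source MAC and by time bucket.

```python
from collections import Counter

# Ethernet broadcast destination address (ff:ff:ff:ff:ff:ff).
BROADCAST = b"\xff" * 6


def mac_str(raw):
    """Format 6 raw bytes as a colon-separated MAC address string."""
    return ":".join(f"{b:02x}" for b in raw)


def tally_broadcasts(frames, bucket_seconds=60):
    """Count broadcast frames per source MAC and per time bucket.

    frames: iterable of (timestamp_seconds, raw_ethernet_frame_bytes).
    This input shape is an assumption; adapt it to whatever the
    capture tool actually exports.
    """
    per_source = Counter()
    per_bucket = Counter()
    for ts, frame in frames:
        if len(frame) < 14:
            # Too short to hold an Ethernet header; skip it.
            continue
        dst, src = frame[0:6], frame[6:12]
        if dst != BROADCAST:
            continue
        per_source[mac_str(src)] += 1
        # Key each bucket by its start time in seconds.
        per_bucket[int(ts // bucket_seconds) * bucket_seconds] += 1
    return per_source, per_bucket
```

Run against a capture from a quiet day and again during a storm, per_source.most_common(10) would show whether a handful of MAC addresses dominate, and per_bucket would show when the broadcast rate starts to climb.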
My questions are:
What is the likelihood that this is a bad switch?
Is there a virus that I may be missing?
What should I be looking for in captured packets?
What could cause a broadcast storm at a regular but widely spaced interval?