Network-Wide Broadcast Storm with Mostly Managed Switches

We recently diagnosed a broadcast problem on our high-speed internet network. It is unclear how long it has been going on, but every time it happens it bogs the network down and fills it with errors.

Here is our setup:

1 HP ProCurve central switch.
5 other managed switches (Dell, Netgear, D-Link) connected in various ways to the central ProCurve, some with fiber and some with copper.
About 5 more unmanaged switches connected to some of the 5 managed switches.

So in all there are around 11 or 12 switches on this network.  

Here is our problem:
I noticed the activity lights constantly flashing on all switches. I hooked a network sniffer onto the network and saw about 1500-2200 packets going through every second. This is the same no matter which switch I am plugged into.

The only way to stop the storm is to reset the central HP switch and at that point everything seems to go back to normal with somewhere between 15 and 200 packets per second.
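A minimal sketch of how the two rates mentioned above could be told apart automatically, assuming the sniffer can export one timestamp (in seconds) per captured packet. The 1000 packets/s cutoff and the function names are assumptions chosen for illustration; the only figures taken from this thread are the 15-200 and 1500-2200 packets/s ranges.

```python
from collections import Counter

# Hypothetical cutoff: the thread reports 15-200 packets/s when healthy and
# 1500-2200 packets/s during a storm, so 1000 sits between the two ranges.
STORM_THRESHOLD_PPS = 1000


def packets_per_second(timestamps):
    """Count packets in each whole-second bucket of capture time."""
    return Counter(int(ts) for ts in timestamps)


def storm_seconds(timestamps, threshold=STORM_THRESHOLD_PPS):
    """Return the sorted seconds whose packet count exceeds the threshold."""
    rates = packets_per_second(timestamps)
    return sorted(sec for sec, count in rates.items() if count > threshold)


if __name__ == "__main__":
    # Made-up capture: second 0 is quiet (100 packets), second 1 is busy (1800).
    quiet = [i / 100 for i in range(100)]
    busy = [1 + i / 1800 for i in range(1800)]
    print(storm_seconds(quiet + busy))
```

Logging the flagged seconds over a day would show when each storm starts, which can then be matched against what was plugged in or powered on at that time.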

There is not a single source for the broadcast storm; many different MAC addresses have been identified as culprits. Within a storm, each packet is identical.
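If a storm really is one frame circulating, the capture should contain many byte-identical copies of it. Here is a rough sketch of that check, assuming the sniffer can export raw Ethernet frames as bytes. The function names and the `min_count` default are hypothetical; the only protocol fact relied on is that bytes 6-11 of an Ethernet frame hold the source MAC address.

```python
from collections import Counter


def mac_str(raw):
    """Format 6 raw bytes as a colon-separated MAC address."""
    return ":".join(f"{b:02x}" for b in raw)


def top_repeats(frames, min_count=100):
    """List (source MAC, copies) for frames seen at least min_count times.

    `frames` is an iterable of raw Ethernet frames as bytes objects.
    Results are ordered from most to least repeated.
    """
    result = []
    for frame, count in Counter(frames).most_common():
        if count < min_count:
            break  # most_common() is sorted, so the rest are rarer still
        result.append((mac_str(frame[6:12]), count))
    return result
```

A single source MAC with thousands of identical copies would point to a frame being replayed by the network rather than a host that is genuinely that chatty.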

The only way I can describe it is that a device on the network sends out a broadcast and it starts to bounce around like a pinball and won't stop until the device is unplugged.

This doesn't seem to happen with every broadcast, just some of them, some of the time.



I have increased some of the broadcast storm settings on the switches to try to fix the problem. It seems to have helped somewhat, but once a broadcast storm gets rolling it is just as bad as when the switches are set to the default settings.


If you have any questions, please let me know. Any help would be greatly appreciated!

Mohonk Asked:

pgm554 Commented:
Could be a spanning tree issue. In any event, look at the following:

1 - Is there more than one frame type on the file servers, routers, print servers, etc.? If the answer is yes, then determine if every application and/or protocol on the network can run on one single frame type. Using a single frame type reduces redundant broadcast traffic.

2 - Is the network using many protocols, such as IPX, TCP/IP, LAT, SNA, NetBEUI, etc.? Is it possible to run all applications using a single protocol? If so, reconfigure the applications to run over this single protocol. Each protocol type requires its own broadcasts, so minimizing the number of protocol families can lead to fewer broadcasts.

3 - External print servers and print server cards are known as 'plug-and-play' or 'ease of installation' devices. This simplicity comes with a price. Often, these devices are packaged with all of the major network protocols enabled, and sometimes multiple frame types are enabled. Most print servers have a management console or configuration screen to display what protocols and frame types are enabled. Disable all of the protocols and frame types not used on the network for printing.

4 - Most network switches default to enabling the spanning tree bridge protocol. Spanning tree is used for fault tolerance if redundant routes exist on the network. Unless your network is extra-mission-critical, it probably does not have redundant routes from every workstation. If possible, disable the spanning tree protocol. Spanning tree prevents loops on a network by sending out a 'hello' frame from each port every 2 seconds, which then gets resolved by every bridge or switch on the network. On a network with many switched nodes, a misconfiguration of the spanning tree protocols can create MANY broadcasts!

5 - Make sure the WAN devices or routers have spoofing and/or filtering enabled. Contact your router manufacturer for specific functionality. The goal is to reduce the number of broadcasts traversing the LAN and WAN, and to help conserve buffer memory inside the routers.

6 - Have a network baseline analysis performed by an impartial third party. A properly executed analysis will define the protocols in use, identify problematic nodes, and give other pertinent information relating to the network's overall performance at all layers.
muhalok Commented:
Looks like a duplicate IP address or an ambiguous LAN path (STP is not enabled).

Steps to try in order to solve it:

1. Check the paths (links) between your switches. When there is a loop through the main switch, all of its lights blink madly.
Disconnect each cable one at a time and wait a few moments after each to see whether the lights stop blinking. If they do, you have found one end of the problematic link. Follow that cable to the switch at the other end and repeat the same check there.

By the end of the procedure you will have found a loop, which means two switches (two ends) are connected by more than one cable (an ambiguous path).

2. Check for different MAC addresses for the same IP address in the ARP table: go to several PCs, open a command prompt, and type "arp -a" to print the ARP table, then look for records with the same IP address.
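For step 1, if the cabling between switches is written down anywhere, the same loop hunt can be done on paper before pulling cables. This is only a sketch under that assumption: the switch names are made up, and the function simply reports every link whose two ends were already connected through earlier links, which is what a loop looks like in a list of cables.

```python
def find_redundant_links(links):
    """Return the links that close a loop, given (switch_a, switch_b) pairs.

    Uses union-find: a link whose two ends are already connected through
    earlier links creates a loop. This includes a second cable between
    the same two switches.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    redundant = []
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            redundant.append((a, b))
        else:
            parent[root_a] = root_b
    return redundant


if __name__ == "__main__":
    # Hypothetical cabling list, not the real network from this thread.
    cabling = [
        ("hp-core", "dell-1"),
        ("hp-core", "netgear-1"),
        ("dell-1", "netgear-1"),   # closes a triangle
        ("hp-core", "dlink-1"),
        ("hp-core", "dlink-1"),    # second cable between the same pair
    ]
    print(find_redundant_links(cabling))
```

Which link gets reported depends on the order of the list; any one link in a loop can be the one to remove. Unmanaged switches will not appear in any management tool, so they have to be added to the list by hand.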

Good luck.

PS: The source of the broadcast (as seen in the sniffer) can help you. Find that MAC or IP address and check the ARP table on that machine (arp -a).
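The ARP comparison can be scripted once the "arp -a" output from several PCs has been saved as text. The sketch below assumes the Windows output layout (IP address, dash-separated MAC, entry type on each line); the function names are hypothetical. It reports every IP address that was seen with more than one MAC address across the collected tables.

```python
import re
from collections import defaultdict

# Matches data lines such as "  192.168.1.1    00-1a-2b-3c-4d-5e    dynamic".
# Header and "Interface:" lines do not start with an IP address and are skipped.
ARP_LINE = re.compile(
    r"^\s*(\d{1,3}(?:\.\d{1,3}){3})\s+"
    r"([0-9A-Fa-f]{2}(?:[-:][0-9A-Fa-f]{2}){5})\s+\w+",
    re.MULTILINE,
)


def parse_arp_table(text):
    """Extract (ip, mac) pairs from one saved 'arp -a' output."""
    return [(ip, mac.lower().replace("-", ":")) for ip, mac in ARP_LINE.findall(text)]


def conflicting_ips(tables):
    """Map each IP seen with more than one MAC to its sorted list of MACs."""
    seen = defaultdict(set)
    for text in tables:
        for ip, mac in parse_arp_table(text):
            seen[ip].add(mac)
    return {ip: sorted(macs) for ip, macs in seen.items() if len(macs) > 1}
```

A hit is a lead rather than proof: a stale cache entry or a replaced network card can also make two PCs disagree about the MAC behind an IP address.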

cooledit Commented:
What OS is on the clients?
Are they mainly using DNS?
Is WINS installed?

Are all clients using the WINS server?

How many network connections are there?
12 * 24 = 288, so roughly 300 users in one large IP broadcast domain?
Beldoran Commented:
We had a similar problem and eventually (after many, many months) tracked it down to a faulty backplane on the switch.

Bel
Mohonk (Author) Commented:
Really? That's interesting, Beldoran. However, we already had this problem and put this HP switch in to try to solve it, so it was happening even with another switch. HP tech support told us to try setting all ports to 100 Mbps full duplex instead of auto, so we're going to give that a shot.
Mohonk (Author) Commented:
Thanks for all the help. I have narrowed down the problem to the following:

All broadcast storm packets originated either from the Cisco 1100 WAPs or from clients using these access points. As I removed these access points from the network, the problem slowed down. When I removed all of the access points, the problem subsided. I have yet to call Cisco to see if there is a fix, but the one time I plugged an AP back into the network it blew up again.

I will split the points between all who commented. Thanks

cooledit Commented:
Hi there,

What you could try is to simply create a second VLAN for the access points. They will then be on their own broadcast domain.
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.