Urgent: HP Procurve Switch 2524 is locking up. All link lights are solid and it is not responding

Dear Experts,

Hi, I have a network with 5 ProCurve switches across 3 floors. 2 switches are in the server room on the 3rd floor. From the 3rd floor, links go up to the 5th floor switch, and from there 2 links come down to the 2 switches on the 4th floor.
Almost every morning all of my switches stop working. The cables are fine and everything looks fine on the switches, but nobody can see anybody else on the network. When I unplug a switch from its power supply, or unplug the network cable (Link) and plug it in again, everything returns to normal and the network is fine again. I don't have any idea what it could be; maybe it is not even the switches. I checked the logs and saw high collision errors. We have two 10 Mbps hubs, and we have changed the switch ports they connect to from 100 Auto to 10 Auto. We are replacing the two hubs with gigabit switches.

I am not sure where to begin to diagnose what's going on.

Any help will be greatly appreciated.

Thank you,

mshaikh22
mshaikh22Asked:

stuknhawaiiCommented:
Where are the hubs connected to? Also, how are the switches connected together? Fiber? What ports? I'm trying to get a better understanding of the current configuration. Also, are you running any VLANs and trunking between switches? If you could post a drawing, that would be very helpful!
mshaikh22Author Commented:
Thanks for responding stuknhawaii.

Basically this is the layout.

There are 5 HP ProCurve switches.

Short Version
Layout
*Link is a network cable coming from the patch panel.

3rd Floor Link 1 > Switch 1  >  5th Floor Link 3 > 5th Floor Switch 3 > 4th Floor Switch 4
3rd Floor Link 2 > Switch 2 > 5th Floor Link 4 > 5th Floor Switch 3 > 4th Floor Switch 5  


Detailed Version
3 ADSL Lines
3 Floors - 3rd Floor (Switch 1 and Switch 2)
4th Floor - 2 Switches (Switch 4 and Switch 5)
5th Floor - 1 Switch (Switch 3)

Server Room is on the 3rd Floor

5th Floor Links 1 and 2 are connected to Switch 1 and Switch 2 in the server room.

Also on Switch 1:
ADSL Line 1, the main gateway (40.1, IPsec site-to-site VPN with our branch office), is connected to Switch 1, along with all servers (Exchange Server, DC, Email Antivirus Gateway, SQL Server) and 10 workstations on the 3rd Floor.

The branch office has its own DC and does site-to-site VPN with ADSL Line 1.

ADSL Line 2 (40.2) is connected to Switch 2. It serves the Terminal Server; only port 3389 is open on that router for remote users to connect. Workstations are also connected to Switch 2.

(Mentioned before) 5th Floor Link 2 is also connected to Switch 2.
 
On the 5th Floor, Link 1 and Link 2 coming from the 3rd Floor are connected to Switch 3.
Link 3 and Link 4 from Switch 3 go to the 4th Floor.

Two 10 Mbps half-duplex hubs are also connected to Switch 3.

All 5th Floor PCs (20) are connected to Switch 3.

Link 3 coming from the 5th Floor is connected to Switch 4, which is on the 4th Floor.
ADSL Line 3 (40.3) is also connected to Switch 4.
This ADSL line is used by another company that we have acquired; they mainly use HTTP and RDP to connect to their servers remotely offsite. Their workstations are also connected to Switch 4.

Link 4 goes to Switch 5, which is on the 4th Floor, and 8 PCs are connected to it.

Please let me know if I am being confusing

I am not sure where to begin. I read the switch event logs and found that the two hubs connected to Switch 3 on the 5th Floor are generating packet errors (High Collision Rate). I changed the ports the hubs are connected to from 100 Auto to 10 Auto. The switches locked up again the following morning. After rebooting, they are fine.

I'm not sure what's going on and don't know where to begin.

My director keeps asking me what's going on, since we just bought these switches a month ago.

I tried unplugging the PC ports and the hub ports. It made no difference: all link lights stayed solid. When I unplug the floor links, the network starts working again.
stuknhawaiiCommented:
OK, what is plugged into the hubs? Users? Also, it appears that Switch 3 (5th floor) is actually the core of the network, meaning that all other switches connect into it. Is this correct? High collisions are expected on the hubs; that's why you should get rid of them ASAP! What version of ProCurves are these?
One thing we definitely need to look at is whether there are redundant links configured between the switches. Also make sure no cables are plugged from a switch back into the same switch (it happens). Both of these can be handled with STP (Spanning Tree Protocol), which shuts down redundant links until they're needed. If you're not running STP, redundant links can cause bridging loops that will take down the network.
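
As a rough way to check a cabling list for redundant links before relying on STP, here is a minimal Python sketch. It assumes the only inter-switch links are the four uplinks listed in the layout earlier in the thread; the switch names (`sw1` to `sw5`) and the function name are made up for illustration, and the extra `sw1`/`sw2` cable is purely hypothetical.

```python
def find_redundant_links(links):
    """Return the links that close a loop, using union-find over switch names."""
    parent = {}

    def find(node):
        # Follow parent pointers to the representative of node's group.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    redundant = []
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            # Both ends are already connected some other way (or it is the
            # same switch), so this cable would create a bridging loop.
            redundant.append((a, b))
        else:
            parent[root_a] = root_b
    return redundant


# Uplinks as described in the thread (names are shorthand, not real hostnames).
links = [("sw1", "sw3"), ("sw2", "sw3"), ("sw3", "sw4"), ("sw3", "sw5")]
print(find_redundant_links(links))

# Hypothetical extra patch cable between the two 3rd-floor switches.
print(find_redundant_links(links + [("sw1", "sw2")]))
```

With only the four described uplinks the list comes back empty, so any loop would have to come from a cable that is not in the description. A cable from a switch back into itself, such as `("sw3", "sw3")`, is also reported.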


mshaikh22Author Commented:
Thank you for replying, stuknhawaii.

That's correct, Switch 3 (5th Floor) is the core of the network. I am about to install two 16-port gigabit switches to replace the two hubs today. All of the switches are ProCurve 2524s.
Spanning tree is enabled on all switches.
The sensitivity level is set to High. I will look at the switch today and give you more details.

Thanks again,

mshaikh
mshaikh22Author Commented:
Version is revision F.05.50, ROM F.02.01

mshaikh22Author Commented:
Hello stuknhawaii,

I installed the gigabit switches to replace the 10 meg hubs. It worked fine for 2 days, and now the switches are locked up again.

I don't know what to do now.
jburgaardCommented:
If you are able to look at the web interface when things go bad, you will probably see the traffic bars go up. The most interesting will be the yellow one, which may point to the root of your trouble or show the way.

I wonder, are Link 1 and Link 2 coming from the same router?
stuknhawaiiCommented:
What is the yellow bar? Is it collisions? So now you have the two 3rd floor switches connecting to the new gigabit 5th floor switch, which connects to the two 4th floor switches? Is that how it is now? Do you see this increase in traffic on all switches or just one or two?

Important: I should have started with this... when did this start happening? Has it always been this way? Think verrrry hard... If this has recently started happening, what changed? Something had to change to cause this problem to start. Did you add a new device such as another ADSL connection, a new network printer, a new wireless access point, or users moving around? Something has caused this. Did you update your switch firmware? Please think back !!!!!
mshaikh22Author Commented:
Thank you for responding, stuknhawaii and jburgaard.

We used to have two 10 meg hubs connected to the 5th floor switch.

A wireless access point, which was used to connect the workstation in the office, was connected to the 10 meg hub.

We have replaced the two 10 meg hubs with two 16-port gigabit Netgear switches and connected a Netgear wireless access point to them.

I checked the logs and this time didn't see anything in the event log other than link loss events.

3rd Floor Switch 1 connects to 5th Floor Link 1 patch
3rd Floor Switch 2 connects to 5th Floor Link 2 patch
(2 switches on the 3rd Floor; all servers are connected to Switch 1)

5th Floor Link 1 patch connects to 5th Floor Switch 3
5th Floor Link 2 patch connects to 5th Floor Switch 3
(1 switch on the 5th Floor; all 4 links are connected to that one switch)

(We use regular Cat 5 cables, not crossover, to connect the links)

5th Floor Switch 3 connects to 4th Floor Link 1 patch
5th Floor Switch 3 connects to 4th Floor Link 2 patch

4th Floor Link 1 patch connects to 4th Floor Switch 4
4th Floor Link 2 patch connects to 4th Floor Switch 5


We have been having this problem for a while, even before we replaced all of the old auto-sensing switches with managed HP switches.

The only big things we have changed are the switches and the roles of the routers.

Before

3 ADSL Lines

1 ADSL 40.1 was responsible for
1. Site to Site VPN with the branch office
2. Main Gateway
3. Internet

2nd Gateway
(40.2)
Emails
Terminal Services

3rd Gateway
Terminal services for the New Company to remote in to their offsite servers.


Now

3 ADSL Lines

1 ADSL 40.1 is responsible for
1. Site to Site VPN with the branch office
2. Main Gateway
3. Emails
4. Internet

2nd Gateway
(40.2)
Terminal Services for remote users only

3rd Gateway
Terminal services for the New Company to remote in to their offsite servers.

These are the changes. I haven't updated the firmware on the switches, and I know it has been very hard to narrow down what's going on.

I am open to all suggestions.

Thank you,

Mansoor
stuknhawaiiCommented:
I found this on the HP website:
 if the 2848s were deployed as edge devices, potential mismatches in uplink port speed/duplex setting could result in brief port link loss events. To fully analyze the situation, we may need to gather more information such as switch "show tech all" data and network topology details. Please note, however, that the firmware running on these 2848 switches is relatively out of date. To eliminate the possible influence of known/resolved firmware issues, we suggest you try updating these switches to their current I.08.98 code, and then retesting their response percentage. The latest firmware can be downloaded from the ProCurve Web site.

Do the logs tell you which ports you're receiving these "link loss" errors on? Also, is it possible to update your code? Maybe if we can narrow this down to a few ports we can figure out why it's happening.
mshaikh22Author Commented:
Thanks a lot for this, stuknhawaii.
I will update the firmware and see what happens. I cannot find the show tech all option on the 2524 switch.

One interesting thing happened this morning. When all the switches went down, I started unplugging all devices except for the servers and gateways. The switches were still locked up.

Then I started unplugging each server and plugging it back in.

Again no difference: the link lights were still solid and the switches were still locked up.

I tried unplugging the connection to the Line 2 ADSL router (40.2, Terminal Services only). Still no difference.

Then, when I unplugged the Line 1 ADSL router (40.1) from its port, everything started working.
Which is very bizarre.
We have been having this problem even before we upgraded from auto-sensing switches to HP ProCurve.

I now know that this port is causing the problem, but I don't know why it causes all the switches to fail.
(40.1) is connected to Switch 1 (Emails, Site to Site VPN, Web).

But I am still going to upgrade the firmware on the switches.

The router is a Zyxel ADSL router. This is a replacement router. The last router was also a Zyxel, and the network still used to lock up with it.

I really appreciate all of your help, stuknhawaii, and the advice you have been giving me.

The way I found out was that I enabled remote control on the WAN side. When the network locked up, I connected to the router and rebooted it, and everything started working fine.

Right now the graph shows two ports spiking up to 100 percent.

The logs don't register anything, even on high sensitivity. Multicast is on and spanning tree is on.

Should multicast be turned on?

The Port 4 and Port 8 bars are blue and they are spiking to 100% now. After 30 minutes, Port 3 and Port 8 were spiking to 100%.
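
One way to cross-check utilization bars like these is to read a port's byte counter twice and work out the percentage by hand. The Python sketch below is only an illustration of that arithmetic: the function name, its parameters, and the sample readings are assumptions, and how you obtain the counters (web interface, SNMP, CLI) is left open.

```python
def port_utilization(octets_before, octets_after, interval_s, speed_bps,
                     counter_bits=32):
    """Percent of link capacity used between two octet-counter readings.

    counter_bits is the assumed width of the counter; the modulo handles a
    single wrap-around between the two readings.
    """
    if interval_s <= 0 or speed_bps <= 0:
        raise ValueError("interval and speed must be positive")
    delta_octets = (octets_after - octets_before) % (2 ** counter_bits)
    bits = delta_octets * 8
    return 100.0 * bits / (interval_s * speed_bps)


# Hypothetical readings: 375,000,000 octets in 30 s on a 100 Mbit/s port.
print(port_utilization(0, 375_000_000, 30, 100_000_000))
```

A port that sits near 100% for minutes at a time while most users are idle is a reasonable place to start tracing which device is generating the traffic.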

I will check tomorrow which servers are connected to those ports and let you know.

Any suggestions on how I can find out more about the problem with the ADSL line?
What do you think might be wrong?

Thank you,
mshaikh22
stuknhawaiiCommented:
Multicast can definitely cause problems, but I'm not sure why it's turned on on your switch. I know that multicast sends a lot of data out and could be causing a flooding effect, taking down your switches. I'd turn it off and see what happens.
For the ADSL, I would try to hard code the speed/duplex on both the switch and the ADSL router to see if that helps.
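
The reason to set both ends is that a forced port does not negotiate, so an auto-negotiating partner can usually sense the speed but falls back to half duplex. The Python sketch below is a simplified model of that behavior, not a description of how the 2524 or the Zyxel router is actually configured; the names and the `best_common` default are assumptions, and two forced ports with different speeds (which would not link at all) are out of scope.

```python
AUTO = "auto"


def resolve_link(side_a, side_b, best_common=(100, "full")):
    """Return the (speed_mbps, duplex) each side ends up using.

    Each side is either AUTO or a forced (speed_mbps, duplex) tuple.
    best_common is the assumed result when both sides negotiate.
    """
    if side_a == AUTO and side_b == AUTO:
        return best_common, best_common
    if side_a == AUTO:
        # Partner is forced: speed is sensed, duplex falls back to half.
        return (side_b[0], "half"), side_b
    if side_b == AUTO:
        return side_a, (side_a[0], "half")
    return side_a, side_b


def has_duplex_mismatch(side_a, side_b):
    """True when the two ends settle on different duplex modes."""
    a, b = resolve_link(side_a, side_b)
    return a[1] != b[1]


# Forcing only one end to 100/full leaves the auto end at 100/half.
print(has_duplex_mismatch(AUTO, (100, "full")))
# Forcing both ends to the same setting avoids the mismatch.
print(has_duplex_mismatch((100, "full"), (100, "full")))
```

In this model, hard coding only one end is the case that produces a mismatch, which is why the setting has to be made on the switch port and the router together.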
mshaikh22Author Commented:
Thank you, stuknhawaii. I will do that and let you know.
How can you hard code the speed between two switches?