Solved

Cisco 4506 switch rebooted itself

Posted on 2017-09-20
Last Modified: 2017-10-13
I have a Cisco 4506 chassis with four 48-port switch modules in it. It is on a known-good UPS and has redundant power supplies. About six weeks ago, the switch restarted itself for no known reason. I couldn't find anything out of the ordinary; it had just come back online by the time I got to the switch room.

Today it happened again, right at 3:00pm. The reports I got had some people losing power to their Cisco phones (PoE), while others claimed the phone didn't lose power but the display said the Ethernet connection was lost. The phones that lost power were on switch module 3.

I went into the IOS and did a sh hardware and got this:
Cisco IOS Software, Catalyst 4500 L3 Switch  Software (cat4500e-IPBASEK9-M), Version 15.2(2)E5, RELEASE SOFTWARE (fc2)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2016 by Cisco Systems, Inc.
Compiled Thu 02-Jun-16 03:28 by prod_rel_team

ROM: 12.2(44r)SG5
ph-4506 uptime is 1 hour, 0 minutes
System returned to ROM by reload
System restarted at 14:58:59 CDT Wed Sep 20 2017
System image file is "bootflash:cat4500e-ipbasek9-mz.152-2.E5.bin"
Darkside Revision 4, Nexu Revision 9, Fortooine Revision 1.40

Last reload reason: reload
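
When comparing several reloads, it can help to pull only the reload-related lines out of output like the above. The following is a minimal sketch in Python, assuming the output has been saved as plain text; the function name `reload_summary` and the field names are made up for illustration, and the regular expressions only cover the line formats that appear in this post.

```python
import re

# Excerpt of the output shown above, used as sample input.
SHOW_VERSION = """\
ph-4506 uptime is 1 hour, 0 minutes
System returned to ROM by reload
System restarted at 14:58:59 CDT Wed Sep 20 2017
System image file is "bootflash:cat4500e-ipbasek9-mz.152-2.E5.bin"

Last reload reason: reload
"""

# Hypothetical field names mapped to patterns for the lines seen in this post.
FIELDS = {
    "uptime": r"^\S+ uptime is (.+)$",
    "returned_to_rom_by": r"^System returned to ROM by (.+)$",
    "restarted_at": r"^System restarted at (.+)$",
    "last_reload_reason": r"^Last reload reason: (.+)$",
}


def reload_summary(text):
    """Return the reload-related fields found in the text (None if a line is absent)."""
    summary = {}
    for name, pattern in FIELDS.items():
        match = re.search(pattern, text, re.MULTILINE)
        summary[name] = match.group(1).strip() if match else None
    return summary


if __name__ == "__main__":
    for name, value in reload_summary(SHOW_VERSION).items():
        print(f"{name}: {value}")
```

Saving one such summary per incident makes it easier to see whether the reported reason or the time of day repeats.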

My question is, what else can I do from a troubleshooting standpoint?  Is it possible that just switch module 3 in the chassis lost power and the rest of the modules remained online?  I have to rely on end-user reports that some Cisco PoE phones lost power and some did not.  No one else has access to the switch to reload it, so I can only assume it lost power for some reason and "reload" is just a generic reason.  Is there a different "Last reload reason" message if it just loses power?

Any pointers on figuring out what happened?
Question by:Steve Bantz
 
LVL 11

Expert Comment

by:Sean
ID: 42302356
What does it show when you do a sh log?

Also, what does it say when you do a show version?
 
LVL 1

Author Comment

by:Steve Bantz
ID: 42302402
Syslog logging: enabled (0 messages dropped, 7 messages rate-limited, 0 flushes, 0 overruns, xml disabled, filtering disabled)

No Active Message Discriminator.

I know we have it going to SolarWinds so I will look through there also.
 
LVL 5

Expert Comment

by:Jane Updegraff
ID: 42303449
Weird. According to Cisco's definition of the line "System returned to ROM by reload", the reload had to be initiated by a user, so the switch thinks that someone issued the reload command. Does this switch have more than one supervisor module, by chance? I used to have one with two supervisors (one a warm spare), and both were able to record logs showing remote command events. So if you can't see who (or what) initiated the reload command and it doesn't appear on one supervisor, you should look in the logs on the other, too.

Here are some reasons for crashes (and reloads):

https://www.cisco.com/c/en/us/support/docs/ios-nx-os-software/ios-software-releases-121-mainline/7957-crashes-lesscommon.html#anc19

I also seem to remember a switch that had a memory leak in one of the buffers on the primary supervisor, although that may have been on a different core switch. At the time I had 4500s and 6500s, and I can't remember which one had the leak. It would just run out of memory and reload itself as a failsafe measure rather than locking up and crashing; in that case a reload was preferable to a lockup.

Also look at these possible bugs (you'll need to be logged in with a Cisco account) and see if any of them match your conditions:

https://bst.cloudapps.cisco.com/bugsearch/bug/CSCsi17158/?
https://bst.cloudapps.cisco.com/bugsearch/bug/CSCuh49736/?
https://bst.cloudapps.cisco.com/bugsearch/bug/CSCvd05307/?
https://bst.cloudapps.cisco.com/bugsearch/bug/CSCuu34535/?

 
LVL 1

Author Comment

by:Steve Bantz
ID: 42324463
The response I got from Cisco is:
The supervisor engine's "Jawa" ASIC had detected a parity error and sent a signal to the central CPU forcing a reload. The component on the board where the parity error originated.  Parity errors indicate that one or more bits in a value of memory have inverted, from 0 to 1 or vice versa, causing a disparity between the expected and actual value. As a recovery mechanism, the system forces itself to reset.
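
To make the "inverted bit" idea concrete, here is a minimal sketch in Python of a single even-parity bit protecting one stored value. It is a generic illustration only and is not meant to describe how the ASIC actually implements its check; the function names and the sample value are made up.

```python
def even_parity_bit(value: int) -> int:
    """Return the bit that makes the total count of 1 bits (value plus parity) even."""
    return bin(value).count("1") % 2


def is_consistent(value: int, stored_parity: int) -> bool:
    """True if the value still agrees with the parity bit recorded when it was written."""
    return even_parity_bit(value) == stored_parity


# Write a value and record its parity bit alongside it.
word = 0b10110010
parity = even_parity_bit(word)

# An undisturbed read passes the check.
print(is_consistent(word, parity))

# One inverted bit (bit 3 here) changes the count of 1 bits, so the check fails.
one_flip = word ^ (1 << 3)
print(is_consistent(one_flip, parity))

# Two inverted bits cancel out; a single parity bit cannot see this case.
two_flips = word ^ 0b11
print(is_consistent(two_flips, parity))
```

A single parity bit can only say that something changed, not which bit changed, which is consistent with recovering by reset rather than by correcting the value.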

There are two known causes for parity errors - hardware failure and transient disturbances. Environmental factors, such as electromagnetic interference, can alter the contents of memory cells, causing what is known as a "soft" parity error. This is uncontrollable, rare, and non-recurring. However, it's actually a more common phenomenon than sudden hardware failure - which causes what is often referred to as a "hard" parity error.

More information on these two types of parity errors can be sourced within the following document:

http://www.cisco.com/en/US/products/hw/routers/ps341/products_tech_note09186a0080094793.shtml#softvshard

If this is the first time in recent past that the switch has crashed, I suggest we monitor it. If this was a true hardware fault, the system would inevitably attempt to access the corrupt memory cells, leading to another crash in the very near future.

However, if it continues to operate smoothly for a day or so, you should feel very confident that this was a transient issue that will not be seen again.
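
The monitoring advice above boils down to watching how close together the reloads are. Below is a rough sketch in Python of that rule of thumb, assuming the restart times are collected by hand. The function name `recurs_within` is made up, the one-day window only mirrors the "a day or so" wording, and the first timestamp is a hypothetical stand-in for the earlier reload, whose exact date is not given in this thread.

```python
from datetime import datetime, timedelta


def recurs_within(reloads, window):
    """True if any two consecutive reloads are no further apart than the window."""
    ordered = sorted(reloads)
    return any(
        later - earlier <= window
        for earlier, later in zip(ordered, ordered[1:])
    )


reloads = [
    datetime(2017, 8, 9, 15, 0, 0),     # hypothetical date for "about six weeks ago"
    datetime(2017, 9, 20, 14, 58, 59),  # "System restarted at" time from the question
]

# Window is an assumption based on the "a day or so" wording.
if recurs_within(reloads, timedelta(days=1)):
    print("Reloads are clustered: worth asking for a hardware check")
else:
    print("Reloads are far apart: more consistent with a transient event")
```

This is only a way of keeping score; it does not replace having the crash data looked at if another reload occurs.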
 
LVL 5

Expert Comment

by:Jane Updegraff
ID: 42324529
Wow that is fascinating and a little annoying. A parity error of one single bit is going to instigate a spontaneous reload? And then report to you in the syslog that it was ordered by a person? That's really sloppy design for this ASIC. They could at least have made reference in their error to what might have happened and what should be done to test for further problems.

What is the environment like in the physical location? Got a cell phone tower casting a shadow over the data center or an electrical substation next door? Or did anyone set a powerful magnet directly on top of the device? I would guess not, but it can't hurt to consider it, I suppose. And they're right. If it never happens again then something really weird happened, in which case it isn't likely to happen again. They just can't tell you what. LOL! Such a Cisco answer. But thanks for getting back to us.

Has anything happened again since your initial question? Any new reloads that would make you think you have a hardware problem?
 
LVL 1

Author Comment

by:Steve Bantz
ID: 42325641
We do have construction work going on but nothing that should introduce any interference.  It has happened twice in 4 months.  No one but IT has access to the switch closet this equipment is in but I guess I can't rule out something environmental.  It is a nice climate controlled closet with UPS and circuits on generator.  Given remodeling and construction work, I suppose that is a possible source.

I have been checking the syslogs daily (set to debug) and nothing is happening.  I see normal things like Cisco Prime logging in to get backup configs and such.  No reloads ... knock on wood.
 
LVL 5

Accepted Solution

by:
Jane Updegraff earned 2000 total points
ID: 42325906
Well that's a blessing, at least. And twice in four months is twice too often ... but at least it hasn't been MORE often. Do as TAC suggests and if it reloads again have them look at it really closely for hardware issues. They have better tools than we do.

Knocking on wood on your behalf :-)
