Cisco 4506 switch rebooted itself

I have a Cisco 4506 chassis with four 48-port switch modules in it.  It is on a known good UPS and has redundant power supplies.  About six weeks ago, the switch restarted itself for no known reason.  I couldn't find anything out of the ordinary; it had already come back online by the time I got to the switch room.

Today it happened again, right at 3:00pm.  Some people reported that their Cisco phones (PoE) lost power, while others said the phone didn't lose power but the display said the Ethernet connection was lost.  The phones that lost power were on switch module 3.

I went into the IOS and did a sh hardware and got this:
Cisco IOS Software, Catalyst 4500 L3 Switch  Software (cat4500e-IPBASEK9-M), Version 15.2(2)E5, RELEASE SOFTWARE (fc2)
Technical Support:
Copyright (c) 1986-2016 by Cisco Systems, Inc.
Compiled Thu 02-Jun-16 03:28 by prod_rel_team

ROM: 12.2(44r)SG5
ph-4506 uptime is 1 hour, 0 minutes
System returned to ROM by reload
System restarted at 14:58:59 CDT Wed Sep 20 2017
System image file is "bootflash:cat4500e-ipbasek9-mz.152-2.E5.bin"
Darkside Revision 4, Nexu Revision 9, Fortooine Revision 1.40

Last reload reason: reload

My question is, what else can I do from a troubleshooting standpoint?  Is it possible that just switch module 3 in the chassis lost power and the rest of the modules remained online?  I have to rely on end-user reports that some Cisco PoE phones lost power and some did not.  No one else has access to the switch to reload it, so I can only assume it lost power for some reason and "reload" is just a generic reason.  Is there a different "Last reload reason" message if it just loses power?
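
For anyone comparing reload reasons across several reloads or switches, the relevant fields can be pulled out of saved show version text with a few regular expressions. This is only a minimal Python sketch, assuming the output has been saved as plain text. The function name summarize_reload and the field names are made up for illustration, and the sample lines are copied from the output above.

```python
import re

# Lines copied from the `show version` output posted in this thread.
SAMPLE = """\
ph-4506 uptime is 1 hour, 0 minutes
System returned to ROM by reload
System restarted at 14:58:59 CDT Wed Sep 20 2017
Last reload reason: reload
"""

# Hypothetical field names mapped to the text that precedes each value.
PATTERNS = {
    "uptime": r"uptime is (.+)",
    "returned_to_rom_by": r"System returned to ROM by (.+)",
    "restarted_at": r"System restarted at (.+)",
    "last_reload_reason": r"Last reload reason: (.+)",
}


def summarize_reload(show_version_text):
    """Return the reload-related fields found in `show version` text.

    A field that is not present in the text maps to None.
    """
    summary = {}
    for field, pattern in PATTERNS.items():
        match = re.search(pattern, show_version_text)
        summary[field] = match.group(1).strip() if match else None
    return summary


if __name__ == "__main__":
    for field, value in summarize_reload(SAMPLE).items():
        print(f"{field}: {value}")
```

Keeping one such summary per reload makes it easier to see whether the reported reason ever changes from "reload".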

Any pointers on figuring out what happened?
Steve Bantz, IT Manager, Asked:

Sean, System Engineer, Commented:
What does it show when you do a sh log?

Also, what does it say when you do a show version?
Steve Bantz, IT Manager, Author Commented:
Syslog logging: enabled (0 messages dropped, 7 messages rate-limited, 0 flushes, 0 overruns, xml disabled, filtering disabled)

No Active Message Discriminator.

I know we have it going to SolarWinds so I will look through there also.
Jane Updegraff, Sr. Systems Administrator, Commented:
Weird. According to Cisco's definition of the line "System returned to ROM by reload", the reload had to be initiated by a user, so the switch thinks someone issued the reload command. Does this switch have more than one supervisor module by chance? I used to have one that had two supervisors (one a warm spare), and both were able to record logs showing remote command events. So if you can't see who (or what) initiated the reload command on one supervisor, you should look in the logs on the other, too.

Here are some reasons for crashes (and reloads):

I also seem to remember that a switch had a memory leak in one of the buffers on the primary supervisor, although that may have been on a different core switch. At the time I had 4500s and 6500s, and I can't remember which one it was that had the memory leak. It would just run out of memory and reload itself as a failsafe measure rather than locking up and crashing. In that case a reload was preferable to a lockup.

Also look at these possible bugs (you'll need to be logged in using a cisco account) and see if any of them match your conditions:
Steve Bantz, IT Manager, Author Commented:
The response I got from Cisco is:
The supervisor engine's "Jawa" ASIC had detected a parity error and sent a signal to the central CPU forcing a reload. The Jawa ASIC is the component on the board where the parity error originated.  Parity errors indicate that one or more bits in a value of memory have inverted, from 0 to 1 or vice versa, causing a disparity between the expected and actual value. As a recovery mechanism, the system forces itself to reset.
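
To make the idea of a parity check concrete, here is a small Python sketch of even parity over a single stored word. It illustrates parity checking in general and is not a description of how the Jawa ASIC actually implements it. The function names and the 8-bit example value are invented for illustration.

```python
def even_parity_bit(value):
    """Return the extra bit that makes the total count of 1 bits even."""
    return bin(value).count("1") % 2


def store(value):
    """Simulate writing a word to memory along with its parity bit."""
    return value, even_parity_bit(value)


def parity_ok(value, parity):
    """Recompute parity on read and compare it with the stored bit."""
    return even_parity_bit(value) == parity


word, parity = store(0b10110010)

# A single inverted bit changes the count of 1 bits by one,
# so the recomputed parity no longer matches the stored bit.
one_flip = word ^ (1 << 3)

# Two inverted bits cancel out, which a single parity bit cannot see.
two_flips = word ^ 0b11

if __name__ == "__main__":
    print("intact word passes:", parity_ok(word, parity))
    print("one flipped bit passes:", parity_ok(one_flip, parity))
    print("two flipped bits pass:", parity_ok(two_flips, parity))
```

A parity bit only says that something changed, not which bit or why. That fits the behavior described above, where the system resets rather than trying to repair the value.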

There are two known causes for parity errors - hardware failure and transient disturbances. Environmental factors, such as electromagnetic interference, can alter the contents of memory cells, causing what is known as a "soft" parity error. This is uncontrollable, rare, and non-recurring. However, it's actually a more common phenomenon than sudden hardware failure - which causes what is often referred to as a "hard" parity error.

More information on these two types of parity errors can be sourced within the following document:

If this is the first time in recent past that the switch has crashed, I suggest we monitor it. If this was a true hardware fault, the system would inevitably attempt to access the corrupt memory cells, leading to another crash in the very near future.

However, if it continues to operate smoothly for a day or so, you should feel very confident that this was a transient issue that will not be seen again.
Jane Updegraff, Sr. Systems Administrator, Commented:
Wow that is fascinating and a little annoying. A parity error of one single bit is going to instigate a spontaneous reload? And then report to you in the syslog that it was ordered by a person? That's really sloppy design for this ASIC. They could at least have made reference in their error to what might have happened and what should be done to test for further problems.

What is the environment like in the physical location? Got a cell phone tower casting a shadow over the data center or an electrical substation next door? Or did anyone sit a powerful magnet directly on top of the device? I would guess not but it can't hurt to consider it I suppose. And they're right. If it never happens again then something really weird happened, in which case it isn't likely to happen again. They just can't tell you what. LOL! Such a Cisco answer. But thanks for getting back to us.

Has anything happened again since your initial question? Any new reloads that would make you think you have a hardware problem?
Steve Bantz, IT Manager, Author Commented:
We do have construction work going on but nothing that should introduce any interference.  It has happened twice in 4 months.  No one but IT has access to the switch closet this equipment is in but I guess I can't rule out something environmental.  It is a nice climate controlled closet with UPS and circuits on generator.  Given remodeling and construction work, I suppose that is a possible source.

I have been checking the syslogs daily (set to debug) and nothing is happening.  I see normal things like Cisco Prime logging in to get backup configs and such.  No reloads ... knock on wood.
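
A daily check like this could be partly automated by filtering exported syslog text for restart-related messages. The Python sketch below assumes the logs are available as plain text lines. It matches the IOS mnemonics %SYS-5-RESTART and %SYS-5-RELOAD plus the word "parity". The sample lines are made up for illustration and were not captured from this switch, and flag_lines is a hypothetical helper name.

```python
import re

# Restart/reload mnemonics plus any mention of "parity", case-insensitive.
WATCH = re.compile(r"%SYS-5-(RESTART|RELOAD)|parity", re.IGNORECASE)


def flag_lines(lines):
    """Return the lines that mention a restart, a reload, or parity."""
    return [line.rstrip("\n") for line in lines if WATCH.search(line)]


# Made-up sample lines, for illustration only.
sample = [
    "Sep 20 14:59:40 ph-4506 %SYS-5-RESTART: System restarted --",
    "Sep 21 02:00:12 ph-4506 %SYS-5-CONFIG_I: Configured from console by vty0",
    "Sep 21 02:00:15 ph-4506 %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet3/1, changed state to up",
]

if __name__ == "__main__":
    for line in flag_lines(sample):
        print(line)
```

The same function works on a file handle, for example flag_lines(open(path)) where path points at an exported log file.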
Jane Updegraff, Sr. Systems Administrator, Commented:
Well that's a blessing, at least. And twice in four months is twice too often ... but at least it hasn't been MORE often. Do as TAC suggests and if it reloads again have them look at it really closely for hardware issues. They have better tools than we do.

Knocking on wood on your behalf :-)
