Solved

Red Hat server crash question

Posted on 2010-08-25
Last Modified: 2013-11-25
Our server crashed, and here is the log message from the messages file.

kernel: pciehp: acpi_pciehprm:\_SB_.PCI0.EXB0 _HPP fail=0x5


Do you guys know what is causing it?
Question by:mokkan
6 Comments
 

Author Comment

by:mokkan
ID: 33525355
Also, the server hung and we had to reboot it.
 
LVL 9

Accepted Solution

by:jeremycrussell
jeremycrussell earned 167 total points
ID: 33525487
That's referring to PCI hot plugging. Was some type of device possibly being inserted or removed?

I guess it's possible that it's also related to ACPI functions of some sort.

If this is happening often, you may be able to disable or tweak ACPI or PCIe in your BIOS as a workaround.

 
LVL 4

Assisted Solution

by:abodette
abodette earned 167 total points
ID: 33525490
Pretty sure you have a bad PCI Express card.

pciehp is the PCI Express hotplug driver,

and it's giving you a failure code. I'm not sure for what device, but you can't have too many PCI Express cards, and there are likely related symptoms.

Did you happen to patch the OS recently? There are a few known issues with hotplug that may be remedied by patching to a newer kernel version.

Also look around that error in the messages file to see if anything is coming up along with it.
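
A rough sketch of how "looking around that error" could be done from the shell. The function name show_pciehp_context is made up for illustration, and /var/log/messages is assumed as the log location (pass a different path if your syslog writes elsewhere):

```shell
# Hypothetical helper: print every pciehp line together with the
# five lines logged before and after it, so neighbouring messages show up.
# Assumes the log is /var/log/messages unless a path is given as $1.
show_pciehp_context() {
    log="${1:-/var/log/messages}"
    # -n prefixes line numbers; -B/-A add lines of context around each match
    grep -n -B 5 -A 5 'pciehp' "$log"
}

# Example call (reading the log usually needs root):
#   show_pciehp_context /var/log/messages | less
```

In the output, matching lines are marked with "N:" and context lines with "N-", which makes it easier to spot what else was logged at the same time.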

 

Author Comment

by:mokkan
ID: 33525736
Thank you for the help. Here is the message I got from the messages file.

Aug 25 10:16:45 nepeon kernel: pciehp: acpi_pciehprm:\_SB_.PCI0.EXB4 _HPP fail=0x5
Aug 25 10:16:45 nepeon kernel: pciehp: acpi_pciehprm:\_SB_.PCI0.EXB4 OSHP fails=0x5
Aug 25 10:16:45 nepeon kernel: pciehp: acpi_pciehprm:\_SB_.PCI0.EXB4 _HPP fail=0x5
Aug 25 10:16:45 nepeon kernel: pciehp: acpi_pciehprm:\_SB_.PCI0.EXB4 OSHP fails=0x5
Aug 25 10:16:45 nepeon kernel: pciehp: acpi_pciehprm:\_SB_.PCI0.EXB4 _HPP fail=0x5
Aug 25 10:16:45 nepeon kernel: pciehp: acpi_pciehprm:\_SB_.PCI0.EXB4 OSHP fails=0x5
Aug 25 10:16:45 nepeon kernel: pciehp: acpi_pciehprm:\_SB_.PCI0.EXB4 _HPP fail=0x5
Aug 25 10:16:45 nepeon kernel: pciehp: acpi_pciehprm:\_SB_.PCI0.EXB4 OSHP fails=0x5
Aug 25 10:16:45 nepeon kernel: pciehp: acpi_pciehprm:\_SB_.PCI0.EXB4 _HPP fail=0x5
Aug 25 10:16:45 nepeon kernel: pciehp: acpi_pciehprm:\_SB_.PCI0.EXB4 OSHP fails=0x5
Aug 25 10:16:45 nepeon kernel: pciehp: acpi_pciehprm:\_SB_.PCI0.EXB4 _HPP fail=0x5


lspci | grep -i express
00:13.0 PCI bridge: Broadcom HT2100 PCI-Express Bridge (rev a2)
00:14.0 PCI bridge: Broadcom HT2100 PCI-Express Bridge (rev a2)
00:15.0 PCI bridge: Broadcom HT2100 PCI-Express Bridge (rev a2)
00:16.0 PCI bridge: Broadcom HT2100 PCI-Express Bridge (rev a2)
00:17.0 PCI bridge: Broadcom HT2100 PCI-Express Bridge (rev a2)
40:13.0 PCI bridge: Broadcom HT2100 PCI-Express Bridge (rev a2)
40:14.0 PCI bridge: Broadcom HT2100 PCI-Express Bridge (rev a2)
40:15.0 PCI bridge: Broadcom HT2100 PCI-Express Bridge (rev a2)
40:16.0 PCI bridge: Broadcom HT2100 PCI-Express Bridge (rev a2)
40:17.0 PCI bridge: Broadcom HT2100 PCI-Express Bridge (rev a2)
c0:13.0 PCI bridge: Broadcom HT2100 PCI-Express Bridge (rev a2)
c0:14.0 PCI bridge: Broadcom HT2100 PCI-Express Bridge (rev a2)
c0:15.0 PCI bridge: Broadcom HT2100 PCI-Express Bridge (rev a2)
c0:16.0 PCI bridge: Broadcom HT2100 PCI-Express Bridge (rev a2)
c0:17.0 PCI bridge: Broadcom HT2100 PCI-Express Bridge (rev a2)



Are there any other files that I can check? How do I find out which PCI Express card is causing the issue? Thanks in advance.
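
As a possible first step toward that question, the messages can be tallied by ACPI path to see which names (EXB0, EXB4, ...) are actually reported and how often. This is only a sketch: count_pciehp_paths is a made-up name, and /var/log/messages is an assumed default location.

```shell
# Hypothetical helper: count pciehp ACPI messages per path and method.
# Assumes lines shaped like the ones in the log excerpt, e.g.
#   ... kernel: pciehp: acpi_pciehprm:\_SB_.PCI0.EXB4 _HPP fail=0x5
count_pciehp_paths() {
    log="${1:-/var/log/messages}"
    grep 'acpi_pciehprm:' "$log" \
        | sed 's/.*acpi_pciehprm://' \
        | awk '{ print $1, $2 }' \
        | sort | uniq -c
}

# Example call:
#   count_pciehp_paths /var/log/messages
```

Running lspci -tv afterwards prints the bus as a tree, which shows what (if anything) sits behind each bridge. How a name like EXB4 maps to one of the bridges in the lspci output is platform-specific and is not established anywhere in this thread.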
 

Author Comment

by:mokkan
ID: 33526841
Any help?
 
LVL 1

Assisted Solution

by:onaas
onaas earned 166 total points
ID: 33529486
Hi mokkan,

It would help if you posted more details, like:

What RHEL version do you have?
Is it a freshly installed server?
Did you add any new hardware to the server?
What are the specs of your server?

More info would help us help you!

-10x
