How to identify component causing "bus fatal error" on Dell PowerEdge T110

A Dell PowerEdge T110 server running Windows Server 2008 R2 has been rebooting because of a BugCheck. The Dell hardware logs show the following (the errors are the same each time the server reboots):

	Tue Mar 21 12:07:51 2017	A runtime critical stop occurred.	
	Tue Mar 21 12:07:24 2017	An OEM diagnostic event occurred.
	Tue Mar 21 12:07:24 2017	A bus fatal error was detected on a component at bus 0 device 28 function 5.	
	Tue Mar 21 12:07:24 2017	A bus fatal error was detected on a component at slot 4.

We have remote access to the server only.  Without physical access, how can we determine which component is causing the problem?

Thanks in advance for any assistance.
David Haycox (Consultant Engineer) asked:

Dr. Klahn (Principal Software Engineer) commented:
So far as I know there's not a utility to show PCI bus slot information.  PCI-Z comes close to what you need, but it does not report slot numbers.

You can get the information out of the Registry, but it is troublesome and requires exact knowledge of what is installed and where ... in other words, you need to already know the information you're trying to find out.

Open the Registry Editor and drill down to HKLM\SYSTEM\CurrentControlSet\Enum\PCI. Under this you'll find many, many keys corresponding to every PCI device which is now or has ever been installed in this system. The LocationInformation value contains information about where that device was when it was last installed in the system.

[Screenshot: PCI enumeration in Registry]
Here's the nasty part. Nothing indicates whether that device is still in the system. You have to know what is installed in a slot to tell which of the keys recorded for that slot is current. This could be workable on a system that has never changed since the original installation, but on most systems devices have been swapped back and forth, and each slot might have ten candidates that have been in that location at some point. So on most systems this is not a workable way to find out what's in a slot without looking in the slot, at which point you already know the answer.
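The registry walk described above can be sketched in Python. This is a minimal sketch, not a tested tool: `parse_location` and `find_devices` are hypothetical helper names, and it assumes LocationInformation is stored either as plain text ("PCI bus 0, device 28, function 5") or as an indirect string ending in "(0,28,5)", which is how it appears on some Windows versions. It has the same limitation as the manual method: stale entries for devices no longer present will still match.

```python
import re

# Plain form, e.g. "PCI bus 0, device 28, function 5"
_PLAIN = re.compile(r"PCI bus (\d+), device (\d+), function (\d+)")
# Indirect-string form ending in "(bus,device,function)", e.g. "...;(0,28,5)"
_TUPLE = re.compile(r"\((\d+),\s*(\d+),\s*(\d+)\)\s*$")


def parse_location(value):
    """Return (bus, device, function) from a LocationInformation string, or None."""
    for pattern in (_TUPLE, _PLAIN):
        match = pattern.search(value)
        if match:
            return tuple(int(group) for group in match.groups())
    return None


def find_devices(target):
    """List Enum\\PCI entries whose recorded location equals target (Windows only)."""
    import winreg  # imported here so parse_location stays usable on any OS

    hits = []
    root_path = r"SYSTEM\CurrentControlSet\Enum\PCI"
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, root_path) as root:
        # One subkey per hardware ID (VEN_xxxx&DEV_xxxx...)
        for i in range(winreg.QueryInfoKey(root)[0]):
            hw_id = winreg.EnumKey(root, i)
            with winreg.OpenKey(root, hw_id) as hw_key:
                # One subkey per device instance under each hardware ID
                for j in range(winreg.QueryInfoKey(hw_key)[0]):
                    instance = winreg.EnumKey(hw_key, j)
                    try:
                        with winreg.OpenKey(hw_key, instance) as inst_key:
                            loc, _ = winreg.QueryValueEx(inst_key, "LocationInformation")
                            desc, _ = winreg.QueryValueEx(inst_key, "DeviceDesc")
                    except OSError:
                        continue  # value missing or key not readable
                    if parse_location(str(loc)) == target:
                        hits.append((hw_id, instance, str(desc)))
    return hits
```

Run on the server itself, `find_devices((0, 28, 5))` would return the hardware ID, instance ID and description of every entry recorded at bus 0, device 28, function 5, the location named in the Dell log.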
David Haycox (Consultant Engineer, author) commented:
Fantastic, thanks.  By drilling down through the device manager I found this though:

[Screenshot: Device Manager extract]
So that's not a removable device in any case, it's on the system board I would say.  So if there's a fault with that it'll be either drivers, firmware or a board change?
Dr. Klahn (Principal Software Engineer) commented:
You're correct.  That's part of the chipset and if it's hardware faulting, motherboard replacement is the only solution that comes to mind.

This particular device is a PCIe controller but a fault in any of the bus controllers could certainly cause problems on the PCI bus, resulting in the next logged error on the PCI bus at slot 4.

Keep in mind though, the problem could possibly go the other way as well -- a faulty device in slot 4 could cause the bus controller to fault.

David Haycox (Consultant Engineer, author) commented:
Great, thanks.  We'll get slot 4 checked out.
Pity this was closed so quickly, was hoping someone would have posted the Linux method and also the generic Dell method of identifying the device. A lot of the time you can't even check via the OS, as it won't boot with a PCIe error and all you have to go on is the POST error or the LCD display.
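For the Linux side, one piece that can be sketched is turning the location in the Dell log into the address form that `lspci` and sysfs use. This is a minimal sketch under two assumptions: the log reports bus, device and function in decimal (as Device Manager does), and the PCI domain is 0. `log_line_to_bdf` is a hypothetical helper name.

```python
import re

# Matches the location part of the Dell log message,
# e.g. "... a component at bus 0 device 28 function 5."
LOG_PATTERN = re.compile(r"bus (\d+) device (\d+) function (\d+)")


def log_line_to_bdf(line, domain=0):
    """Convert a logged decimal bus/device/function into domain:bus:device.function (hex)."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None  # line does not carry a bus/device/function location
    bus, device, function = (int(group) for group in match.groups())
    return f"{domain:04x}:{bus:02x}:{device:02x}.{function:x}"


line = "A bus fatal error was detected on a component at bus 0 device 28 function 5."
print(log_line_to_bdf(line))  # 0000:00:1c.5
```

With that address, `lspci -s 00:1c.5 -v` shows the device at that location on a Linux system, and `/sys/bus/pci/devices/0000:00:1c.5` is the matching sysfs directory. Where the BIOS provides the information, `dmidecode -t slot` may also list a bus address for each physical slot.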
David Haycox (Consultant Engineer, author) commented:
Oops, can ask another question easily enough if you think it will be worth it.
David Haycox (Consultant Engineer, author) commented:
Looks like PCI slot 4 is a USB3.0 controller we fitted recently in order to improve backup / restore speed.  We'll investigate it further.
PCIe 3.0 card?
David Haycox (Consultant Engineer, author) commented:
This one:

We've fitted many of these to different servers; this will be the first problem.