Problems with Dell Perc H810 cards - twice

crp0499 asked via Ask the Experts™:
We have a super weird thing happening.  We have two Dell R730 servers, both with Perc H810 controllers in them.  The two servers are connected to an IBM V3700 and a Dell MD1220.

Last week, we rebooted the first host and at the boot, we got this error:

"LSI-EFI SAS Driver:
Unhealthy status reported by this UEFI driver without specific error

UEFI0116: One or more boot drivers have reported issues
Check the Driver Health Menu in the Boot Manager for details.

One or more boot drivers require configuration changes.  Press any key to load the driver health manager for configurations."

So, we got that after the reboot and after some googling, we decided to move the Perc H810 out of the second server into the first and it booted right up, no issues.  So, we chalked it up to a bad card, ordered a replacement, put it into the first server and it booted just fine as well.  There you have it, we had a bad Perc card.

Now, fast forward a week.  Server 1 has its new Perc and is running great, seeing all of the storage, and we are happy.

Server 2 is running great with its Perc card that we stole from server 1 last week, and life is good.

Now, tonight, I need to reboot server 2 and boom, it hangs and is now reporting the exact same error as noted above.

Pressing any key does nothing.  The machine will not boot into anything...iDRAC doesn't even work when it's in this state.  All we can do is remove the Perc card, and then it boots normally.

It "could be" that we lost two Perc H810 cards, but it's too coincidental to me.  It seems something else is amiss and I can't put my finger on it.

Anyone else know what's going on here?



By the way, we migrated all of the active VMs to the first server with its new Perc H810, and we plan on replacing the H810 in server 2 that now seems to have gone kaput!

Dr. Klahn, Principal Software Engineer

Swap your spare motherboard into the first system, let it run for a week and see if the problem is alleviated.  If so, there's probably a PCI / PCIe bus issue.

Side note:  Check the BIOS revision for both systems and see if they are identical, and also check the Dell BIOS updates to see if there is (a) a later one than that installed in the servers, and (b) one which directly addresses this specific issue.  Don't update the BIOS just to update it; that is begging for trouble.
Top Expert 2014

This does not necessarily imply a problem with the card; it can be caused by a failed disk, a foreign disk, a cable problem, etc.  You have to use the configuration utility to see what the problem is, but it hangs rather than going into the configuration menu.

Being "old school", I would switch it into BIOS mode, reboot, watch the old-fashioned boot menu and use Ctrl+R, fix the disk problem, and then switch it back to UEFI.


That's it.  Nothing works here.  The server locks up, and no matter what I change in regards to the boot order, it won't enter setup, it won't bring up the boot menu, nothing.  The only way to get the server to boot is to remove the card and power cycle it.  Again, this card had been working for months, even through reboots.  My hang-up is that two identical servers, connected to the exact same storage, are exhibiting the exact same issues with the exact same Perc card.  I could accept the idea that two cards just "went bad," but it seems too odd, and I feel like I'm missing something.

Looking in iDRAC, for the Perc H810, I see this error under foreign config:

STOR079: The device does not support this operation or is in a state that does not allow this operation.  Make sure the device supports the requested operation. If the operation is supported, then make sure the server is turned on and retry the operation.

I am also wondering if what I am doing will even work.  You see, both R730 servers are connected to the same MD1200 via their own Perc cards.  The first server was used to create the array on the MD1200, so I'm expecting the second server with its second Perc card to see that array and provide access to it.  Shared storage.  That seems perfectly normal to me, but who knows.

Top Expert 2014
Two servers connected to the same enclosure? That's not a valid configuration; I'm surprised it ever worked. The MD1200 is a dumb shelf; you cannot connect two PERCs to it. It is not shared storage. When one server writes to it, it actually writes to the cache module on its PERC; the other server can't read that cache, so your data will get corrupted almost instantly. There is no mechanism to synchronize the caches in the two servers.
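The cache problem Andy describes can be sketched with a toy model. This is a hypothetical illustration, not real PERC firmware behavior: two controllers each hold a private write-back cache in front of the same dumb shelf, so one server's write is invisible to the other until a flush, and the two servers end up with divergent views of the same block.

```python
# Hypothetical sketch (toy model, not PERC firmware): why two
# controllers with private write-back caches cannot safely share
# one dumb disk shelf.

class Controller:
    """A RAID controller with its own write-back cache."""
    def __init__(self, disk):
        self.disk = disk        # shared backing store (the shelf)
        self.cache = {}         # private cache: block -> data

    def write(self, block, data):
        # Write-back: the write lands only in this card's cache.
        self.cache[block] = data

    def read(self, block):
        # Served from the private cache if present, else from the disk,
        # which may be stale relative to the *other* card's cache.
        return self.cache.get(block, self.disk.get(block))

    def flush(self):
        self.disk.update(self.cache)
        self.cache.clear()

shelf = {}                              # the MD1200: just disks, no logic
perc_a, perc_b = Controller(shelf), Controller(shelf)

perc_a.write(0, "written by server 1")  # sits only in A's cache
stale_read = perc_b.read(0)             # B cannot see A's cached write

perc_a.flush()
perc_b.write(0, "written by server 2")  # B overwrites, unaware of A
view_a, view_b = perc_a.read(0), perc_b.read(0)
```

After this sequence `stale_read` is empty and the two "servers" disagree about block 0, which is exactly the instant-corruption scenario described above.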

Also, on boot the PERC writes a timestamp to the disks; there is no mechanism to synchronize that timestamp either, so your disks will keep going into a foreign state. There's probably nothing wrong with your original card, but replacing it with a new one, which had no stored configuration, let it automatically import the config from the disks.

Have you really been running like this for months? I find that almost impossible.


Months! Literally. Both ESX hosts show the same storage, and I was even vMotioning between the two hosts.

That being said, I'm onsite. I took the card and put it into a PC, and it halts the BIOS just like in the server.  It seems the card is bad.

What you say matches what I thought. Do you have any documentation for your recommendation about the MD1200 not being usable for shared storage?  Not that I doubt you, but when I was called to troubleshoot this, I was thinking the same thing.


Never mind! I found it! I'm an idiot! Going to clean up this mess I made and get some shared storage!

Thanks Andy
Top Expert 2014

The MD1200 could be in split mode, but then you couldn't vMotion unless it was just acting as local storage with vSAN software on top.

VMware only has one host accessing each VM's files, so although both hosts access the enclosure, they never touch the same data area; that would explain why it's not corrupting the data. Server A never needs to see Server B's cache. I'm surprised vMotion works, though, because that's the one case where one server does have to access a data area previously used by the other; the reason there's no corruption is probably that the cache is flushed to disk fast enough.
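That reasoning can be sketched too, purely as an illustration of the hand-off risk (the block sets and cache here are invented for the example, not anything VMware or the PERC exposes): while each VM's blocks are written by only one host, the caches never overlap; vMotion is safe only if the source controller has already flushed every dirty block belonging to the migrating VM.

```python
# Hypothetical sketch: why this setup "mostly worked" under VMware.
# Each VM's blocks are written by one host at a time; the dangerous
# moment is the vMotion hand-off.

def vmotion_safe(vm_blocks, src_dirty_cache):
    # The destination host reads the VM's blocks from disk.  That is
    # only correct if the source PERC has already flushed every dirty
    # block belonging to this VM out of its private cache.
    return not (vm_blocks & src_dirty_cache)

vm1_blocks = {10, 11, 12}          # blocks backing this VM's files
src_cache = {11, 40}               # block 11 still dirty on the source PERC

unsafe = vmotion_safe(vm1_blocks, src_cache)        # stale-read risk
safe = vmotion_safe(vm1_blocks, src_cache - {11})   # safe once flushed
```

In practice the migrations apparently succeeded because the flush happened to win the race every time, which is luck, not a supported configuration.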

If the card stops a PC with no disks connected, then I guess it must be faulty. I would try clearing its cache just to confirm, though, by removing the battery for a few minutes.
