  • Status: Solved

PERC 3/Di RAID 5 Rebuild on PE2500

Only 4 disks are installed in the PowerEdge 2500: disks 0, 1, 2, and 3.
All four are configured as a single RAID 5 array with 2 containers: one 4GB container (set to boot) and one 46GB container (for data).

Disk 3 indicates an amber light condition.  A user takes that drive and pulls it out of the backplane and then reseats it.  The intent here was to just have it rebuild.  Upon reseating of drive 3, drive 0 indicates an amber light and the server locks.  In the next attempt to reboot, it is reported that no boot devices are found.

So, booting into the PERC configuration utility shows that drive 0 and drive 3 are missing in the container properties.  It looks like this for both containers:

0:00:0     --Missing Drive--
0:01:0     <drivename>       <drivespace>
0:02:0     <drivename>       <drivespace>
49:62:0   --Missing Drive--


Watching the entire POST process of the machine shows that both containers are found, but in a state of "unknown".  And it always ends with "no boot device available, press F1 to retry, press F2 to enter setup"


Enter Dell tech support

After some explanations and some poking and prodding, they send 2 replacement drives and a backplane and wish us well with our backups (not bashing here, just sayin).
A little more prodding gets the idea to use the <CTRL-R> function inside the PERC config util.

This 'rebuild' option is performed with drive 3 removed.

Upon rebooting the containers now report as "critical" and not unknown (this is good, yes?)
And, the config util looks like this now for each container:

0:00:0     <drivename>       <drivespace>
0:01:0     <drivename>       <drivespace>
0:02:0     <drivename>       <drivespace>
49:62:0   --Missing Drive--

To me, this is looking very promising...  But, it still goes "no boot device available, press F1 to retry, press F2 to enter setup"
And every time I enter the config util, I am forced to "accept" the new configuration EVEN THOUGH NO CHANGES WERE MADE SINCE I LAST ACCEPTED.  This is the part that bugs me.

Back with Dell support, I am told that the critical container should be bootable, or at least that a rebuild should take place if I insert a NEW drive into drive 3.  This makes sense to me, but neither option seems to work.
The part that keeps bugging me is that each and every boot, no matter how many times I "accept" the new configuration, the config never "sticks".

Any ideas or suggestions as to why the container, which is seemingly reported correctly, cannot be found to boot?
(I double checked the system BIOS and the SCSI BIOS for correct boot orders and they are correct)

Thanks



Asked by: kkohl
 
egryllsCommented:
I'll ask the stupid questions, like: have you got the latest firmware for both the PERC and the system board?  Also, you might check that the replacement drives are (hopefully) the same make and model as the originals.  At the very least, make sure you are not mixing different RPMs, as I have had Dell send me 10k's when the replacements should have been 15's, and that didn't work!!

My other comment is that I don't know why it would be insisting you accept the new config, unless there is potentially a dead battery or something.  I would replace the PERC, install all the original drives and tell it to read the config from disk.  That MIGHT help you get your RAID back.  But if you really lost 2 drives, or something on there is written differently, you might have to rebuild it from scratch, and I would trash that PERC card rather than keep it as a spare.
 
kkohlAuthor Commented:
Well, this is at least something I haven't heard yet.
As far as the stupid questions, the RPMs are the same, but I am hesitant to upgrade the PERC or system board firmware, as I have seen numerous warnings to upgrade the drivers first and I can't do that yet :-)

Dead battery is interesting. On the note of replacing the PERC, we have a thought of taking down another identical PE2500, removing its drives, inserting the three "reporting as present" drives from the critical RAID, and attempting a boot from that angle.
 
egryllsCommented:
Yeah - if you have the spare box, do exactly that, and then in the PERC menu tell it to read from disk and make sure you save the config on the way out.  Do you still have the original drives?  The PERCs are pretty stable, but when they start to go flaky it can really be a pain in the butt.

As for the drivers warning, that is true when you're in the OS, but right now Dell's left you up the creek with no paddle.

I'd repopulate the original drives in another 2500 chassis and try the read from disk trick with the original drives in place.  One drive bad - okay... 2 drives at the same time - go for the PERC.  You haven't been able to boot at all, so you really don't have anything to lose at this point.
 
egryllsCommented:
I would add that I have done this PERC trick before, and in fact walked our most junior admin through it just this weekend over the phone on a 2650, and it worked like a charm.  His issue was power, but he had to read the config from disk and was up within 20 minutes after populating the new chassis.
 
jamietonerCommented:
Either the drive or the backplane slot reporting "49:62:0   --Missing Drive--" is toast; it should be reporting 0:03:0. If you performed the Ctrl+R with a drive that was originally part of the array removed, the array is now also corrupt (which would be why you constantly get the configuration change error). Also, Ctrl+R will usually fail when you have multiple containers on the same drives (you had 2 containers on the same 4 drives).

To get the server back up and running, when the replacement parts arrive, replace the backplane, backplane cable, and HDD(s) (at least HDD ID 3). Clear the configuration to erase all the containers. Upgrade the system BIOS, ESM, and PERC firmware, and download the latest PERC driver to put on a floppy for the OS install. Make sure the drives are being properly recognized; an ID like 49:62:0 means a drive is not being properly recognized. If the drive(s) still aren't recognized properly, the controller will need to be replaced.

When you create the new RAID 5, create only 1 container (you can partition the drive later to have separate C: and D: drives). Creating 2 containers will not improve performance; it will just cause headaches later. With the new RAID 5 created, reinstall the OS, drivers, and apps, and recover the data from backup. If you have data that is needed and was not backed up, you would need to use an application like RAID Reconstructor (www.runtime.org) or send the drives to a professional data recovery center.
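As a rough illustration of the "make sure the drives are being properly recognized" check, here is a minimal Python sketch that flags odd IDs in a listing copied from the config utility by hand. It assumes the three fields are channel:SCSI ID:LUN, that the internal drives should sit on channel 0 with LUN 0, and that valid SCSI IDs run from 0 to 15; the function names are hypothetical and nothing here talks to the controller.

```python
import re

# Assumed format: channel:scsi_id:lun, as shown in the config utility listing.
ID_PATTERN = re.compile(r"^(\d+):(\d+):(\d+)$")


def parse_drive_id(text):
    """Split an ID such as '0:03:0' into (channel, scsi_id, lun) integers."""
    match = ID_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a channel:id:lun string: {text!r}")
    return tuple(int(part) for part in match.groups())


def suspicious_ids(listing, expected_channel=0, max_scsi_id=15):
    """Return the IDs in a pasted listing that fall outside the assumed ranges."""
    flagged = []
    for line in listing.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        channel, scsi_id, lun = parse_drive_id(fields[0])
        if channel != expected_channel or scsi_id > max_scsi_id or lun != 0:
            flagged.append(fields[0])
    return flagged


listing = """\
0:00:0     --Missing Drive--
0:01:0     <drivename>       <drivespace>
0:02:0     <drivename>       <drivespace>
49:62:0   --Missing Drive--
"""

print(suspicious_ids(listing))
```

With the listing from the original post, only the 49:62:0 entry is flagged; 0:00:0 is a valid ID even though its drive is reported missing.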
 
kkohlAuthor Commented:
Did some further testing with Dell tech and decided trying to boot in the other chassis would not be beneficial.

With one of the new disks in (and all others removed), we initialized and created a new container.  On each boot the container information was retained.  Therefore, as Jamie points out, it is the array itself that is corrupt, not the PERC or any internal setting.

So, I am left with what I believe to be three good drives and one bad drive of a corrupted array.

The risks of the CTRL-R failing on this dual container setup were understood (no idea why it was set up like that...) but c'est la vie

Going to be using a RAID reconstructor for some data that was not backed up.  All else is good.

Thank you much for the responses
 
egryllsCommented:
Yeah - once you initialized that put you up the creek sans paddle.  I must have missed that in the original post
 
kkohlAuthor Commented:
-- the initialization took place on new drives, not on any of the original ones --

As best as I can figure, it looks like this is the gist of what happened...


Four drive RAID5 Array split into 2 containers (4GB and 44GB)
Drive 3 fails and is removed.
Drive 3 is reinserted to attempt to rebuild.  This reinsertion caused Drive 0 to report as failed.
The RAID is broken at this point and the computer locks up.
Upon a reboot attempt, no boot container is found.
Per tech support, Drive 3 is removed and <CTRL-R> option is used on Drive 0.
The forced rebuild fails.  Most likely because of the split containers on one array.
Drive 0 reports as present but the RAID is corrupt and non-recoverable.


It is my belief that there was a good chance that drive 0 was really a false failure and had a shot at recovery by forcing it back online... the split containers prevented this.  Thanks for the responses.
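For anyone following along, here is a minimal Python sketch of why one missing drive leaves a RAID 5 container "critical" but still readable, while two missing drives (drive 0 plus drive 3) leave it unreadable. It assumes simple XOR parity over a single stripe with one block per drive; a real controller rotates parity across drives and uses its own on-disk layout, and the function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_stripe(stripe):
    """Rebuild one stripe; None marks the block of a missing drive.

    A single missing block is the XOR of all surviving blocks.
    With two or more missing blocks there is not enough information left.
    """
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError("RAID 5 cannot rebuild with more than one drive missing")
    rebuilt = list(stripe)
    if missing:
        survivors = [block for block in stripe if block is not None]
        rebuilt[missing[0]] = xor_blocks(survivors)
    return rebuilt


# Three data blocks plus one parity block, standing in for drives 0 to 3.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Only drive 0 missing: its block can be recomputed from the other three.
print(rebuild_stripe([None, data[1], data[2], parity])[0])

# Drives 0 and 3 both missing: rebuild is refused.
try:
    rebuild_stripe([None, data[1], data[2], None])
except ValueError as err:
    print(err)
```

This is only the parity arithmetic. It says nothing about whether the PERC would accept a drive forced back online.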