Solved

PERC 3/Di RAID 5 Rebuild on PE2500

Posted on 2006-10-26
Last Modified: 2011-08-18
Only 4 disks inserted on PowerEdge 2500.  Disks 0,1,2,3
All four configured as a single RAID 5 with 2 containers, one 4GB container (set to boot), and one 46GB container (for data).

Disk 3 indicates an amber light condition.  A user takes that drive and pulls it out of the backplane and then reseats it.  The intent here was to just have it rebuild.  Upon reseating of drive 3, drive 0 indicates an amber light and the server locks.  In the next attempt to reboot, it is reported that no boot devices are found.

So, booting up into the perc configuration utility it shows that drive 0 and drive 3 are missing in the container properties.  It looks like this for both containers:

0:00:0     --Missing Drive--
0:01:0     <drivename>       <drivespace>
0:02:0     <drivename>       <drivespace>
49:62:0   --Missing Drive--


Watching the entire post process of the machine shows the two containers are both found, but in a state of "unknown".  And it always ends with "no boot device available, press F1 to retry, press F2 to enter setup"


Enter Dell tech support

After some explanations and some poking and prodding, they send 2 replacement drives and a backplane and wish us well with our backups (not bashing here, just sayin).
A little more prodding gets the idea to use the <CTRL-R> function inside the Perc config util.

This 'rebuild' option is performed with drive 3 removed.

Upon rebooting the containers now report as "critical" and not unknown (this is good, yes?)
And, the config util looks like this now for each container:

0:00:0     <drivename>       <drivespace>
0:01:0     <drivename>       <drivespace>
0:02:0     <drivename>       <drivespace>
49:62:0   --Missing Drive--

To me, this is looking very promising...  But, it still goes "no boot device available, press F1 to retry, press F2 to enter setup"
And every time I enter the config util, I am forced to "accept" the new configuration EVEN THOUGH NO CHANGES WERE MADE SINCE I LAST ACCEPTED.  This is the part that bugs me.

Back with Dell support, I am told that the critical container should be bootable, or at least that a rebuild should take place if I insert a NEW drive into slot 3.  This makes sense to me, but neither option seems to work.
The part that keeps bugging me is that each and every boot, no matter how many times I "accept" the new configuration, the config never "sticks".

Any ideas or suggestions as to why the seemingly reported container cannot be found to boot?
(I double checked the system bios and the scsi bios for correct boot orders and they are correct)

Thanks



Question by:kkohl
8 Comments
 
LVL 1

Expert Comment

by:egrylls
ID: 17814991
I'll ask the stupid questions: do you have the latest firmware for both the PERC and the system board?  Also, check that the replacement drives are (hopefully) the same make and model as the originals.  At the very least, make sure you are not mixing different RPMs; I have had Dell send me 10k's when the replacements should have been 15k's and that didn't work!!

My other comment is I don't know why it would be insisting you accept the new config, unless it's potentially a dead battery or something.  I would replace the PERC, install all the original drives and tell it to read the config from disk.  That MIGHT help you get your RAID back.  But if you really lost 2 drives, or something on them is written differently, you might have to rebuild it from scratch.  Either way, I would swap that PERC card for a spare.
 
LVL 3

Author Comment

by:kkohl
ID: 17815034
Well, this is at least something I haven't heard yet.
As far as the stupid questions, the rpms are the same, but I am hesitant to upgrade the perc or system board firmwares, as I have seen numerous warnings to upgrade the drivers first and I can't do that yet :-)

Dead battery is interesting and on the note of replacing the perc we have a thought of taking down another identical PE2500 and removing the drives from it... inserting the three "reporting as present" drives from the critical raid and attempt a boot from that angle.
 
LVL 1

Assisted Solution

by:egrylls
egrylls earned 75 total points
ID: 17815209
Yeah - if you have the spare box do exactly that and then in the perc menu tell it to read from disk and make sure you save the config on the way out.  Do you still have the original drives?  The percs are pretty stable but when they start to go flaky it can really be a pain in the butt.

As for the drivers warning, that is true when you're in the OS, but right now Dell's left you up the creek with no paddle.

I'd repopulate the original drives in another 2500 chassis and try the read from disk trick with the original drives in place.  One drive bad - okay... 2 drives at the same time - go for the PERC.  You haven't been able to boot at all, so you really don't have anything to lose at this point.
 
LVL 1

Expert Comment

by:egrylls
ID: 17815220
I would add I have done this PERC trick before and in fact walked our most junior admin through this just this weekend over the phone on a 2650 and it worked like a charm.  His issue was power, but he had to read the config from disk and was up within 20 minutes after populating the new chassis
 
LVL 34

Accepted Solution

by:jamietoner
jamietoner earned 425 total points
ID: 17815389
Either the drive or the backplane slot reporting "49:62:0   --Missing Drive--" is toast; it should be reporting 0:03:0. If you performed the Ctrl+R with a drive that was originally part of the array removed, the array is now also corrupt (which would be why you constantly get the configuration change prompt). Ctrl+R will also usually fail when you have multiple containers on the same drives (you had 2 containers on the same 4 drives).

What you're going to need to do to get the server back up and running: when the replacement parts arrive, replace the backplane, backplane cable, and HDD(s) (at least HDD ID 3). Clear the configuration to erase all the containers. Upgrade the system BIOS, ESM and PERC firmware, and download the latest PERC driver to put on a floppy for the OS install. Make sure the drives are being properly recognized; an ID like 49:62:0 means a drive is not being properly recognized. If the drive(s) still aren't recognized properly, the controller will need to be replaced.

When you create the new RAID 5, only create 1 container (you can partition the drive later to have separate C: and D: drives); creating 2 containers will not improve performance, it will just cause headaches later. With the new RAID 5 created, reinstall the OS, drivers and apps, and recover the data from backup. If you have data that is needed and was not backed up, you would need to use an application like RAID Reconstructor (www.runtime.org), or send the drives to a professional data recovery center.
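To make the "49:62:0 is not a properly recognized drive" check concrete, here is a rough sketch that scans a container listing copied from the config utility and flags IDs outside the expected range. The assumptions are mine, not from Dell documentation: that the ID reads as channel:target:LUN, and that the four slots on this box should show up as channel 0, targets 0 through 3. The function name `suspicious_ids` is made up for illustration.

```python
import re

# Assumption: IDs read as "channel:target:lun", and the four slots on this
# server should appear on channel 0 with targets 0-3.
ID_PATTERN = re.compile(r"^(\d+):(\d+):(\d+)")

def suspicious_ids(listing, channel=0, max_target=3):
    """Return the IDs in a container listing that fall outside the expected range."""
    bad = []
    for line in listing.splitlines():
        match = ID_PATTERN.match(line.strip())
        if not match:
            continue  # not a drive line
        chan, target, _lun = (int(group) for group in match.groups())
        if chan != channel or target > max_target:
            bad.append(match.group(0))
    return bad

listing = """\
0:00:0     <drivename>       <drivespace>
0:01:0     <drivename>       <drivespace>
0:02:0     <drivename>       <drivespace>
49:62:0   --Missing Drive--
"""
print(suspicious_ids(listing))
```

Anything it prints is a slot worth checking (drive, backplane, cable) before creating the new array.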
 
LVL 3

Author Comment

by:kkohl
ID: 17816402
Did some further testing with Dell tech and decided trying to boot in the other chassis would not be beneficial.

With one of the new disks in (and all others removed), we initialized and created a new container.  On each boot the container information was retained.  Therefore, as Jamie points out, the array is corrupt itself and not the perc or any internal setting.

So, I am left with what I believe to be three good drives and one bad drive of a corrupted array.

The risks of the CTRL-R failing on this dual container setup were understood (no idea why it was set up like that...) but c'est la vie

Going to be using a raid reconstructor for some non backed up data.  All else is good.

Thank you much for the responses
 
LVL 1

Expert Comment

by:egrylls
ID: 17825291
Yeah - once you initialized that put you up the creek sans paddle.  I must have missed that in the original post
 
LVL 3

Author Comment

by:kkohl
ID: 17845783
-- the initialization took place on new drives, not on any of the original ones --

As best as I can figure, it looks like this is the gist of what happened...


Four drive RAID5 Array split into 2 containers (4GB and 44GB)
Drive 3 fails and is removed.
Drive 3 is reinserted to attempt to rebuild.  This reinsertion caused Drive 0 to report as failed.
The RAID is broken at this point and the computer locks up.
Upon a reboot attempt, no boot container is found.
Per tech support, Drive 3 is removed and <CTRL-R> option is used on Drive 0.
The forced rebuild fails.  Most likely because of the split containers on one array.
Drive 0 reports as present but the RAID is corrupt and non-recoverable.


It is my belief that there was a good chance that drive 0 was really a false failure and had a shot at recovery by forcing it back online... the split containers prevented this.  Thanks for the responses.
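For anyone wondering why one failed member is survivable on RAID 5 but two are not, here is a minimal sketch of the XOR parity idea on a single stripe. It is a toy model under stated assumptions: three data blocks plus one parity block, no rotating parity and no controller metadata, so it says nothing about how the PERC actually lays data out. The helper names `xor_blocks` and `rebuild_missing` are made up for illustration.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def rebuild_missing(stripe):
    """Rebuild the single missing block (None) in a stripe.

    Raises ValueError unless exactly one block is missing, because XOR
    parity carries only enough redundancy to recover one block.
    """
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) != 1:
        raise ValueError(f"cannot rebuild: {len(missing)} blocks missing")
    survivors = [block for block in stripe if block is not None]
    return xor_blocks(survivors)

# Toy stripe: three data blocks and their parity (four "drives").
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)
stripe = data + [parity]

# Drive 0 missing: recoverable from the other three.
assert rebuild_missing([None, stripe[1], stripe[2], stripe[3]]) == data[0]

# Drives 0 and 3 missing: nothing to rebuild from.
try:
    rebuild_missing([None, stripe[1], stripe[2], None])
except ValueError as err:
    print(err)
```

That is the situation once both drive 0 and drive 3 showed as missing: unless one of them can be brought back with its contents intact, there is not enough left to rebuild from.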