jhieb

Replace drive and rebuild RAID on Dell PowerEdge 2900

Hello,

I have a Dell PowerEdge 2900 server. The primary drive array has three drives in RAID 5. The system came this way, and it has been humming along for quite a long time. Recently, one of the drives became degraded, and the light on that drive now flashes green and amber. I assumed the amber light means that there is a problem with the drive and it needs to be replaced.

So, I found a replacement drive on Amazon.com and installed it (the drive is hot swappable, but I cold booted to be safe). I installed the drive, but the green and amber light is still flashing. How will I know if the RAID is being rebuilt? I have a hunch that nothing is happening but I don't know how to confirm that.

By the way, this is a VMware ESXi server, so that is also the operating system I am running.


Thanks,
John
David

Well, if you cold booted, I guarantee it isn't rebuilding. The firmware is designed to start an automatic rebuild only if you pull the old drive and put the new one in the same slot while the system is running.
jhieb

ASKER

Ok, so what do you suggest?
SOLUTION
PowerEdgeTech

jhieb

ASKER

Thanks PowerEdgeTech. I didn't see your post until now. This is what I did.

I rebooted and looked at the drives using CTRL-R. One of the drives (presumably the one that I replaced) was listed as missing. I could not get it to show anything but missing, even when I shut down the computer and put the old drive back in.

I couldn't see any other option allowing me to rebuild the drive, or make the new disk a hot spare.

So, I rebooted the computer again and, after it finished booting to the ESXi screen, I pulled the disk out hot. After a couple of minutes, I put it back in.

The disk lights, which at one time were alternating amber/green ("predictive failure", from your description), are now showing just amber on the right light. The drive is also chattering like it is being written to.

I believe the previous drive did indeed fail. It kept trying to seek and then stop like a failed drive. Ironically, I never did once look at CTRL-R to see what it said about the drives or the RAID. Because of the sound I just assumed it failed.

The amber light (the right one) is still flashing, and the drive keeps chattering like it is being written to. It chatters and then stops for up to a minute. Hopefully it is really doing something. So, I am leaving it alone for the time being, just in case it is actually rebuilding.

If the drive was "missing", it was likely flagged as "foreign" (it would say so on the PD MGMT screen), in which case the config would need to be cleared before you can do anything with it. When rebuilding, the drive will usually be flashing fast green.
jhieb

ASKER

I think you are right. The drive is still flashing amber. I will have to take a look at this tomorrow and see if I can figure out how to clear the config. I don't recall seeing how that could be done.

Boot to CTRL-R, highlight the controller, hit F2, Foreign, Clear.
jhieb

ASKER

I am running the PERC 5/i Integrated BIOS Configuration Utility 1.04-019A.

I booted the server and pressed CTRL-R. I highlighted the controller, and pressed the F2 button, which is Operations. When the Operations window opens, there are three menu items:

Create New VD
Reset Config
Foreign Config

The only thing I am allowed to do in this menu is to Reset the Config. I cannot move my cursor up or down to either of the other two menu choices.

When I scroll down to the virtual disk and view it, it still shows the first two drives, and one missing.

When I press CTRL-N to view the next screen, I see one drive listed as failed.

One drive as failed?? And you took the "bad" drive out already? Unless the drive that you just put in is the one showing as failed, you are looking at 100% data loss or an expensive data recovery.
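
To illustrate why a second failed drive is fatal on a three-drive RAID 5, here is a minimal Python sketch of XOR parity. It is a simplification: real controllers stripe data and rotate parity across the drives, and the block contents and the `xor_blocks` helper are made up for illustration. It is not how the PERC firmware is actually implemented.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a three-drive RAID 5: two data blocks plus one parity block.
# (Illustrative contents only.)
data_a = b"ESXi"
data_b = b"boot"
parity = xor_blocks(data_a, data_b)

# One drive lost: XOR of the two surviving blocks rebuilds the missing one.
rebuilt_b = xor_blocks(data_a, parity)
print(rebuilt_b == data_b)  # prints True

# Two drives lost: only one block of the stripe survives, so there is
# nothing left to XOR it against and the other two cannot be recovered.
```

With one drive gone, every stripe still has two of its three blocks, which is why a single replacement can be rebuilt. With two gone, each stripe has only one block left.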
jhieb

ASKER

That's a bummer. At this point, I can live with the data loss. I have backups of my data from this server and I can start over if I need to. I hope my replacement drive is good. It is new and it should be. I need to figure out how to make it so this drive is listed as good so that I can start over and begin a new RAID. What is the best way to do this?

Are you still able to boot the OS on this RAID and the "failed" drive is simply the replacement? Or do you now have two drives offline?

If the former, then you need to boot to CTRL-R, CTRL-N to go to the PD MGMT page, highlight the "failed" drive, F2, then Rebuild.

(If you need/want to delete the RAID and start over, simply highlight the controller, F2, Clear Config. All functional drives will then show as Ready, and the 'failed' flag on the failed drive will be cleared as well.)
ASKER CERTIFIED SOLUTION
jhieb

ASKER

I tried to "...you need to boot to CTRL-R, CTRL-N to go to the PD MGMT page, highlight the "failed" drive, F2, then Rebuild", but I am not able to select Rebuild. The only option the menu allows me to do is to select the LED Blinking menu choice.

Something is not right, and I wonder if the problem is my RAID controller. I created a boot disk and ran the Dell Diagnostics Utility. Even in the utility, I see two SAS drives but not three. I see my three SATA drives. One SAS drive is missing even from the Diagnostics Utility.

I am all for rebuilding the array if I have to, but if the server will not see the drive then there is no reason to go there. What else is there to do?

The controller does see the drive though ... marked as failed. I agree something isn't right, but it is more likely your drive than the controller. What model is your drive/where did you get it?
jhieb

ASKER

The drive is a Cheetah T10, model ST3300555SS. The capacity is 300GB SCSI. I purchased it on Amazon.com and the store is:

www.amazon.com/shops/A1YVNUWCANOJXA

Dell does have a branded version of the T10 ... if this does not have a Dell label and/or part number on it, then it would not be certified ... non-certified drives sometimes do not play nicely with controllers.

(btw ... drive would be SAS - Serial-Attached SCSI, as SCSI is not supported on this server :))
jhieb

ASKER

That could be it. Ironically, none of the three SAS drives have a dell label or part number on them. They are all the same exact drive and the only thing really different is the lot number and serial number. The rest looks the same. Quite a few years ago, I purchased this server from a reseller of used Dell servers out of Texas.

I am not sure what to do now.

So you simply have the Seagate-branded disk with the stock Seagate firmware rather than the OEM Dell-branded HDD with Dell firmware... No worries, it is good enough.

But I am still confused.
We established that the controller sees 2 SAS drives (plus others we don't care about).

Of the two SAS drives it does see, are both of them HDDs you originally had in the system? In other words, is it the replacement drive that is bad, or a second original SAS drive?

If the replacement is no good, then replace the replacement while the system is on, and it may auto rebuild and you will be fine.

If, however, the SAS drive that is bad is the SECOND drive to fail (out of your original three), then replace that failed drive, build a fresh new RAID 5, and restore.

But if it were me: after losing 2 of 3 disks, the 3rd drive likely has to go too. I would then just buy two 600GB or larger SAS disks and go to RAID 1. You'll get better performance anyway, and you'll replace that other ancient drive.
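
As a rough check on that suggestion, here is a small Python sketch comparing nominal usable capacity. The `usable_gb` helper is hypothetical, and the figures ignore formatting and controller overhead, so treat the numbers as approximate.

```python
def usable_gb(level: str, drives: int, size_gb: int) -> int:
    """Nominal usable capacity for the two layouts discussed in this thread."""
    if level == "raid5":
        if drives < 3:
            raise ValueError("RAID 5 needs at least three drives")
        # One drive's worth of space is consumed by parity.
        return (drives - 1) * size_gb
    if level == "raid1":
        if drives != 2:
            raise ValueError("this sketch only models a two-drive mirror")
        # Both drives hold the same data.
        return size_gb
    raise ValueError(f"unsupported level: {level}")


current = usable_gb("raid5", drives=3, size_gb=300)
proposed = usable_gb("raid1", drives=2, size_gb=600)
print(current, proposed)  # prints 600 600
```

So two 600GB disks in RAID 1 give about the same usable space as the existing three 300GB disks in RAID 5, with one fewer drive to fail.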
jhieb

ASKER

Yes, the SAS drives I see are the original ones in the system and they are not the one I replaced. I verified this by pulling the new drive and then by looking at the utility.

So, I might be out of luck.

Perhaps, if there is nothing else to do except buy more drives, I might try something else. I have three 1TB SATA drives in this system. Two of them are in RAID 1. The other one is a hot spare. Perhaps I make the SATA RAID my primary drive, and then let the third SATA drive be for data images. Does this sound like a better option if I cannot figure out the SAS issue?
jhieb

ASKER

I contacted Dell support through the Dell public support forum and was able to get the problem resolved. The way it was resolved was a bit of an accident. I was sent an ISO, which is a Linux live DVD with a Dell diagnostics program on it. The program ran through the system and created reports. After I ran this program, I noticed that all my drive lights were solid green. So, I rebooted the system and went back into the PERC 5/i utility.

While in the utility, I looked at the RAID and the drive was still listed as missing. Then, I went to PD Mgmt and noticed that the drive had a different listing (I forget what it said). It might have said Ready or something like that. So, I assigned it as a Hot Spare and the rebuild started. The drive is now rebuilding.
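
For anyone landing here later, the drive states and actions mentioned in this thread can be summarised as a small lookup. This is only a sketch in Python: the state names and the `next_step` helper are my own shorthand for what was described above (I am not even certain the last state really said "Ready"), not anything taken from Dell documentation.

```python
# Drive state as shown on the PD Mgmt screen -> action suggested in this thread.
NEXT_STEP = {
    "foreign": "Highlight the controller, F2, Foreign Config, Clear",
    "failed": "CTRL-N to PD Mgmt, highlight the drive, F2, Rebuild",
    "ready": "Assign the drive as a hot spare so the rebuild starts",
}


def next_step(state: str) -> str:
    """Return the action this thread suggested for a given drive state."""
    key = state.strip().lower()
    return NEXT_STEP.get(key, "Not covered in this thread")


print(next_step("Ready"))
```

In my case the Rebuild option was greyed out while the drive showed as failed, so the table describes the intended path rather than a guaranteed one.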

I wish I knew what made the drive available. This was a royal pain. The Dell Rep is reviewing my log files and hopefully he sees something. I will let you know what comes of this before closing the call.

By the way, I was seconds away from yanking my SAS drives and setting up my SATA drives as individual drives. Seriously, my finger was inches away from the power button when I noticed the green lights. Now, I almost wish I had done that. I think I could have lived with no RAID at this point, or at least a RAID 1 on two of the 1TB drives with the other 1TB drive as a data drive.
jhieb

ASKER

The rebuild failed and it looks like the drive is bad. So, I am going to do something else. Thank you both for your help. I appreciate it. I learned a lot about my DELL PowerEdge server today.
jhieb

ASKER

Top notch help! Thank you both for sticking with me as I figured this out.