microone
 asked on

What are the steps to replace a faulty hard drive in a RAID array on a Dell PowerEdge T620?

There are four hard drives. I believe it is a RAID 10 configuration. Two of the four hard drives are blinking an amber light in addition to the green light. I believe that indicates imminent failure and that is why I'm asking about replacing them. The customer has two identical hard drives installed that are brand new but not part of the RAID. Is it simply a matter of pulling the "bad" drives out and replacing them with the spares or must I configure the RAID in some way via software or BIOS?

Storage Hardware, RAID, Dell, Storage

Last Comment
Seth Simmons

8/22/2022 - Mon
Seth Simmons

Two of the four hard drives are blinking an amber light in addition to the green light.

Yes, flashing both amber and green indicates a predictive failure.

Is it simply a matter of pulling the "bad" drives out and replacing them with the spares or must I configure the RAID in some way via software or BIOS?

If there are two identical drives sitting idle, assign them as global hot spares. Once that is done, mark one of the predictive-failure drives as failed, which triggers a rebuild onto a hot spare. Do only one drive at a time; otherwise you risk losing the entire array if both failing drives belong to the same mirrored pair. Make sure you have a good backup before doing any of this in case something goes wrong.

Once the first rebuild has finished and the array is healthy, mark the other drive as failed, which triggers a rebuild onto the other hot spare. Rebuild time depends on a number of factors, including drive speed, drive size, and load on the server.

This can be done either with OMSA (which OS is on this system?) or by rebooting and entering the controller configuration utility during POST.
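
The one-at-a-time ordering described above can be sketched as a small planner. This is only an illustration of the sequencing, not a tool that talks to the controller; the function name and the drive labels are hypothetical, and with global hot spares the controller itself decides which spare is used, so the pairing shown here is just for readability.

```python
from typing import List


def plan_replacements(failing: List[str], spares: List[str]) -> List[str]:
    """Return ordered, human-readable steps that retire failing drives one at a time."""
    if len(spares) < len(failing):
        raise ValueError("need at least one hot spare per failing drive")

    steps = ["verify a tested backup exists"]
    for drive, spare in zip(failing, spares):
        # Never start on the next drive until the previous rebuild has finished.
        steps.append(f"mark {drive} as failed")
        steps.append(f"wait for rebuild onto {spare} to complete")
        steps.append("confirm the array reports healthy")
    return steps


if __name__ == "__main__":
    # Hypothetical labels; substitute the IDs your controller reports.
    for number, step in enumerate(plan_replacements(["disk1", "disk2"], ["spare1", "spare2"]), 1):
        print(number, step)
```

The point of the structure is that the second "mark as failed" step can never appear before the first rebuild and health check.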
microone

ASKER
The OS is Windows 2012 Server Standard with Hyper-V. OMSA is not installed. I'm going to do that right now. I'm assuming OMSA will be able to discern the RAID type as well as identify the two problematic drives?

Seth Simmons

Yes, it will show you both, at the logical and the physical level.
You can do anything with the RAID array that the controller supports.
rindi

If two are flashing, before replacing the disks make sure you have a good, working backup. Then replace only one of them & wait until it has been completely resynced. Only replace the other bad disk after that.

If the server has empty drive bays, I'd suggest adding at least another disk of at least the same size as a "Hot-Spare".
Philip Elder

Yup. Backup First. That is an absolute must.

Are the backups test restored? The backup is not Known Good if they have not been. The worst place to figure out the backups are bad, usually due to a bad chain of files, is when there needs to be a recovery.

The iDRAC should have an IP address if it was set up. It will give at least the basic information as far as RAID controller whether hardware or software and the RAID array and logical disk setup.

NOTE 1: A rebuild will stress the other drives. If both blinking drives form a mirrored pair in the RAID 10 and the second disk dies during the rebuild, the array dies with it. If the second blinking drive is in the other RAID 1 pair, the risk is lower but still there.

NOTE 2: If the RAID controller is a PERC hardware controller, the chances of a lockup on drive failure are minimal. If the RAID controller is chipset software based, there is a risk the server will go full-stop if a disk dies. Keep this in mind.

The RAID setup can be accessed by rebooting and hitting a function key as indicated on the POST screen to get into the RAID BIOS.
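
To make NOTE 1 concrete, here is a minimal sketch of the RAID 10 survival rule: the array stays up as long as every mirrored pair still has at least one working member. The disk labels and the pairing are hypothetical, since the thread does not say which physical disks are mirrored with which.

```python
from typing import Iterable, List, Tuple


def raid10_survives(mirror_pairs: List[Tuple[str, str]], failed: Iterable[str]) -> bool:
    """Return True if no mirrored pair has lost both of its members."""
    failed_set = set(failed)
    return all(not (a in failed_set and b in failed_set) for a, b in mirror_pairs)


if __name__ == "__main__":
    # Assumed layout: two RAID 1 pairs striped together.
    pairs = [("d0", "d1"), ("d2", "d3")]
    print(raid10_survives(pairs, ["d1"]))        # one disk lost
    print(raid10_survives(pairs, ["d1", "d2"]))  # one disk lost from each pair
    print(raid10_survives(pairs, ["d0", "d1"]))  # both disks of one pair lost
```

Only the last case takes the array down, which is why it matters whether the two blinking drives sit in the same pair.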
microone

ASKER
I am having a problem with the OMSA software which is the method I would prefer to use. I downloaded the "OM-SrvAdmin-Dell-Web-WINX64-9.1.0-2757_A00.exe" file, unzipped it and successfully ran the Setup program. However, when I put in my credentials, it pauses for a few seconds and prompts me for the information again.


The credentials are correct because when I purposely put in a bad password, I get a message informing me that the login failed. I'm not sure what is happening. Did I perhaps download the wrong program or maybe miss some fundamental step? The server is a PowerEdge T620, running Windows 2012 Std.

microone

ASKER
Excellent point on the backup. We're using a Datto device which can instantly virtualize a failed system. However...it has failed to be current on at least one occasion. I will definitely test.
Seth Simmons

However, when I put in my credentials, it pauses for a few seconds and prompts me for the information again.

Have you tried logging in with a local administrator account?

Did I perhaps download the wrong program or maybe miss some fundamental step?

This is a newer version, though the one you downloaded should work.

https://dl.dell.com/FOLDER05558179M/1/OM-SrvAdmin-Dell-Web-WINX64-9.3.0-3465_A00.exe


Philip Elder

For a machine of that age, I think the default iDRAC credentials sticker is on the box somewhere. I'm not 100% sure, though.
microone

ASKER
Sorry for the delay in getting back to you on this. I was finally able to get v9.3 of OMSA to work properly. To recap, the server is a PowerEdge T620 running Windows Server 2012 with a PERC H310 controller.

Here's what I found:

Physical disks 0:1:1 and 0:1:2 show a predictive failure.
Physical disks 0:1:4 and 0:1:5 show that they are global hot spares.

Is it simply a matter of setting the task of each of the "soon to fail disks" to "offline"? Will the system then start updating the hot spare into the RAID without any further input from me?

Are there any "gotchas" that I should be aware of?

I have good backups of the data and I will check them before doing anything. Also, I will do one disk at a time, a day or so apart.

Thank you for your continued assistance with this.
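
For reference, the earlier advice (mark one failing drive as failed, let the hot spare rebuild, then check health) can be written out as OMSA command lines. The sketch below only builds the strings and runs nothing. The `omreport storage pdisk`, `omconfig storage pdisk action=offline`, and `omreport storage vdisk` forms are the OMSA CLI syntax as I understand it; the controller ID of 0 is an assumption, and it is also an assumption that the "Offline" task is the right way to mark a drive as failed on this controller, so check Dell's OMSA CLI documentation before running any of it.

```python
from typing import List


def omsa_commands_for(controller: int, failing_pdisk: str) -> List[str]:
    """Build (but do not run) the OMSA command lines for retiring one drive."""
    return [
        # 1. Review physical disk states first (predictive failure, hot spares).
        f"omreport storage pdisk controller={controller}",
        # 2. Take the failing drive offline (assumed equivalent of "mark as failed").
        f"omconfig storage pdisk action=offline controller={controller} pdisk={failing_pdisk}",
        # 3. Check the virtual disk state while the rebuild runs and after it ends.
        f"omreport storage vdisk controller={controller}",
    ]


if __name__ == "__main__":
    # Controller 0 is assumed; 0:1:1 is the first predictive-failure disk reported above.
    for line in omsa_commands_for(0, "0:1:1"):
        print(line)
```

The same function would be called again for 0:1:2 only after the first rebuild has finished and the virtual disk is healthy.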

ASKER CERTIFIED SOLUTION
Seth Simmons
