RAID 10 - 2 disks failed, should I replace 1 at a time?


I have a server with a 6-disk RAID 10 array. Two disks have failed; I have only 1 replacement in stock and 1 more is on its way.

Should I install the one I have now, let it sync up, and then do the other one when it arrives, or am I better off doing both at once?

Thanks Adam
David Johnson, CD

8/22/2022 - Mon

RAID 10 is made of several RAID 1 mirrors that are then striped, so the question is the same as "I have three RAID 1s and a disk in two of them has failed; should I replace one now or do both at once?" The answer to that is obvious: you would replace one immediately.

Rebuilding also means more work for the controller, so one at a time is less I/O for it than two at once. Some controllers won't do both at once even if you change both at the same time; an HPE controller would queue one rebuild while it completed the other.
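To make the "three RAID 1s" view concrete, here is a minimal Python sketch of which further failures would be fatal. The disk numbering, the pairing into mirrors, and the choice of which two disks have failed are all assumptions for illustration; the real layout depends on how the controller built the array.

```python
from typing import FrozenSet, List, Set

# Hypothetical layout: six disks numbered 0-5, paired into three RAID 1 mirrors.
MIRRORS: List[FrozenSet[int]] = [
    frozenset({0, 1}),
    frozenset({2, 3}),
    frozenset({4, 5}),
]


def array_survives(failed: Set[int]) -> bool:
    """The stripe survives only if every mirror still has a working disk."""
    return all(mirror - failed for mirror in MIRRORS)


def critical_disks(failed: Set[int]) -> Set[int]:
    """Working disks whose loss would take the whole array down."""
    working = {disk for mirror in MIRRORS for disk in mirror} - failed
    return {disk for disk in working if not array_survives(failed | {disk})}


# Assumed scenario: disks 0 and 2 have failed (different mirrors, array still up).
both_failed = {0, 2}
# After disk 0 has been replaced and its rebuild has finished.
one_replaced = {2}

print(array_survives(both_failed))           # True
print(sorted(critical_disks(both_failed)))   # [1, 3]
print(sorted(critical_disks(one_replaced)))  # [3]
print(array_survives({0, 1}))                # False
```

Under these assumptions, two of the four working disks are single points of failure while both mirrors are degraded, and replacing one disk today cuts that to one.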

Jim Nicolis

This is on an IBM x3650m2 with an M5015 controller, running ESXi 5.1 and a few VMs. Should I turn the server off completely, insert the new disk, then boot up and let it do its thing, or should I sit in the controller utility while it's rebuilding so it doesn't bring ESXi and the VMs up?

Can you advise of the best process to swap the hard drives over.

Thanks Adam
David Johnson, CD

"This is on an IBM x3650m2 with an M5015 Controller, running ESXi 5.1 and a few VM, should I turn the server off completely and insert new Disk then boot up and let it do it a thing"

It is hot-swappable, so don't turn it off; just replace the drive.
And which hurts the most in production: delays due to the resync, or no production during the resync?
The resync can be slightly faster if you halt production than when there is heavy I/O during production.
If there is hardly any I/O involving the affected disk, then you should not notice the resync.

(The resync is done by the controller, so there is no need to shut systems down from that perspective; the failing disk is out of commission anyway.)
David Favor

Wow 2x disks at the same time.

I'd likely do a power down + replace both disks + power up.

As David Johnson said, if your disks are hot swappable, just pop out the bad ones, then pop in the new ones.

As noci suggested, if you do one disk at a time, your resync will be longer, as all data will first disperse across 1x new disk, then when you insert the 2nd disk, data dispersion will start up again. How exactly this works depends on your controller + OS; there are too many factors to guess.

Summary: Installing 2x new disks at the same time is likely the path to the fastest recovery time.

Note: Best do this now, as 2x disks down out of 6x disks is getting close to becoming a very complex problem.

Rebuild overall may be longer doing it one at a time but tomorrow's rebuild will be quicker if one is done today. Since they have a disk available they might as well change one of them now.

The controller is an M5015, which is based on the LSI 9260 and can even be cross-flashed, but obviously not in a production environment.
Philip Elder

Keep in mind that the rebuild will place a lot of extra stress on the failed disk's partner. If possible, check the RAID controller's logs to see if there are any "Predictive Failure" flags against either partner.

We would run one at a time.

Make sure the backup is good. We have lost a server during a rebuild when its partner failed.
Gerald Connolly

Make sure you have a full valid Backup AS SOON AS POSSIBLE

Your data is at risk, so hot replace one of the failed disks ASAP and the other as soon as you get a replacement.
BUT get that Backup done even sooner

If you are ordering new disks order at least 3 so you have one to replace the failed disk and a couple of spares in case more disks do not survive the backup and the resyncs!

Another thing to keep in mind: although not all disks wear evenly or are created equal, they share a common property, MTBF (mean time between failures).
So on average they will fail after X operations. If all disks were bought and activated at the same time, then you may want to keep more spares.
Jim Nicolis

Thanks for all the comments; all have been very useful.

I have been busy backing up the servers and that has now completed.

I have ordered a few more disks and hopefully they should be here tomorrow or the next day.

Can I just ask: the light on the HDD that has gone bad is red, I think, and it is the bottom of the 2 lights on the caddy, meaning replace. This server has ESXi 5.1 on it, so I can't get into ServeRAID Manager. Can I just confirm that, while the server is up and running, I take the first bad disk out and replace it, and the rebuild on that set should start after I put the new one in without me doing anything? Or should I be turning the server off? There were a few mixed answers on that.

Also, I am fairly sure I am right, but the M5015 is hot-swap capable, yes?

Also, I found another issue that I am looking for a replacement for: the battery on the card has gone and needs replacing. Just an FYI.

Thanks, Adam

It should start rebuilding automatically if you fit a new disk; if you replace with an old one, you may have to erase any foreign config on it.

There is no GUI for VMware, but you can use StorCLI under ESXi if you don't want to shut down and use the POST config tool. Unfortunately, I think the host has to be rebooted before StorCLI works, but at least you can put it on for next time.
David Johnson, CD

