Jim Nicolis

RAID 10 - 2 disks failed, should I replace 1 at a time?

Hi

I have a server running RAID 10 across 6 disks. Two disks have failed; I have only 1 replacement in stock and 1 more on the way.

Should I install the one I have now, let it sync, and then replace the other when it arrives, or am I better off doing both at once?

Thanks Adam
Server Hardware, Storage, RAID, Hard Disk


8/22/2022 - Mon
andyalder

RAID 10 is made of several RAID 1 mirrors that are then striped, so the question is the same as "I have three RAID 1s and a disk in two of them has failed; should I replace one now or do both at once?" The answer to that is obvious: you would replace one immediately.

Rebuilding also means more work for the controller, so one at a time is less I/O for it than two at once. Some controllers won't rebuild both at once even if you change both at the same time; an HPE controller would queue one rebuild while it completed the other.
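
To make the "three RAID 1s" picture concrete, here is a minimal sketch of a 6-disk RAID 10 as three mirror pairs. The disk numbering and the pairing are assumptions for illustration only; the real pairing depends on how the controller built the array.

```python
from itertools import combinations

# Assumed layout: six disks numbered 0-5, grouped into three RAID 1 mirrors.
# The actual pairing on a given controller may differ.
MIRRORS = [(0, 1), (2, 3), (4, 5)]


def array_survives(failed):
    """RAID 10 stays online while every mirror keeps at least one good disk."""
    failed = set(failed)
    return all(not (a in failed and b in failed) for a, b in MIRRORS)


def surviving_two_disk_failures():
    """Return the two-disk failure combinations the array can tolerate."""
    disks = [d for pair in MIRRORS for d in pair]
    return [combo for combo in combinations(disks, 2) if array_survives(combo)]
```

In this model, `array_survives({0, 2})` is true (one disk lost in each of two mirrors, which is the situation described in the question), while `array_survives({0, 1})` is false because both halves of one mirror are gone.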

Jim Nicolis

ASKER
This is on an IBM x3650m2 with an M5015 controller, running ESXi 5.1 and a few VMs. Should I turn the server off completely, insert the new disk, then boot up and let it do its thing? Or should I go into the controller utility while it's rebuilding so that ESXi and the VMs don't start up?

Can you advise on the best process to swap the hard drives over?

Thanks Adam
David Johnson, CD

"This is on an IBM x3650m2 with an M5015 Controller, running ESXi 5.1 and a few VM, should I turn the server off completely and insert new Disk then boot up and let it do its thing"

It is hot-swappable. Don't turn it off; just replace the drive.
noci

And what hurts the most in production: delays due to the resync, or no production at all during the resync?
The resync can be slightly faster if production normally generates heavy I/O and you halt production.
If there is hardly any I/O involving the affected disk, then you should not notice the resync.

(the resync is done by the controller... so no need to shut systems from that perspective, the failing disk is out of commission anyway.)
David Favor

Wow 2x disks at the same time.

I'd likely do a power down + replace both disks + power up.

As David Johnson said, if your disks are hot swappable, just pop out the bad ones, then pop in the new ones.

As noci suggested, if you do one disk at a time, your resync will take longer, as all data will first disperse across 1x new disk, then when you insert the 2nd disk, the data dispersion will start up again. How exactly this works depends on your controller + OS; too many factors to guess.

Summary: Installing 2x new disks at the same time is likely the path to the fastest recovery time.

Note: Best do this now, as 2x disks down out of 6x disks is getting close to becoming a very complex problem.
andyalder

Rebuild overall may be longer doing it one at a time but tomorrow's rebuild will be quicker if one is done today. Since they have a disk available they might as well change one of them now.

The controller is an M5015, which is based on the LSI 9260 and can even be cross-flashed, but obviously not in a production environment.
Philip Elder

Keep in mind that the rebuild will place a lot of extra stress on the failed disk's partner. If possible, check the RAID controller's logs to see if there are any "Predictive Failure" flags against either.

We would run one at a time.

Make sure the backup is good. We have lost a server during a rebuild when its partner failed.
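
The exposure behind "one at a time, now" can be sketched by counting the disks whose loss would take the whole array down. This is an illustrative model only: it assumes six disks numbered 0-5 in three mirror pairs, with the two failures in different mirrors.

```python
# Assumed layout: six disks numbered 0-5, grouped into three RAID 1 mirrors.
MIRRORS = [(0, 1), (2, 3), (4, 5)]


def critical_disks(failed):
    """Return the healthy disks that are the only remaining copy of a mirror."""
    failed = set(failed)
    critical = set()
    for a, b in MIRRORS:
        if a in failed and b not in failed:
            critical.add(b)
        elif b in failed and a not in failed:
            critical.add(a)
    return critical


# Today: one disk down in each of two mirrors, so two partners are critical.
before = critical_disks({0, 2})

# After disk 0 has been replaced and its rebuild has finished.
after = critical_disks({2})
```

Here `before` is `{1, 3}` and `after` is `{3}`: once the first replacement has rebuilt, only one surviving partner is still a single point of failure while waiting for the second disk.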
Gerald Connolly

Make sure you have a full valid Backup AS SOON AS POSSIBLE

Your data is at risk, so hot replace one of the failed disks ASAP and the other as soon as you get a replacement.
BUT get that Backup done even sooner

If you are ordering new disks order at least 3 so you have one to replace the failed disk and a couple of spares in case more disks do not survive the backup and the resyncs!
noci

Another thing to keep in mind: although disks do not all wear evenly and are not all created equal, disks of the same model share a common property, the MTBF.
So on average they will fail after roughly the same amount of use. If all the disks were bought and put into service at the same time, then you may want to keep more spares.
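
As a rough illustration of the MTBF point, the sketch below converts an MTBF figure into failure probabilities. It assumes a constant failure rate (the simple exponential model), which ignores wear-out and so tends to understate the risk for same-age disks. Any MTBF value passed in is hypothetical; use the figure from the drive's datasheet.

```python
import math

HOURS_PER_YEAR = 8760


def annual_failure_probability(mtbf_hours):
    """Chance that one disk fails within a year, assuming a constant failure rate."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


def any_failure_probability(mtbf_hours, disk_count, hours):
    """Chance that at least one of disk_count disks fails within the given hours."""
    return 1.0 - math.exp(-disk_count * hours / mtbf_hours)
```

For example, `any_failure_probability(1_000_000, 4, 24)` estimates the chance that one of four surviving disks fails during a hypothetical 24-hour rebuild window, given a hypothetical 1,000,000-hour MTBF. Real-world rates for aged disks from the same batch can be considerably higher than this model suggests.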
Jim Nicolis

ASKER
Thanks for all the comments; all have been very useful.

I have been busy backing up the servers, and the backups are now complete.

I have ordered a few more disks, which should hopefully be here tomorrow or the next day.

Can I just ask: the light on the HDD that has gone bad is red, I think, and it is the bottom of the 2 lights on the caddy, meaning replace. The server runs ESXi 5.1, so I can't get into ServeRAID Manager. Can I just confirm that, while the server is up and running, I take the first bad disk out and put the new one in, and the rebuild on that set should start without me doing anything? Or should I turn the server off? There were a few mixed answers on that.

Also, I am fairly sure I am right, but the M5015 is hot-swap capable, yes?

Also, I found another issue that I am looking for a replacement for: the battery on the card has gone and needs replacing. Just an FYI.

Thanks, Adam
andyalder

It should start rebuilding automatically if you fit a new disk. If you replace it with a used one, you may have to erase any foreign config on it.

There is no GUI for VMware, but you can use StorCLI under ESX if you don't want to shut down and use the POST config tool. Unfortunately I think the host has to be rebooted before StorCLI works, but at least you can install it for next time:
https://www.ibm.com/support/pages/storcli-command-line-utility-storage-management-v11412-vmware-esx-ibm-and-lenovo-systems
ASKER CERTIFIED SOLUTION
David Johnson, CD
