jkirman

How to swap out a failing SATA drive on an HP server with B110i SATA RAID controller

Greetings,

I have recently taken over an account that has a branch office with 3 older HP ProLiant servers and about 12 PCs. One of the servers is a ProLiant ML110 G6 running Windows Server 2012 R2 Standard with 12 GB RAM. It is a Domain Controller with no principal applications on it. It uses an HP Smart Array B110i SATA RAID controller with 2 arrays of SATA drives in RAID-1 config. The 1st SATA array has 2 x 250 GB drives in Box 1 / Bays (Ports) 1 and 2, and the 2nd SATA array has 2 x 1 TB drives in Box 1 / Bays (Ports) 3 and 4.

We are seeing a warning message when the server reboots, warning of an Imminent Failure of one of the hard drives. I have attached a screenshot of the warnings (the boot-time warning from the B110i SATA RAID controller about imminent drive failure). The failing drive is in Box 1 Bay 2, so it's one of the 250 GB drives. Specifically, it is a model ATA VB0250EAVER, F/W version HPG0.

My question is whether this is truly a hot-swappable drive + controller. The message in the attached screenshot says to make sure to only change out the failing drive when all drives are on. I'm hoping someone who has worked with this controller is familiar with the proper way to swap out the failing drive. I've worked with Dell servers for about 25 years and the process there is beyond simple. With a Dell server, a hot-swap chassis, and a PERC RAID controller, all I have needed to do is a) offline the failing drive, b) pull the drive, and c) put in the new / replacement drive. Once the drive is replaced, it starts to rebuild. With this HP B110i controller, I've read in some articles that the host server should never be shut off when the drives are replaced. I'm also wondering if there is a similar process to offline the failing drive before you pull it out. Beyond that I have zero experience with these controllers, and I definitely do not want to pull out the hard drive while the system is running if it is not hot-swappable.

Separately, I've had the client get in touch with HP for out-of-warranty support, but they are looking to charge about $850 for a single out-of-warranty support call (yow!!!). So I'm hoping someone can provide advice and a drive-swap procedure from their own experience before the client has to fork over a serious chunk of change to HP.

Thanks in advance for any assistance on this.

jkirman
Server Hardware, Storage Hardware


8/22/2022 - Mon
Dr. Klahn

Back up the failing array to an external device.  Then back it up again to a different device using different backup software.

Buy three drives to replace into the array, replace both drives in the array and then restore from the backup.  Put the third drive on the shelf and label it "RAID HOT REPLACEMENT FOR SYSTEM blah blah **ONLY**" so that (a) you have it immediately available in case of a future problem and (b) nobody steals it for some other system.

Why both drives?  If one drive is failing, the other one is probably also reaching end of life.  Drives are cheap; human time is not.  In any case, you will want two as-close-to-identical drives in the array and it may be difficult to obtain a drive to match the failing one.
jkirman

ASKER
Dr. Klahn - thank you for your recommendations on backing up to 2 media and then replacing the current RAID-1 drives.  However, your response did not answer my question.  Perhaps I wasn't clear, but when you say replace the drives - that is specifically what I'm asking about.  So for example -

1) I am requesting confirmation that the drives and controller are designed for hot-swap, or do I need to power off the server.  I'm 99% sure this is hot-swap, but would prefer someone who has worked with this already can confirm on that.

2) Do I just pull out the failing drive, or do I need to do something with the Array Configuration Utility or the Smart Storage Administrator CLI to prep the system?

3) As I'm looking at the ACU GUI right now using the Physical View, it shows all 4 drives in the system, with the 250 GB SATA drive in Box 1 Bay 2 showing a yellow exclamation mark. When I right-click the drive, I only get options for More Information and View Status Alerts. There are no options or controls anywhere to e.g. Offline the drive, Deactivate the drive, etc. Again, I'm coming from a Dell world where the PERC controller and OpenManage software provide a lot of functionality and control here, and you can e.g.:

a) Offline a failing drive
b) pull out the drive after it shows Offline
c) put in a replacement drive
d) watch the drive rebuild

and that's basically it.  Here, with the B110i and the ACU GUI, I am not seeing any controls on how to physically prep the system to remove the drive as I would be able to with a Dell system, per a) thru d) above.

So again, since I definitely do not want to screw up the hardware, I'm requesting a true step-by-step description of the physical removal process, possibly working with or using the ACU GUI if that is helpful to the process.

Thank you.

jkirman
SOLUTION
Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)

THIS SOLUTION ONLY AVAILABLE TO MEMBERS.
ASKER CERTIFIED SOLUTION
andyalder

THIS SOLUTION ONLY AVAILABLE TO MEMBERS.
Wakeup

Some good information/instructions here:
https://support.hpe.com/hpsc/doc/public/display?docId=c02279604

But Andy's right, not hot-pluggable.
jkirman

ASKER
Thanks for your collective thoughts.  From everything you've written, looks like the approach should be:

- make a note of the serial number of the failing drive per the ACU interface  (already done)
- shut down the server
- pull out the failing drive from Box 1, Bay 2, as per the S/N
- make a note of the replacement drive's model and serial number
- install the replacement drive
- power on the server

Now here's the part I'm assuming, since I've never dealt with this specific controller:

- Let the system boot normally
- After booting and login, go into the ACU and I'll see the RAID-1 container rebuilding
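
For the before-and-after check in the steps above, here is a minimal Python sketch that scans ACU-style configuration text and reports any drive whose status is not OK. It assumes the ACU command-line tool (hpacucli) is installed alongside the ACU GUI and that the output of its `ctrl all show config` command has roughly the shape of the sample below. The sample text was typed by hand for illustration and was not captured from this server, and the function names are hypothetical.

```python
import re

# Illustrative sample only: shaped like ACU CLI "ctrl all show config" output,
# but written by hand for this sketch, not captured from the actual server.
SAMPLE_CONFIG = """\
Smart Array B110i SATA RAID in Slot 0 (Embedded)

   array A (SATA, Unused Space: 0 MB)

      logicaldrive 1 (232.9 GB, RAID 1, OK)

      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SATA, 250 GB, OK)
      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SATA, 250 GB, Predictive Failure)

   array B (SATA, Unused Space: 0 MB)

      logicaldrive 2 (931.5 GB, RAID 1, OK)

      physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SATA, 1 TB, OK)
      physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SATA, 1 TB, OK)
"""

# Matches lines such as "physicaldrive 1I:1:2 (..., Predictive Failure)".
DRIVE_LINE = re.compile(
    r"^\s*(?P<kind>logicaldrive|physicaldrive)\s+(?P<id>\S+)\s+\((?P<fields>.*)\)\s*$"
)


def parse_drives(text):
    """Return (kind, id, status) tuples; status is the last comma-separated field."""
    drives = []
    for line in text.splitlines():
        match = DRIVE_LINE.match(line)
        if match:
            status = match.group("fields").split(",")[-1].strip()
            drives.append((match.group("kind"), match.group("id"), status))
    return drives


def drives_needing_attention(text):
    """Return only the drives whose reported status is something other than OK."""
    return [drive for drive in parse_drives(text) if drive[2] != "OK"]


if __name__ == "__main__":
    for kind, drive_id, status in drives_needing_attention(SAMPLE_CONFIG):
        print(f"{kind} {drive_id}: {status}")
```

Running the same check on real output before shutdown (to confirm the bay of the flagged drive) and again after booting into the O/S would show whether anything is still reporting a non-OK status. The exact status wording during a rebuild is not covered by the sample, so treat any non-OK line as a prompt to look in the ACU rather than as a definitive state.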

I've read through a few online postings by admins with similar issues with this controller, and get the impression that I should NOT go into the B110i BIOS at boot time or interrupt the boot process, but rather should simply let everything start up on its own, once I've replaced the failing drive.

Appreciate any confirmations on the above / latest post in advance, and thanks again for the info and thoughts to date.

jkirman
SOLUTION
andyalder

THIS SOLUTION ONLY AVAILABLE TO MEMBERS.
jkirman

ASKER
Andyalder, thanks for the additional details regarding the B110i controller. Seems that these embedded / software controllers come mostly with caveats attached, as it's somewhat mind-blowing for me to hear that you need to be in the O/S for the RAID-1 container to rebuild. Then again, for the last couple of decades I've been a creature of habit with Dell servers and PERC hardware controllers, and have never used, nor even worked with, a software RAID controller. I have also religiously avoided the S110 / S300 controllers and the like, as I have read mostly complaints about the performance of software RAID controllers compared to e.g. the traditional PERC hardware controller series. I'm assuming they (s/w controllers) are recommended only for very budget-constrained situations. I will hopefully be able to address the drive replacement in the next week or so and will advise of my experience with the ACU and the RAID-1 container rebuild.
jkirman

ASKER
Thanks for your assistance and detailed information. The keys here were that 1) the controller is not hot-plug, in its current form, and 2) the RAID-1 rebuild will not take place until you boot into the O/S. FWIW, in my readings I found that you could upgrade the B110i to a hot-plug controller by adding a License Key, available from Amazon for about $75, but since a straight replacement will work fine, I won't be experimenting with hot-swap options.

Cheers and many thanks again.

jkirman