Replacement drive for failed drive in RAID-5 - shutdown and replace or hot-swap replace? - HP DL 380 G3

Posted on 2010-11-26
Medium Priority
Last Modified: 2012-05-10
This morning I discovered that our Windows 2003 Server (and Active Directory controller) had a failed drive. There are five 36.8 GB drives configured as a RAID-5 array. I have a replacement drive arriving within the next four hours.

Windows boots as far as the login screen, then sits at a gray screen, and no users can log in on their workstations or access the Exchange server. A single drive failure should only slow the server down, but it seems to be doing more than that, and I'm concerned.

The HP tech recommended that I replace the drive while the machine is on, then wait for it to rebuild (~9 hours) and see whether it starts working.

This doesn't make sense to me. It makes more sense to me to hard-shutdown the server, replace the drive, and then boot and let the rebuild run while the Windows services aren't trying to function off the failed drive.

It also makes me think that I will get my AD controller functioning sooner this way.

This morning I restarted the server remotely through the iLO with a hard reset (not knowing a drive had failed). In the first few minutes of the boot process the AD and file sharing services were working, then they stopped and I couldn't log in via the login window.

So, my guess is that the drive is reporting partial failure to the array controller and continuing trying to function, but in reality it is totally not working and the system would function better if the drive was out of the equation.

Following my logic or am I nuts and ignorant? :)


P.S. In another worst case scenario, another drive has failed and it isn't being shown on the drive lights. I can't access the Insight Diagnostics remotely.
Question by:sweetseater
LVL 76

Accepted Solution

Alan Hardisty earned 500 total points
ID: 34219470
The HP DL380 G3 has hot-swap drives, so you can happily replace a drive with it still running.

As for the grey screen - sounds a little peculiar, but might be disk issues that get resolved once the replacement drive has been installed.
LVL 47

Assisted Solution

David earned 500 total points
ID: 34219473
Your premise is a bit incomplete. The XOR parity in RAID-5 protects against either a full drive failure or a block failure, but not both at once. With one drive already failed, if any surviving disk has a bad block, you lose that entire stripe, so you can still have data loss.

That is why you need to run regular data consistency checks/repairs, to clean up lurking bad blocks that result from media failures occurring AFTER the last time you read that stripe of data.
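
To make the parity point concrete, here is a minimal Python sketch of one RAID-5 stripe on a 5-drive array. It is only an illustration of the XOR arithmetic, not how the Smart Array firmware is implemented; the helper names `xor_blocks` and `rebuild` and the 4-byte blocks are made up for the example.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe on a 5-drive RAID-5: four data blocks plus one parity block.
# (Hypothetical 4-byte blocks; real stripes are far larger.)
data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = xor_blocks(data)
stripe = data + [parity]

def rebuild(stripe, failed, unreadable=()):
    """Reconstruct the failed drive's block from the surviving blocks.

    Returns None if any surviving block is unreadable: the XOR equation
    then has two unknowns and this stripe cannot be recovered.
    """
    survivors = [i for i in range(len(stripe)) if i != failed]
    if any(i in unreadable for i in survivors):
        return None
    return xor_blocks([stripe[i] for i in survivors])

# Drive 2 fails outright: its block is recomputed from the other four.
recovered = rebuild(stripe, failed=2)

# Drive 2 fails AND drive 0 has a latent bad block in this stripe.
lost = rebuild(stripe, failed=2, unreadable={0})

print(recovered, lost)
```

The second call is the case a regular consistency check is meant to head off: it finds and repairs the latent bad block while the array still has full redundancy.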

Best practice is to always do replacements hot. The most stressful thing you can do to a disk is power cycle it. Why rock the boat?

The HP person is correct. There is also no reason to have to guess to discover what is going wrong. Look at the event log in the controller. It will tell you.


Author Comment

ID: 34219574

I can't access the event log in the controller. Not until the server is fully functioning.

What is the method you recommend to do regular data consistency check/repairs?

Thanks for the help. I will settle down and trust the hot-swapping process.

LVL 47

Expert Comment

ID: 34219737
HP has several freebies. Certainly get the latest SmartStart / ACU / ADU utilities. I don't know off the top of my head which controller comes bundled with your system, but some of them have a feature to automate the consistency repairs.

Go to the support.hp.com site, download and install the latest firmware, updates, and drivers, and get the add-on monitoring from the same site as well.
LVL 76

Expert Comment

by:Alan Hardisty
ID: 34219747
@dlethe - controller is as follows:

Integrated Smart Array 5i Plus Controller with optional Battery-Backed Write Cache (BBWC) Enabler option kit

LVL 56

Expert Comment

ID: 34219816
You should always replace drives on Smart Array controllers hot (you can remove cold but should add hot), but that doesn't explain your grey screen. What the HP tech advised makes no sense to me either. As you say, it should be running, albeit slower, and it ought to boot happily even with the bad drive removed.

Could you boot the SmartStart CD, run the Array Diagnostic Utility, and post the log file **as an attachment**? We might spot something wrong with one of the other disks.

You don't have an option to start a manual parity check; Smart Array controllers do this in the background after 15 seconds of inactivity.

At a guess I'd say it's more than a flaky disk; chkdsk might show file system corruption.

Author Comment

ID: 34220146
After the drive rebuilt, which took only about 2 hours (36 GB), I was able to restart the server and it booted fine. I did a hot-swap. Now running chkdsk.

I looked through some of the details the HP Insight Online edition gave, but it showed no errors for this failure and no record of it where I was looking. Attached is the ADU report from the machine after the drive was replaced.

It did show that one of the drives still in there had 6 "unavailable" errors, whereas one other drive had one and the rest had zero. Dunno if that is a pre-emptive notice of an issue, but none of the drives showed read errors, etc.
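
For anyone comparing the numbers in this thread, here is a rough back-of-the-envelope sketch in Python using only the figures quoted above (five 36.8 GB drives, a rebuild of about 2 hours against an estimate of ~9 hours). It assumes decimal gigabytes and that the whole replacement drive is rewritten at a steady rate; the real rate depends on the controller's rebuild priority and on host I/O, so treat the results as ballpark only.

```python
# Figures quoted in this thread.
DRIVE_GB = 36.8          # capacity of each drive
DRIVES = 5               # drives in the RAID-5 set
ACTUAL_HOURS = 2.0       # observed rebuild time
ESTIMATED_HOURS = 9.0    # rebuild time estimated by the HP tech

# RAID-5 spends one drive's worth of space on parity.
usable_gb = (DRIVES - 1) * DRIVE_GB

def rebuild_rate_mb_s(drive_gb, hours):
    """Average rate needed to rewrite one whole drive, in MB/s (decimal units)."""
    return drive_gb * 1000 / (hours * 3600)

actual_rate = rebuild_rate_mb_s(DRIVE_GB, ACTUAL_HOURS)
estimated_rate = rebuild_rate_mb_s(DRIVE_GB, ESTIMATED_HOURS)

print(f"usable capacity: {usable_gb:.1f} GB")
print(f"observed rebuild rate: {actual_rate:.2f} MB/s")
print(f"rate implied by the 9-hour estimate: {estimated_rate:.2f} MB/s")
```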

LVL 47

Expert Comment

ID: 34220323
Don't confuse XOR parity errors with file system errors.   They are independent of each other.  
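
A small Python sketch of why the two are independent, assuming hypothetical 4-byte blocks and made-up helper names (`xor_parity`, `stripe_is_consistent`): the controller computes parity over whatever bytes the OS wrote, so a stripe holding logically corrupt file-system data can still pass a parity check. That is why chkdsk is a separate step from the array's consistency check.

```python
def xor_parity(blocks):
    """Compute the XOR parity block over equal-length data blocks."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)

def stripe_is_consistent(data_blocks, parity_block):
    """Array-level check: does the stored parity match the stored data?"""
    return xor_parity(data_blocks) == parity_block

# What the file system meant to write, and what it actually wrote.
good = [b"FILE", b"SYS ", b"OK  ", b"DATA"]
garbage = [b"FILE", b"SYS ", b"BAD!", b"DATA"]

# The controller faithfully stores parity for the bytes it was given.
parity_for_garbage = xor_parity(garbage)

# Parity is consistent even though the contents are wrong at the file-system level.
print(stripe_is_consistent(garbage, parity_for_garbage))
```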

Question has a verified solution.
