Solved

Replacement drive for failed drive in RAID-5 - shutdown and replace or hot-swap replace? - HP DL 380 G3

Posted on 2010-11-26
Last Modified: 2012-05-10
This morning I discovered that our Windows 2003 Server (which is also an Active Directory domain controller) had a failed drive. There are five 36.8 GB drives configured in a RAID-5 array. I have a replacement drive arriving within the next four hours.

Windows boots as far as the login screen, then sits at a gray screen; no users can log in on their workstations or access the Exchange server. A single drive failure should not affect the server except to slow it down, but it seems to, and I'm concerned.

The HP tech recommended that I replace the drive while the machine is on, wait for it to rebuild (~9 hours), and then see if it starts working.

This doesn't make sense to me. It makes more sense to hard shut down the server, replace the drive, then boot and let the rebuild run while the Windows services aren't trying to function off the failed drive.

It also makes me think that I will get my AD controller functioning sooner this way.

This morning I restarted the server remotely through the iLO controller with a hard reset (not knowing a drive had failed). In the first few minutes of the boot process the AD and file sharing services were working, then they stopped, and after that I couldn't log in via the login window.

So my guess is that the drive is reporting a partial failure to the array controller and is still trying to function, when in reality it is not working at all, and the system would function better with the drive out of the equation.

Following my logic or am I nuts and ignorant? :)

Thanks.

P.S. Another worst-case scenario: a second drive has failed and it isn't being shown on the drive lights. I can't access Insight Diagnostics remotely.
Question by:sweetseater
9 Comments
 
LVL 76

Accepted Solution

by:
Alan Hardisty earned 125 total points
ID: 34219470
The HP DL380 G3 has hot-swap drives, so you can happily replace a drive with it still running.

As for the grey screen - sounds a little peculiar, but might be disk issues that get resolved once the replacement drive has been installed.
LVL 47

Assisted Solution

by:dlethe
dlethe earned 125 total points
ID: 34219473
Your premise is a bit incomplete. The XOR parity in RAID-5 protects against either a full drive failure or a block failure, but not both at once. With one drive already failed, if any surviving disk has a bad block, you lose that entire stripe, so you can still have data loss.

That is why you need to run regular data consistency checks/repairs: they clean up lurking bad blocks that result from media failures occurring AFTER the last time that stripe of data was read.
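To make the parity point concrete, here is a minimal Python sketch of a single RAID-5 stripe. It is only an illustration under stated assumptions: the 4-byte blocks, the 4+1 layout, and the helper names `xor_blocks` and `rebuild` are hypothetical and do not reflect how the Smart Array firmware is implemented. It shows that one missing block can be recomputed from the others, and that the stripe is lost if a surviving block is also unreadable.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe on a hypothetical 5-drive RAID-5: four data blocks plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = xor_blocks(data)
stripe = data + [parity]

def rebuild(stripe, failed):
    """Recompute the block at index `failed` from all the other blocks.

    A None entry stands for a block that cannot be read (a bad block).
    """
    survivors = [blk for i, blk in enumerate(stripe) if i != failed]
    if any(blk is None for blk in survivors):
        raise ValueError("stripe lost: a surviving block is unreadable")
    return xor_blocks(survivors)

# Drive 2 has failed: its block is recomputed from the other four.
recovered = rebuild(stripe, failed=2)
assert recovered == b"CCCC"

# Drive 2 has failed AND drive 0 has a bad block in this stripe: nothing to rebuild from.
degraded = list(stripe)
degraded[0] = None
try:
    rebuild(degraded, failed=2)
except ValueError as err:
    print(err)
```

A consistency check amounts to reading every stripe while all drives are still healthy, so a bad block is found and repaired from parity before a drive failure makes it unrecoverable.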

Best practice is to always do replacements hot. The most stressful thing you can do to a disk is power cycle it. Why rock the boat?

The HP person is correct. There is also no reason to guess at what is going wrong. Look at the event log in the controller; it will tell you.


Author Comment

by:sweetseater
ID: 34219574
Dlethe,

I can't access the event log in the controller. Not until the server is fully functioning.

What is the method you recommend to do regular data consistency check/repairs?

Thanks for the help. I will settle down and trust the hot-swapping process.
LVL 47

Expert Comment

by:dlethe
ID: 34219737
HP has several freebies. Certainly get the latest SmartStart / ACU / ADU utilities. I don't know off the top of my head which controller comes bundled with your system, but some of them have a feature to automate the consistency repairs.

Go to the support.hp.com site, download and install the latest firmware, updates, and drivers, and get the add-on monitoring from the same site as well.
LVL 76

Expert Comment

by:Alan Hardisty
ID: 34219747
@dlethe - controller is as follows:

Integrated Smart Array 5i Plus Controller with optional Battery-Backed Write Cache (BBWC) Enabler option kit

http://h18000.www1.hp.com/products/quickspecs/11473_div/11473_div.HTML
LVL 55

Expert Comment

by:andyalder
ID: 34219816
You should always replace drives on Smart Array controllers hot (you can remove cold but should add hot), but that doesn't explain your grey screen. What the HP tech advised makes no sense to me either; as you say, it should be running, albeit slower, and it ought to boot happily even with the bad drive removed.

Could you boot the SmartStart CD, run the Array Diagnostic Utility, and post the log file **as an attachment**? We might spot something wrong with one of the other disks.

You don't have an option to start a manual parity check; Smart Array controllers do this in the background after 15 seconds of inactivity.

At a guess I'd say it's more than a flaky disk; chkdsk might show file system corruption.

Author Comment

by:sweetseater
ID: 34220146
After the drive rebuilt, which took only about 2 hours (36 GB), I was able to restart the server and it booted fine. I did a hot swap. Now running chkdsk.

I looked through some of the details that HP Insight Online edition gave, but it showed no errors for this failure and no record of it where I was looking. Attached is the ADU report from the machine after the drive was replaced.

It did show that one of the drives still in there had 6 "unavailable" errors, whereas the other drives had zero, except for one that had one. I don't know if that is advance notice of an issue, but none of the drives showed read errors, etc.
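As a rough check on the rebuild times mentioned in this thread (about 2 hours actual versus the ~9 hours estimated), the implied average rebuild rate can be worked out with a small Python sketch. The assumptions are mine, not from the controller documentation: decimal gigabytes (1 GB = 1000 MB), the whole 36.8 GB replacement drive being written once, and a hypothetical helper named `rebuild_rate_mb_per_s`.

```python
def rebuild_rate_mb_per_s(capacity_gb, hours):
    """Average rebuild rate in MB/s, assuming the whole drive is written once."""
    return capacity_gb * 1000 / (hours * 3600)

actual = rebuild_rate_mb_per_s(36.8, 2)     # the rebuild reported in this thread
estimated = rebuild_rate_mb_per_s(36.8, 9)  # the HP tech's ~9 hour estimate

print(f"actual:    {actual:.2f} MB/s")
print(f"estimated: {estimated:.2f} MB/s")
```

Under those assumptions the actual rebuild averaged a little over 5 MB/s, while the 9-hour estimate corresponds to just over 1 MB/s. One plausible reading is that the estimate allowed for the controller rebuilding at low priority while the server was busy serving users.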

 Prolaw---Array-Diagnostic-Report.zip
LVL 47

Expert Comment

by:dlethe
ID: 34220323
Don't confuse XOR parity errors with file system errors.   They are independent of each other.  