Failed hard drive hot-swapped

Experts,

The FS server was showing a red light on disk activity, so I ran the HP Insight Online Diagnostics and the result was one failed hard drive. I hot-swapped it for an identical one yesterday.
Now the server is very, very slow; I can't even open the array config or diagnostics.
I thought rebuilding the drive shouldn't take longer than a few hours, but this one is taking much longer.
Please see the attached photo; the 2nd drive from the left is the new one. Only the arrow symbol is flashing green; the actual disk LED is not steady green like the rest.

My question is: what is the best way to confirm that the drive is actually rebuilding and all is going smoothly?
Thank you for your help.
2015-04-01-09.52.56.jpg
russusAsked:
 
SnibborgOwnerCommented:
I'm wondering whether you have two RAID arrays.  The two 15k/320GB drives are mirrored whilst the other four drives are RAID5.
 
rindiCommented:
Rebuilds can take a very long time. It depends on the RAID type used, how the rebuild priority was set, the size of the disk and the amount of data, and also on the state of the other disks, for example whether they have bad blocks.
 
russusAuthor Commented:
Thanks rindi, this is a server I inherited, but I do know for sure it is RAID 5.
Is there a way to confirm it is rebuilding? I need to know for sure before I shut it down completely and restart it.
 
rindiCommented:
You should see the status in the RAID utility. If you can't access it from within the OS, you should be able to see it when you reboot and use the function key to get into the RAID utility. Also, during a rebuild the LEDs of all the member disks of your RAID array will be constantly active.
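If a command-line array tool is available, its text report can be checked for a rebuild state instead of waiting for the GUI to load. The sketch below is only an illustration: it assumes HP's Array Configuration Utility CLI prints logical drive lines of the form `logicaldrive 2 (838.1 GB, RAID 5, Recovering, 12% complete)`, as in the output of `hpacucli ctrl all show config`. The function name and the sample report are hypothetical and were not captured from the server in this thread.

```python
import re

# Assumed report line format (illustrative, not taken from this server):
#   logicaldrive 2 (838.1 GB, RAID 5, Recovering, 12% complete)
LD_LINE = re.compile(r"logicaldrive\s+(\d+)\s+\(([^)]*)\)", re.IGNORECASE)


def logical_drive_states(report):
    """Map each logical drive number to the state text after its RAID level."""
    states = {}
    for number, details in LD_LINE.findall(report):
        fields = [field.strip() for field in details.split(",")]
        # fields[0] is the size, fields[1] the RAID level, the rest is the state
        states[int(number)] = ", ".join(fields[2:])
    return states


# Hypothetical sample report used only to exercise the parser
sample = """
   array A (SAS, Unused Space: 0 MB)
      logicaldrive 1 (68.3 GB, RAID 1, OK)
   array B (SAS, Unused Space: 0 MB)
      logicaldrive 2 (838.1 GB, RAID 5, Recovering, 12% complete)
"""

for drive, state in logical_drive_states(sample).items():
    print(f"logical drive {drive}: {state}")
```

A state that mentions a percentage that rises between two runs would suggest the rebuild is progressing; a state that never changes would suggest it is not.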
 
SnibborgOwnerCommented:
Rebuilding takes approximately 15-30 seconds per GB.  The System Management Homepage will tell you when the RAID is functioning correctly.  There will be a warning sign if the rebuild has not completed.  There will also be an entry in the log confirming when rebuilding has completed.

You might want to keep the following documents for future reference:

http://h20566.www2.hp.com/hpsc/doc/public/display?docId=emr_na-c00687518
http://www.brentozar.com/archive/2013/08/how-to-use-hp-system-management-homepage/
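The 15-30 seconds per GB figure above can be turned into a rough time window. This is a minimal sketch of that arithmetic only; the rule of thumb is the poster's estimate rather than a vendor specification, and the function name and the 300 GB example size are assumptions chosen for illustration.

```python
def rebuild_window_hours(disk_gb, low_sec_per_gb=15, high_sec_per_gb=30):
    """Return (fastest, slowest) estimated rebuild time in hours.

    Uses the 15-30 seconds per GB rule of thumb quoted in the thread.
    """
    fastest = disk_gb * low_sec_per_gb / 3600
    slowest = disk_gb * high_sec_per_gb / 3600
    return fastest, slowest


fastest, slowest = rebuild_window_hours(300)
print(f"300 GB disk: between {fastest:.2f} and {slowest:.2f} hours")
```

By this estimate a 300 GB disk would need somewhere between 1.25 and 2.5 hours, so a rebuild that has apparently run for a full day is well outside the expected window.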
 
noxchoGlobal Support CoordinatorCommented:
If you replaced the drive then surely there is a rebuild in progress; otherwise there is no reason for the slowdown, unless you fitted the wrong drive, one with a slower speed.
 
russusAuthor Commented:
Thanks guys,
The thing is, it is now too slow to open anything. The array utility and the System Management Homepage take too long to load and then stop responding.
It has been like this for a day. The replacement drive is identical. I attached an image of the drives; please have a look. Is this normal behaviour?
2015-04-01-09.52.56.jpg
 
russusAuthor Commented:
It is the 2nd drive from the left that's been replaced. No steady green light.
 
russusAuthor Commented:
Attached is what I got when it finally responded to opening the System Management Homepage.
2015-04-01-12.07.29.jpg
 
SnibborgOwnerCommented:
That drive is not rebuilding.  Have a look at the System Management Homepage and see if the drive is being recognised.
 
SnibborgOwnerCommented:
Has the RAID got a hot spare assigned?  If so that explains why the drive in bay 2 is offline.  Check the logs, you may find that the hot spare is building.
 
SnibborgOwnerCommented:
You can access a lot of the systems you need through the iLO.
 
russusAuthor Commented:
Why is it so slow if it isn't rebuilding? Disk Management took 20 mins to open, and Online Diagnostics took an hour. Anything I try to open is taking forever.
It is a mission to check the logs now :)

Do you think it is OK to restart it? Or shut it down and start it up?
 
SnibborgOwnerCommented:
Have a look at the settings through the iLO first.  You need to understand how the RAID is configured before you do anything drastic.  Also, going through the iLO will be a lot faster as it doesn't touch the operating system.

You need to see if there is a hot spare configured on the server.  If there is, then that's your problem: it will have started rebuilding onto the spare automatically.  If that has completed you will see it in the logs and there will be no warnings saying that the disk is degraded.  At that point, if the drive is still not illuminated, you will be able to pop it out.

Odds on, if you try a reboot in the state it is in now, at best it will be in the same state when it comes back up.  If you haven't got a hot spare fitted and the drive you've replaced is still dead, then it's either a dead drive or a problem with the backplane or, and this is quite likely, you need to re-assign that drive as a new hot spare.
 
andyalderCommented:
Have any of you guys actually looked at the photo? The server's internal health LED is on and that has nothing to do with the RAID arrays. Also the first and 2nd disks are a different size to the other ones, so they are most likely RAID 1 rather than RAID 5, and a rebuild should only take an hour or so (except that something else is broken apart from the disk subsystem). You need to take the lid off and look at the internal LEDs on the mobo. You need it running to do that, so hopefully it slides out OK.
 
SnibborgOwnerCommented:
The photo shows two 320GB drives; the rest are 300GB.  So long as the drives are at least as large as the originals they will work with the array.  The downside is that the array will only use 300GB of each drive's 320GB of available space.
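The point about a larger drive being cut down to the size of the smallest member can be shown with a little arithmetic. This is a rough sketch using the standard capacity rules for a two-disk RAID 1 mirror and for RAID 5; the function name and the drive sizes in the example are assumptions for illustration, not readings from this server.

```python
def usable_capacity_gb(drive_sizes_gb, raid_level):
    """Estimate usable array capacity in GB.

    Every member is treated as if it were the size of the smallest drive.
    Only a two-disk RAID 1 mirror and RAID 5 are handled in this sketch.
    """
    smallest = min(drive_sizes_gb)
    count = len(drive_sizes_gb)
    if raid_level == 1:
        if count != 2:
            raise ValueError("this sketch only handles a two-disk mirror")
        return smallest
    if raid_level == 5:
        if count < 3:
            raise ValueError("RAID 5 needs at least three drives")
        # One drive's worth of space is consumed by parity
        return smallest * (count - 1)
    raise ValueError("unsupported RAID level in this sketch")


# Hypothetical example: one 320 GB replacement in a set of 300 GB drives
print(usable_capacity_gb([300, 300, 300, 320], raid_level=5))
```

In this example the extra 20 GB on the larger drive is simply unused, and the array still offers three drives' worth of 300 GB.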
 
SnibborgOwnerCommented:
What may be more important than the size is the spin speed of the drives: all are 10k except the two 320GB drives, which are 15k.  There are two different generations of disks, which are probably incompatible.

Well spotted, but it's not the size.
 
russusAuthor Commented:
2 logical drives. The first 2 x 72.8GB drives are the C: drive; the 4 x 300GB drives are the D: drive (data files).
iLO needed a firmware update, so I'm finally in, but it is still unresponsive and too slow. What logs am I after?
 
andyalderCommented:
It's not a disk problem, it's an internal health problem - maybe one of the DIMM LEDs is lit. Servers don't hang when rebuilding nor do they put the health LED on.

Also those drives are not 320GB, they're 72.8GB, you're reading the picture wrong.
 
russusAuthor Commented:
It is possible.
Andy, the diagnostic I ran showed that the physical drive in bay 2 has failed. What do you think the issue could be internally?
 
andyalderCommented:
To the right of the disks is the internal health LED (square box with wavy line inside) lit red?
That LED has nothing to do with the disks but everything to do with the server being unresponsive, crashing or not powering on.
 
SnibborgOwnerCommented:
Sounds like you could well have a backplane or enclosure fault.  I don't suppose you have an old server knocking around that you can cannibalise?  I'd bet on an enclosure fault, in which case you'd have to shut it down and replace it.
 
russusAuthor Commented:
If that is the case then I am in trouble.
OK, saw some logs; they showed it overheated and shut down due to overheating about 20 days ago. The air conditioning was down, but I didn't notice any red light activity then.

I just pulled the disk out because we can't do anything with it like that. Speed hasn't improved and the array utility doesn't show the controllers.
 
SnibborgOwnerCommented:
I'm afraid you are at the point where whatever you do is going to take time.

Before anything else, have you tried re-seating the drive?  Maybe a poor connection has tripped the error.

Unfortunately I don't know what environment you have.  Is this a legacy machine that can be P2V'd onto a virtual installation?

If you don't have any spare equipment then ring around the second-user suppliers to locate a replacement server chassis.  You can then get the drives out of the faulty server and drop them into the replacement.  There should be plenty of used ProLiants about.

The final fallback is to purchase a server from eBay.  I went through a similar problem with a server and solved it that way.  IBM servers were also harder to come by.
 
SnibborgOwnerCommented:
I would suggest buying a complete chassis rather than an individual backplane or enclosure.  At least you'd be able to try a straight swap with the drives rather than trying to strip the one you have.  Also the price wouldn't be that different.
 
andyalderCommented:
Is the internal health LED on? It seems to be on from the photo.

If so then remove the lid and look at LEDs on mobo, there is a sticky label on inside of lid with LED locations and meanings on it. You need to keep it powered on or the LEDs will go out (obviously).
 
andyalderCommented:
Dude, please answer this...
To the right of the disks [in your photo] is the internal health LED (square box with wavy line inside) lit red?
It looks to be red in the photo but photos can be deceptive.
 
russusAuthor Commented:
Sorry Andy, busy making another server a secondary DC because, as dumb as it sounds, the issue is on my only DC.
Yes, it is red.
 
andyalderCommented:
Need to look at the motherboard LEDs then, with luck one of the DIMM LEDs is lit.

The disk problem is a red herring. You could smash the backplane up with a hammer and it would not light this internal health LED. That only comes on for problems the motherboard can see directly, such as DIMMs, CPUs and fans.

The maintenance and service guide is HERE. Pages 41 and 42 list the motherboard diagnostic LEDs.
 
russusAuthor Commented:
Fantastic, thank you. Will open it up as soon as the secondary DC is ready.
 
SnibborgOwnerCommented:
Ah, is this the main DC, russus?  More importantly, is it the first DC?  If it is, then I would recommend that you move the FSMO roles.  If you are not sure whether it is the first server then you'll need to check.

Instructions are here on how to check and move them:

https://support.microsoft.com/en-us/kb/324801?wa=wsignin1.0
 
SnibborgOwnerCommented:
The instructions for 2008 server:

http://support.microsoft.com/en-us/kb/223346
 
russusAuthor Commented:
Basically we have domain.local and its child london.domain.local.
That DC is the only one for domain.local.
Latest:
Looked at the LEDs inside according to page 42; all are correct. The secondary DC is now up and running. I shut down the main DC completely and pulled its plugs. No sign of life. Powered it on virtually via iLO. Now all LEDs are green. The drive is offline. No physical drive in bay 2. Checking the controllers now.
 
russusAuthor Commented:
Checked the controllers. Parallel Array A had a red X, then a red emergency cross on the logical drive.
Inserted the new drive in bay 2 and now all drives are dancing amber. The drive is now rebuilding.

rindi, andyalder and snibborg I thank you very much for your help.
 
russusAuthor Commented:
Great help from the experts, thank you.
 
SnibborgOwnerCommented:
Great news.  Hope you don't have to work too late.
 
andyalderCommented:
Perhaps a power surge put the internal health LED on or maybe it'll come on again.
 
russusAuthor Commented:
Nah, not too late, it's only 8pm here in London.
It is a possibility, Andy. It's an 8-year-old server, but all our servers have backup power for any power cut. Anyway, upgrading soon.
Goodnight, fellas.
Question has a verified solution.