CATHY-IT

Does Internal SAS Enclosure Device Failure (Bay1,Box1, Port I1, Slot 0) Mean Drive Failure?

Good morning, Experts.
I have the error in question in the HP ProLiant Integrated Management Log, and I need help diagnosing and repairing it properly. Of the four 146GB Serial Attached SCSI (SAS) drives, one now has no LED lit at all, and on the other three the green LEDs are flashing continuously in sync. Is it just a matter of getting a new drive and swapping it in? I have no experience working with RAID. I believe we have a RAID 5 setup, since so far we have not experienced any data loss. Do I just go into the HP Array Configuration Utility to see what is going on? I hesitate only because of my lack of experience, and I do not want to interrupt the server at this time. Any help on an effective way to go about this is appreciated. Thanks.
TG Tran

You can view the drive status via the HP Array Diagnostic Utility (ADU), and yes, the unlit drive is either a hot spare or dead. If you have an active support contract, you can call HP and ask them to send a replacement. It is just a matter of swapping the drive, no biggie.
If this is indeed a RAID 5 setup, then you are correct that you can simply swap out the drive.
I would power the server down first, though, since you are not certain the drive is hot-swappable.
The array should rebuild once it powers back up.
David
Best practice is to:
1) Kick off a full backup NOW.
2) Order 2 replacement drives, not one, as you should have a hot spare on site anyway.

Remember that at this point you can still have data loss. If another drive fails, ALL data is gone forever, so you had better have a current backup.
Even a bad block on any of the surviving disks means unrecoverable data loss. You may very well have a bad block, especially if your RAID configuration isn't set up to automatically do XOR/parity checks and rebuilds.
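To see why a second failure or a single bad block matters, here is a minimal sketch of RAID 5 style XOR parity on one stripe. The 4-byte blocks and the three-data-disk layout are toy assumptions for illustration, not the actual Smart Array on-disk format.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe on a hypothetical set: three data blocks plus one parity block.
d0 = b"\x11\x22\x33\x44"
d1 = b"\xaa\xbb\xcc\xdd"
d2 = b"\x01\x02\x03\x04"
parity = xor_blocks([d0, d1, d2])

# The disk holding d1 fails: XOR of all surviving blocks restores it.
rebuilt_d1 = xor_blocks([d0, d2, parity])
print(rebuilt_d1 == d1)

# If d2 is ALSO unreadable (second failure or a bad block), the survivors
# only yield d1 XOR d2, which does not determine either block on its own.
print(xor_blocks([d0, parity]) == xor_blocks([d1, d2]))
```

With one block missing per stripe the data comes back exactly; with two missing, the parity equation has two unknowns, which is why the backup comes first.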
CATHY-IT

ASKER

How can I confirm which RAID setup we have? And should I power down now, or when I go to replace the drive? I still need to order the replacement.
I was in the ADU, and from what I can understand of the long, overly detailed report, Port i1 shows "Last Failure Reason - Marked Bad Failed", but right under that is another section, labelled Device Flags, which states that the drive is present and operational. So I'm confused.

Just an FYI, and I'm not sure if it is relevant: the drive bays in the HP server are labelled 1 to 8 from left to right, and the physical drive that has no LEDs lit is the one in bay number four, in the middle. I just found it strange that it is labelled Port 1l.
SOLUTION
David
A disk can be marked bad if it exhibits any of a variety of problems. For example, it may have encountered X bad blocks within Y minutes. In that case the disk is still "good" and operational, but the algorithm in the firmware decided that using that disk puts data at risk, so it kicked the disk out of the RAID set.
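As a rough illustration of the "X bad blocks within Y minutes" idea, here is a sketch of a sliding-window policy. The class name, the threshold of 3, and the 600-second window are all hypothetical; the real criteria used by the controller firmware are not stated in this thread.

```python
from collections import deque

class BadBlockMonitor:
    """Toy policy: mark a disk bad after max_errors bad blocks within window_s seconds."""

    def __init__(self, max_errors, window_s):
        self.max_errors = max_errors
        self.window_s = window_s
        self.events = deque()    # timestamps of recent bad-block events
        self.marked_bad = False  # latched: stays True once tripped

    def record_bad_block(self, timestamp_s):
        self.events.append(timestamp_s)
        # Forget events that have fallen out of the time window.
        while timestamp_s - self.events[0] > self.window_s:
            self.events.popleft()
        if len(self.events) >= self.max_errors:
            self.marked_bad = True
        return self.marked_bad

# Hypothetical thresholds: 3 bad blocks within 10 minutes.
monitor = BadBlockMonitor(max_errors=3, window_s=600)
results = [monitor.record_bad_block(t) for t in (0, 900, 1000, 1100)]
print(results)  # [False, False, False, True]
```

Because the flag is latched, a disk can still respond normally and report as present and operational while remaining marked bad, which would be consistent with the two ADU fields appearing to disagree.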
I have contacted our local supplier, who is much more familiar with our system since he was working with us long before I started here, and he's helping me determine exactly what is happening. At the moment our utility software for the HP management page is not even working, and he's downloading updates. I'll get back to you once we figure out what's going on.
I see a red flag here, and suggest you ask EXACTLY what he is doing, and then quietly go to support.hp.com and look up the firmware, drivers, BIOS, release notes, and whatever else is relevant to what he is changing.

The reason I say this is that it is incredibly bad form to update ANYTHING while a system is in degraded mode. (The exception is ACU.) Updating drivers, BIOS, or firmware on a degraded array is generally not even supported or tested. These things should only be changed while the system is in optimal mode. Quietly read the release notes. One does not rock the boat when there is no room for error. If he is patching something and attempting to force the disk online, then he *MAY* be more interested in not purchasing a replacement disk for you.

The disk went offline for a reason. Get the event log and see why. As the great one, Ronald Reagan, said, "Trust ... but verify".
Hello, sorry for the delay. It's been crazy here and I'm only part-time. AndyAdler, it appears you are correct: once we got the software utilities working, the ACU showed three logical drives and one spare. I'm going to keep an eye on it and have already inquired about the cost and availability of spares. Also, based on some research he did about the error, the tech I was working with suggested updating our HP RAID controller's firmware to resolve an issue with false errors. I'll be looking into that as well. Thank you for all your input.
I appreciate all the input from all the Experts. I found it a bit hard to know where to put the points, and I don't want to offend anyone.