  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 1985

Proliant ML350 G4, Ultra320 Drives downshifted to Ultra 2 Narrow

Have an HP Proliant ML350 G4 that's giving me some hard drive grief. The server has 6x72.8GB 10K ultra320 SCSI drives.

It all starts with an Event on bootup from CPQCISSE, Event ID: 24683
SCSI bus fault occurred on Storage Box box 0, , Port 0 of
Array Controller  in slot 3.
This may result in a "downshift" in transfer rate for one or more hard drives on the bus.

When I look at my drive configuration, I see that drives 0, 1, 2, 3 have been downshifted to Ultra 2 Narrow.  Drive 4 is running at Ultra 320 Wide (!).  Drive 5 is running at Ultra 2 Wide.
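For a sense of scale, the negotiated modes above correspond to very different nominal bus ceilings. The following is only an illustrative sketch: the per-drive modes are copied from the paragraph above, the rates are the nominal parallel SCSI maxima for a shared bus (not per-drive throughput), and the names `NOMINAL_MB_S`, `negotiated` and `downshifted` are made up for this example.

```python
# Nominal maximum bus transfer rates (MB/s) for parallel SCSI modes.
NOMINAL_MB_S = {
    "Ultra 2 Narrow": 40,
    "Ultra 2 Wide": 80,
    "Ultra 160 Wide": 160,
    "Ultra 320 Wide": 320,
}

# Negotiated mode per drive, as reported in the post above.
negotiated = {
    0: "Ultra 2 Narrow",
    1: "Ultra 2 Narrow",
    2: "Ultra 2 Narrow",
    3: "Ultra 2 Narrow",
    4: "Ultra 320 Wide",
    5: "Ultra 2 Wide",
}


def downshifted(drives, expected="Ultra 320 Wide"):
    """Return the drive numbers that did not negotiate the expected mode."""
    return sorted(d for d, mode in drives.items() if mode != expected)


for drive, mode in sorted(negotiated.items()):
    rate = NOMINAL_MB_S[mode]
    share = rate / NOMINAL_MB_S["Ultra 320 Wide"]
    print(f"drive {drive}: {mode}, {rate} MB/s nominal ({share:.1%} of Ultra 320)")

print("downshifted drives:", downshifted(negotiated))
```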

I have updated all firmware on the system and hard drives, and pulled them all out and reseated the cables.

Looking at the diagnostics, I see that the Storage Enclosure and Drive Cage have a Critical saying U160 Disabled.

On the physical drive there is a warning flag about the drives being downshifted to Ultra 2.

I do see that drives 4 & 5 have a timeout as the last failed reason. I suspect the timeout on drive 4 is from when I pulled it out while the system was on earlier this month, not sure about the timeout on drive 5.

Any suggestions? Attached is the diagnostics report.


 report-c7380d42-00001f20-0000000.zip
Asked by: gozoliet
 
SysExpert Commented:
I would run diagnostics on the drives and the controller, and consider doing a rebuild by swapping out 1 drive at a time, or a full backup and restore to a new set of drives. If the controller is bad, replace it.

I would also swap cables, and maybe even the backplane.

If under warranty, contact HP, since the errors are a little weird and it is hard to say exactly where the cause is.



I hope this helps!
 
David (President) Commented:
Noisy signals and poorly shielded cables, along with connectors that are not tightened, will do this. Give parallel SCSI credit: instead of just dying, it degrades speed so that everything can still talk. Since you have external enclosures, my advice is to keep the cables as short as possible. Cables are not all created equal, and not all cabling is suitable for U320 at full distance. Updating firmware has nothing to do with this problem.

You *could* run some software on a JBOD controller attached to the enclosure and add disks one at a time until the reported speed drops from U320. Once you see which disk does it, move things around to see whether the problem is specific to the disk or the bay. You have to use a JBOD controller, and this is the only way to do it: you don't want to be moving things around behind a RAID controller, as this would fool the controller into thinking you had drive failures and would kick off a rebuild.
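The disk-versus-bay reasoning can be written down as a small bookkeeping sketch. This is not part of any vendor tool: the function `classify`, the disk labels, and the speed strings ("U320", "U2") are hypothetical, and the observations are invented sample data standing in for whatever your diagnostic software reports.

```python
from collections import defaultdict


def classify(observations, full_speed="U320"):
    """Given (disk_label, bay, reported_speed) tuples, return
    (suspect_disks, suspect_bays).

    A disk is suspect if it was tried in at least two bays and never
    reached full speed. A bay is suspect if at least two different disks
    were tried in it and none reached full speed.
    """
    by_disk = defaultdict(list)
    by_bay = defaultdict(list)
    for disk, bay, speed in observations:
        ok = speed == full_speed
        by_disk[disk].append((bay, ok))
        by_bay[bay].append((disk, ok))

    suspect_disks = sorted(
        disk for disk, runs in by_disk.items()
        if len({bay for bay, _ in runs}) >= 2 and not any(ok for _, ok in runs)
    )
    suspect_bays = sorted(
        bay for bay, runs in by_bay.items()
        if len({disk for disk, _ in runs}) >= 2 and not any(ok for _, ok in runs)
    )
    return suspect_disks, suspect_bays


# Invented sample data: disk "B" downshifts in both bays it was tried in,
# while disk "C" runs at full speed in one of those same bays.
observations = [
    ("A", 0, "U320"),
    ("B", 1, "U2"),
    ("B", 2, "U2"),
    ("C", 1, "U320"),
]
print(classify(observations))
```

With the sample data, the problem follows disk "B" rather than a bay. If instead every disk downshifted in one particular bay, that bay (or its slot on the backplane) would be the suspect.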

Anyway, that is what I do when in similar situations.   (See http://www.santools.com/smart/unix/manual/linkspeedreporting.htm)
The software will set you back $100, but it will absolutely identify link speed issues. If you have the cabling, swap everything, but if you want to diagnose and confirm, you'll need some software. Also, you could have an issue where the link speed only decreases a few seconds, minutes, or longer after it is hooked up. To diagnose that with the software, just hook it all up, run a read-only benchmark of your choice (do not mount the disks, just do raw read I/O to the physical devices), and check the link speed every 10 minutes. Something else may be going on.
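If you have no benchmark handy, a rough read-only load generator might look like the sketch below. Assumptions: a Unix-like boot environment where the disk appears as a raw device node, and sufficient privileges to open it. The function name `sample_read_throughput` and the device path in the usage comment are placeholders. Note that this only measures throughput; the negotiated link speed still has to be read from the controller or the diagnostic software, and OS caching can inflate the numbers.

```python
import os
import time


def sample_read_throughput(path, seconds=5.0, block_size=1024 * 1024):
    """Read sequentially from `path` (opened read-only) for about
    `seconds` and return the observed throughput in MB/s."""
    fd = os.open(path, os.O_RDONLY)
    total = 0
    start = time.monotonic()
    try:
        while time.monotonic() - start < seconds:
            chunk = os.read(fd, block_size)
            if not chunk:
                # Reached the end of the device: wrap around and keep reading.
                os.lseek(fd, 0, os.SEEK_SET)
                continue
            total += len(chunk)
    finally:
        os.close(fd)
    elapsed = time.monotonic() - start
    return total / elapsed / 1e6


# Example usage (placeholder device path, one sample every 10 minutes):
#
#   while True:
#       print(f"{sample_read_throughput('/dev/sdX', seconds=30):.1f} MB/s")
#       time.sleep(600)
```

A sustained drop in throughput between samples would be a cue to go and check the reported link speed at that moment.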
 
andyalder Commented:
Never seen the count for "bus faults" being so high. There's no external SCSI connector on the controller so no chance anything's accidentally plugged in there, just the 6 internal drive bays. I'd take the lid off and reseat the cable, maybe it's damaged.

 
gozoliet (Author) Commented:
Just some thoughts from me...
Andy - you're right about the error counts.. Lots of errors all over the place, which gives the downshifting some reason.  Unfortunately it's not like 1 drive has a higher amount of errors, it's pretty consistent across disks on the same port (except 4/5?? still a mystery).

dlethe - the storage bay is internal, they're hot-swappable (or hot-plug) hard drives on the front of an HP. There was a firmware release that corrected a "timeout" issue which caused the controller to downshift, but that's for 15k drives.. not mine..

I'll try to dig out another cable. Otherwise I don't *really* want to shell out some cash to try a new backplane on a whim. The handful of threads on Google have mentioned everything from mainboard to power supply so lots of part-swapping. ugh.

The suggestions are great.. we'll see if more come in or if I isolate better.
 
andyalder Commented:
At least you can systematically eliminate the disks if you have time to spin them up one at a time as dlethe suggested. You can always boot the SmartStart CD and use the ADU there if you don't want to buy software. You can also eliminate the Smart Array controller, since there's an onboard non-RAID SCSI chip on the motherboard. Make sure you number the disks. PSU sounds very possible because the non-SCSI-bus-related errors are at silly levels as well.
 
gozoliet (Author) Commented:
Just an update.. or a non-update. Have reseated everything and still experiencing the problem.
Don't have a spare PSU to test with at this point in time. Not quite sure how I would go about testing the hard drives individually without disturbing the RAID.. If I replug a drive onto the onboard controller and boot off something to run smartmon on that drive, can I put the drive back in the array afterwards and have everything work as is?
 
gozoliet (Author) Commented:
Never really figured it out, still a problem, but your tips were good at helping me look further into it.