  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 1981

Proliant ML350 G4, Ultra320 Drives downshifted to Ultra 2 Narrow

I have an HP ProLiant ML350 G4 that's giving me some hard drive grief. The server has 6 x 72.8GB 10K Ultra320 SCSI drives.

It all starts with an Event on bootup from CPQCISSE, Event ID: 24683
SCSI bus fault occurred on Storage Box box 0, , Port 0 of
Array Controller  in slot 3.
This may result in a "downshift" in transfer rate for one or more hard drives on the bus.

When I look at my drive configuration, I see that drives 0, 1, 2, 3 have been downshifted to Ultra 2 Narrow.  Drive 4 is running at Ultra 320 Wide (!).  Drive 5 is running at Ultra 2 Wide.
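For a sense of scale, here is a rough sketch of what those negotiated modes mean in nominal peak transfer rate. The figures are the standard parallel SCSI numbers (transfers per second times bus width), not anything read from this server, and real throughput will be lower.

```python
# Nominal parallel SCSI rates: (mega-transfers per second, bus width in bytes).
# These are spec-sheet peak figures, not measurements from the ML350.
MODES = {
    "Ultra 2 Narrow": (40, 1),
    "Ultra 2 Wide": (40, 2),
    "Ultra 160 Wide": (80, 2),
    "Ultra 320 Wide": (160, 2),
}


def bus_mb_per_s(mode):
    """Return the nominal peak rate in MB/s for a negotiated mode."""
    transfers, width_bytes = MODES[mode]
    return transfers * width_bytes


for mode in MODES:
    print(f"{mode}: {bus_mb_per_s(mode)} MB/s")
```

So a drive that has dropped from Ultra 320 Wide to Ultra 2 Narrow is negotiating one eighth of the nominal rate.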

I have updated all firmware on the system and hard drives, and pulled them all out and reseated the cables.

Looking at the diagnostics, I see that the Storage Enclosure and Drive Cage both show a Critical status saying "U160 Disabled".

On the physical drive there is a warning flag about the drives being downshifted to Ultra 2.

I do see that drives 4 & 5 have a timeout as the last failed reason. I suspect the timeout on drive 4 is from when I pulled it out while the system was on earlier this month, not sure about the timeout on drive 5.

Any suggestions? Attached is the diagnostics report.


 report-c7380d42-00001f20-0000000.zip
Asked by: gozoliet
 
SysExpert commented:
I would run diagnostics on the drives and the controller, and consider doing a rebuild by swapping out 1 drive at a time, or a full backup and restore to a new set of drives. If the controller is bad, replace it.

I would also swap cables, and maybe even the backplane.

If under warranty, contact HP since the errors are a little weird, and hard to say where the exact cause is.



I hope this helps!
 
David (President) commented:
Noisy signals and poorly shielded cables, along with connectors that are not tightened, will do this. At least give parallel SCSI credit: instead of just dying, it degrades speed so that everything can still talk. Since you have external enclosures, my advice is to make sure cables are as short as possible. Not all cabling is suitable for U320 at full distance; cables are not all created equal. Updating firmware has nothing to do with this problem.

You *could* run some software on a JBOD controller attached to the enclosure and add disks one at a time until the reported speed drops from U320. Once you see which disk does it, move things around to see whether the problem is specific to the disk or the bay. You have to use a JBOD controller, and this is the only way to do it: you don't want to be moving things around behind a RAID controller, as this would fool the controller into thinking you had drive failures and would kick off a rebuild.
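The bookkeeping for that procedure can be sketched in a few lines. This is only an illustration of the logic: `add_disk_and_read_speed` is a hypothetical callback standing in for "plug the disk in and read the speed your tool reports", and the disk and bay labels are made up.

```python
from collections import defaultdict


def first_downshift(disks, add_disk_and_read_speed, full_speed="U320"):
    """Add disks one at a time; return the first disk after which the
    reported speed is no longer full_speed, or None if none downshifts.
    add_disk_and_read_speed is a hypothetical callback, not a real API."""
    for disk in disks:
        if add_disk_and_read_speed(disk) != full_speed:
            return disk
    return None


def classify(observations):
    """observations: (disk, bay, ran_at_full_speed) tuples gathered while
    moving disks between bays. A disk that downshifts in more than one bay
    is suspect; a bay where more than one disk downshifts is suspect."""
    bays_failed_by_disk = defaultdict(set)
    disks_failed_by_bay = defaultdict(set)
    for disk, bay, ran_at_full_speed in observations:
        if not ran_at_full_speed:
            bays_failed_by_disk[disk].add(bay)
            disks_failed_by_bay[bay].add(disk)
    suspect_disks = sorted(d for d, b in bays_failed_by_disk.items() if len(b) > 1)
    suspect_bays = sorted(b for b, d in disks_failed_by_bay.items() if len(d) > 1)
    return suspect_disks, suspect_bays


# Made-up example: d2 downshifts in two different bays, d0 is fine in bay 2.
reported = {"d0": "U320", "d1": "U320", "d2": "Ultra2"}
print(first_downshift(["d0", "d1", "d2"], reported.get))
print(classify([("d2", 2, False), ("d2", 0, False), ("d0", 2, True)]))
```

If neither list comes back with a clear suspect, that points away from a single disk or bay and towards something shared, such as the cable, backplane or power.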

Anyway, that is what I do when in similar situations.   (See http://www.santools.com/smart/unix/manual/linkspeedreporting.htm)
Software will set you back $100, but it will absolutely identify link speed issues. If you have the cabling, swap everything, but if you want to diagnose and confirm, you'll need some software. Also, you could have an issue where link speed only decreases a few seconds, minutes, or longer after it is hooked up. To diagnose that with the software, just hook it all up, run a read-only benchmark of your choice (do not mount the disks, just do raw read I/O to physical devices), and check link speed every 10 minutes. Something else may be going on.
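If you don't have a benchmark handy, a raw read-only test can be sketched roughly like this. Treat it as a minimal sketch with assumptions: the device path in the comment is a placeholder, reading a raw device normally needs root, and the OS cache can inflate the numbers. It reports measured throughput only, which is a proxy; it does not read the negotiated link speed, so you still need a tool that reports that.

```python
import time


def read_throughput(path, seconds=5.0, block_size=1024 * 1024):
    """Read `path` sequentially, read-only, for about `seconds`.
    Returns the observed rate in MB/s. Wraps to the start at end of device."""
    total = 0
    start = time.monotonic()
    with open(path, "rb", buffering=0) as dev:
        while time.monotonic() - start < seconds:
            chunk = dev.read(block_size)
            if not chunk:
                dev.seek(0)  # reached the end, start over
                continue
            total += len(chunk)
    elapsed = max(time.monotonic() - start, 1e-9)
    return total / elapsed / 1e6


def monitor(path, samples=6, interval_s=600, seconds=5.0, sleep=time.sleep):
    """Take `samples` throughput readings, `interval_s` apart (600 s = 10 min)."""
    results = []
    for i in range(samples):
        results.append(read_throughput(path, seconds))
        if i < samples - 1:
            sleep(interval_s)
    return results


# Example (placeholder device name, unmounted disk, run as root):
#   for rate in monitor("/dev/sdX"):
#       print(f"{rate:.1f} MB/s")
```

A sudden, sustained drop between samples is the cue to go and check the reported link speed at that moment.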
 
Handy Holder (Saggar maker's bottom knocker) commented:
Never seen the count for "bus faults" being so high. There's no external SCSI connector on the controller so no chance anything's accidentally plugged in there, just the 6 internal drive bays. I'd take the lid off and reseat the cable, maybe it's damaged.

 
gozoliet (Author) commented:
Just some thoughts from me...
Andy - you're right about the error counts. Lots of errors all over the place, which gives the downshifting some reason. Unfortunately it's not like one drive has a higher number of errors; it's pretty consistent across disks on the same port (except 4/5?? still a mystery).

dlethe - the storage bay is internal, they're hot-swappable (or hot-plug) hard drives on the front of an HP. There was a firmware release that corrected a "timeout" issue which caused the controller to downshift, but that's for 15k drives.. not mine..

I'll try to dig out another cable. Otherwise I don't *really* want to shell out some cash to try a new backplane on a whim. The handful of threads on Google have mentioned everything from mainboard to power supply, so lots of part-swapping. Ugh.

The suggestions are great.. we'll see if more come in or if I isolate better.
 
Handy Holder (Saggar maker's bottom knocker) commented:
At least you can systematically eliminate the disks if you have time to spin them up one at a time as dlethe suggested. You can always boot the SmartStart CD and use the ADU there if you don't want to buy software. You can also eliminate the Smart Array controller, since there's an onboard non-RAID SCSI chip on the motherboard. Make sure you number the drives. PSU sounds very possible, because the non-SCSI-bus-related errors are at silly levels as well.
 
gozoliet (Author) commented:
Just an update.. or a non-update. I have reseated everything and am still experiencing the problem.
I don't have a spare PSU to test with at this point. I'm not quite sure how I would go about testing the hard drives individually without disturbing the RAID. If I plug a drive into the onboard controller and boot off something to run smartmon on that drive, can I put the drive back in the array afterwards and have everything work as before?
 
gozoliet (Author) commented:
Never really figured it out, still a problem, but your tips were good at helping me look further into it.