Solved

Proliant ML350 G4, Ultra320 Drives downshifted to Ultra 2 Narrow

Posted on 2010-09-13
Last Modified: 2012-05-10
I have an HP ProLiant ML350 G4 that's giving me some hard drive grief. The server has six 72.8GB 10K Ultra320 SCSI drives.

It all starts with an Event on bootup from CPQCISSE, Event ID: 24683
SCSI bus fault occurred on Storage Box box 0, , Port 0 of
Array Controller  in slot 3.
This may result in a "downshift" in transfer rate for one or more hard drives on the bus.

When I look at my drive configuration, I see that drives 0, 1, 2, 3 have been downshifted to Ultra 2 Narrow.  Drive 4 is running at Ultra 320 Wide (!).  Drive 5 is running at Ultra 2 Wide.
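For context, the nominal burst rates behind these mode names can be worked out from the transfer rate and the bus width. The sketch below is only an illustration: the table holds the standard nominal figures for parallel SCSI, the drive-to-mode mapping is copied from the post above, and the names `MODES`, `nominal_mb_per_s` and `reported` are made up for this example. The figures describe the shared bus, not what a single 10K drive can sustain.

```python
# Nominal parallel SCSI burst rates:
# (million transfers per second, bus width in bytes).
MODES = {
    "Ultra 2 Narrow": (40, 1),   # 8-bit bus
    "Ultra 2 Wide": (40, 2),     # 16-bit bus
    "Ultra 160 Wide": (80, 2),
    "Ultra 320 Wide": (160, 2),
}


def nominal_mb_per_s(mode: str) -> int:
    """Return the nominal bus burst rate in MB/s for a negotiated mode."""
    mega_transfers, width_bytes = MODES[mode]
    return mega_transfers * width_bytes


# Negotiated modes per drive, as reported in the post above.
reported = {
    0: "Ultra 2 Narrow",
    1: "Ultra 2 Narrow",
    2: "Ultra 2 Narrow",
    3: "Ultra 2 Narrow",
    4: "Ultra 320 Wide",
    5: "Ultra 2 Wide",
}

for drive, mode in sorted(reported.items()):
    print(f"drive {drive}: {mode:<15} {nominal_mb_per_s(mode):>3} MB/s nominal")
```

On these nominal figures, drives 0 to 3 are negotiating at one eighth of the Ultra320 bus rate.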

I have updated all firmware on the system and hard drives, and pulled them all out and reseated the cables.

Looking at the diagnostics, I see that the Storage Enclosure and Drive Cage have a Critical saying U160 Disabled.

On the physical drive there is a warning flag about the drives being downshifted to Ultra 2.

I do see that drives 4 & 5 have a timeout as the last failed reason. I suspect the timeout on drive 4 is from when I pulled it out while the system was on earlier this month, not sure about the timeout on drive 5.

Any suggestions? Attached is the diagnostics report.


 report-c7380d42-00001f20-0000000.zip
Question by:gozoliet
7 Comments
 
LVL 63

Assisted Solution

by:SysExpert
SysExpert earned 166 total points
ID: 33663730
I would run diagnostics on the drives and the controller, and consider doing a rebuild by swapping out 1 drive at a time, or a full backup and restore to a new set of drives. If the controller is bad, replace it.

I would also swap cables, and maybe even the backplane.

If under warranty, contact HP since the errors are a little weird, and hard to say where the exact cause is.



I hope this helps !
 
LVL 47

Accepted Solution

by:dlethe
dlethe earned 167 total points
ID: 33664086
Noisy signals and poorly shielded cables, along with connectors that are not tightened, will do this. At least give parallel SCSI credit: instead of just dying, it degrades speed so that everything can still talk. Since you have external enclosures, my advice is to make sure cables are as short as possible. Not all cabling is suitable for U320 at full distance; cables are not created equal. Updating firmware has nothing to do with this problem.

You *could* run some software on a JBOD controller attached to the enclosure and add disks one at a time until the reported speed drops from U320. Once you see which disk does it, move things around to see whether the problem is specific to the disk or the bay. You have to use a JBOD controller, and this is the only way to do it: you don't want to be moving things around behind a RAID controller, as this would fool the controller into thinking you had drive failures and would kick off a reboot.
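The disk-versus-bay reasoning above can be written down as a small bookkeeping sketch. This is only an illustration under assumptions: the results are entered by hand from whatever tool reports the negotiated speed, and the function name `suspects`, the `(disk, bay, full_speed)` tuple layout and the disk labels are hypothetical.

```python
from collections import defaultdict


def suspects(trials):
    """Sort hand-recorded test runs into suspect disks and suspect bays.

    trials: iterable of (disk, bay, full_speed) tuples, one per test run,
    where full_speed is True if the disk negotiated U320 in that bay.
    """
    by_disk = defaultdict(list)
    by_bay = defaultdict(list)
    for disk, bay, full_speed in trials:
        by_disk[disk].append((bay, full_speed))
        by_bay[bay].append((disk, full_speed))

    # A disk is suspect if it was tried in at least two bays
    # and never reached full speed in any of them.
    bad_disks = sorted(
        disk for disk, runs in by_disk.items()
        if len({bay for bay, _ in runs}) >= 2 and not any(ok for _, ok in runs)
    )
    # A bay is suspect if at least two different disks were tried in it
    # and none of them reached full speed there.
    bad_bays = sorted(
        bay for bay, runs in by_bay.items()
        if len({disk for disk, _ in runs}) >= 2 and not any(ok for _, ok in runs)
    )
    return bad_disks, bad_bays


# Hypothetical example: disk "B" downshifts in both bays, disk "A" never does.
example = [("A", 0, True), ("B", 1, False), ("B", 0, False), ("A", 1, True)]
print(suspects(example))
```

If neither list comes back populated, the fault does not follow a single disk or bay, which points back at shared parts such as the cable, backplane or power.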

Anyway, that is what I do when in similar situations.   (See http://www.santools.com/smart/unix/manual/linkspeedreporting.htm)
Software will set you back $100, but it will absolutely identify link speed issues. If you have the cabling, swap everything, but if you want to diagnose and confirm, you'll need some software. Also, you could have an issue where link speed only decreases a few seconds, minutes, or longer after it is hooked up. To diagnose that with the software, just hook it all up, run a read-only benchmark of your choice (do not mount the disks, just do raw read I/O to the physical devices), and check link speed every 10 minutes. Something else may be going on.
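A read-only raw benchmark of the kind described above could look roughly like the following. This is a minimal sketch, not the software mentioned above: it only measures sequential read throughput, it does not report the negotiated link speed, and without direct I/O the operating system's cache can inflate the numbers. The function name `read_throughput` and its parameters are assumptions for illustration.

```python
import os
import time


def read_throughput(path, seconds=5.0, chunk=1024 * 1024):
    """Read `path` sequentially, read-only, for about `seconds`; return MB/s."""
    fd = os.open(path, os.O_RDONLY)  # opened read-only, nothing is written
    total = 0
    start = time.monotonic()
    try:
        while time.monotonic() - start < seconds:
            data = os.read(fd, chunk)
            if not data:
                # Reached the end of the device or file: start again.
                os.lseek(fd, 0, os.SEEK_SET)
                continue
            total += len(data)
    finally:
        os.close(fd)
    elapsed = time.monotonic() - start
    return total / elapsed / 1e6
```

To use it, pass the raw device node of an unmounted disk (the exact path depends on the operating system and is not given in this thread), and repeat the call every 10 minutes while checking the link speed with the diagnostic tool.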
 
LVL 55

Expert Comment

by:andyalder
ID: 33664785
Never seen the count for "bus faults" being so high. There's no external SCSI connector on the controller so no chance anything's accidentally plugged in there, just the 6 internal drive bays. I'd take the lid off and reseat the cable, maybe it's damaged.

 
LVL 4

Author Comment

by:gozoliet
ID: 33665028
Just some thoughts from me...
Andy - you're right about the error counts.. Lots of errors all over the place, which gives the downshifting some reason.  Unfortunately it's not like 1 drive has a higher amount of errors, it's pretty consistent across disks on the same port (except 4/5?? still a mystery).

dlethe - the storage bay is internal, they're hot-swappable (or hot-plug) hard drives on the front of an HP. There was a firmware release that corrected a "timeout" issue which caused the controller to downshift, but that's for 15k drives.. not mine..

I'll try to dig out another cable. Otherwise I don't *really* want to shell out some cash to try a new backplane on a whim. The handful of threads on Google have mentioned everything from mainboard to power supply, so lots of part-swapping. Ugh.

The suggestions are great.. we'll see if more come in or if I isolate better.
 
LVL 55

Assisted Solution

by:andyalder
andyalder earned 167 total points
ID: 33666610
At least you can systematically eliminate the disks if you have time to spin them up one at a time as dlethe suggested. You can always boot the SmartStart CD and use the ADU there if you don't want to buy software. You can also eliminate the Smart Array controller, since there's an onboard non-RAID SCSI chip on the motherboard. Make sure you number the drives. The PSU sounds very possible because the non-SCSI-bus-related errors are at silly levels as well.
 
LVL 4

Author Comment

by:gozoliet
ID: 33782397
Just an update... or a non-update. I have reseated everything and am still experiencing the problem.
I don't have a spare PSU to test with at this point in time. I'm not quite sure how I would go about testing the hard drives individually without disturbing the RAID. If I replug a drive onto the onboard controller and boot off something to run smartmon on that drive, can I put the hard drive back in the array afterwards and have everything work as is?
 
LVL 4

Author Closing Comment

by:gozoliet
ID: 34213460
Never really figured it out, still a problem, but your tips were good at helping me look further into it.