Solved

Proliant ML350 G4, Ultra320 Drives downshifted to Ultra 2 Narrow

Posted on 2010-09-13
Last Modified: 2012-05-10
I have an HP ProLiant ML350 G4 that's giving me some hard drive grief. The server has 6 x 72.8GB 10K Ultra320 SCSI drives.

It all starts with an Event on bootup from CPQCISSE, Event ID: 24683
SCSI bus fault occurred on Storage Box box 0, , Port 0 of
Array Controller  in slot 3.
This may result in a "downshift" in transfer rate for one or more hard drives on the bus.

When I look at my drive configuration, I see that drives 0, 1, 2, 3 have been downshifted to Ultra 2 Narrow.  Drive 4 is running at Ultra 320 Wide (!).  Drive 5 is running at Ultra 2 Wide.
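To put those negotiated modes in perspective, here is a rough sketch of the nominal peak bus rate for each one. The figures are the standard theoretical maxima for parallel SCSI (shared by every device on the bus), not measurements from this server, and the `negotiated` mapping simply restates the drive list above.

```python
# Nominal peak bus rates in MB/s for the parallel SCSI modes mentioned above.
# These are theoretical maxima for the whole bus, not measured throughput.
PEAK_MB_S = {
    "Ultra2 Narrow": 40,
    "Ultra2 Wide": 80,
    "Ultra160 Wide": 160,
    "Ultra320 Wide": 320,
}

# Negotiated mode per drive, as reported in the post.
negotiated = {
    0: "Ultra2 Narrow",
    1: "Ultra2 Narrow",
    2: "Ultra2 Narrow",
    3: "Ultra2 Narrow",
    4: "Ultra320 Wide",
    5: "Ultra2 Wide",
}


def slowdown_vs_u320(mode: str) -> float:
    """How many times lower the nominal peak rate is compared with Ultra320 Wide."""
    return PEAK_MB_S["Ultra320 Wide"] / PEAK_MB_S[mode]


for drive, mode in sorted(negotiated.items()):
    print(f"drive {drive}: {mode:14s} {PEAK_MB_S[mode]:3d} MB/s peak, "
          f"{slowdown_vs_u320(mode):.0f}x below Ultra320")
```

On paper, the four drives at Ultra 2 Narrow are limited to one eighth of the Ultra320 peak rate. Real 10K drives cannot saturate a 320 MB/s bus individually, so the practical impact depends on how many drives are active at once.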

I have updated all firmware on the system and hard drives, and pulled them all out and reseated the cables.

Looking at the diagnostics, I see that the Storage Enclosure and Drive Cage show a Critical status saying "U160 Disabled".

On the physical drive there is a warning flag about the drives being downshifted to Ultra 2.

I do see that drives 4 & 5 have a timeout as the last failed reason. I suspect the timeout on drive 4 is from when I pulled it out while the system was on earlier this month, not sure about the timeout on drive 5.

Any suggestions? Attached is the diagnostics report.


 report-c7380d42-00001f20-0000000.zip
Question by:gozoliet
7 Comments
 
LVL 63

Assisted Solution

by:SysExpert
SysExpert earned 664 total points
ID: 33663730
I would run diagnostics on the drives and the controller, and consider doing a rebuild by swapping out 1 drive at a time, or a full backup and restore to a new set of drives. If the controller is bad, replace it.

I would also swap cables, and maybe even the backplane.

If the server is under warranty, contact HP, since the errors are a little weird and it is hard to say what the exact cause is.



I hope this helps!
 
LVL 47

Accepted Solution

by:
David earned 668 total points
ID: 33664086
Noisy signals and poorly shielded cables, along with connectors that are not tightened, will do this. Give parallel SCSI credit: instead of just dying, it degrades speed so that everything can still talk. Since you have external enclosures, my advice is to make sure the cables are as short as possible. Not all cabling is suitable for U320 at full distance; cables are not created equal. Updating firmware has nothing to do with this problem.

You *could* run some software on a JBOD controller attached to the enclosure and add disks one at a time until the reported speed drops from U320. Once you see which disk does it, move things around to see whether the problem is specific to the disk or the bay. You have to use a JBOD controller, and this is the only way to do it: you don't want to be moving things around behind a RAID controller, as this would fool the controller into thinking you had drive failures and would kick off a reboot.
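The one-disk-at-a-time procedure can be sketched as a small loop. This is only an illustration of the logic: `reported_speed` is a hypothetical callback standing in for whatever tool reports the negotiated link speed after you physically attach each disk, and the bay names and simulated fault are made up for the example.

```python
from typing import Callable, List, Optional, Sequence

U320_MB_S = 320  # nominal Ultra320 rate used as the "full speed" reference


def first_disk_causing_downshift(
    disks: Sequence[str],
    reported_speed: Callable[[List[str]], int],
    full_speed: int = U320_MB_S,
) -> Optional[str]:
    """Attach disks one at a time; return the first one after which speed drops.

    `reported_speed` is a hypothetical probe: given the disks currently
    attached, it returns the link speed (MB/s) the controller reports.
    """
    attached: List[str] = []
    for disk in disks:
        attached.append(disk)
        if reported_speed(attached) < full_speed:
            return disk
    return None  # bus stayed at full speed with every disk attached


# Simulated probe for illustration only: pretend "bay3" drags the bus down.
def fake_probe(attached: List[str]) -> int:
    return 40 if "bay3" in attached else 320


suspect = first_disk_causing_downshift(
    ["bay0", "bay1", "bay2", "bay3", "bay4", "bay5"], fake_probe
)
print("first downshift after attaching:", suspect)
```

Once a suspect turns up, repeating the run with that disk moved to a different bay helps tell a bad disk from a bad bay.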

Anyway, that is what I do when in similar situations.   (See http://www.santools.com/smart/unix/manual/linkspeedreporting.htm)
The software will set you back $100, but it will absolutely identify link speed issues. If you have the cabling, swap everything, but if you want to diagnose and confirm, you'll need some software. You could also have an issue where the link speed decreases a few seconds, minutes, or longer after everything is hooked up. To diagnose that with the software, just hook it all up, run a read-only benchmark of your choice (do not mount the disks, just do raw read I/O to the physical devices), and check the link speed every 10 minutes. Something else may be going on.
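For the read-only benchmark part, a minimal Python sketch is below. It only measures sequential read throughput, which is a proxy and not the negotiated link speed itself, so it complements the link-speed tool rather than replacing it. The device path in the comments is a placeholder, the function names are made up for this example, and reading a raw device normally requires root. The operating system's cache can inflate results on repeated reads of the same region.

```python
import os
import time


def read_throughput_mb_s(path: str,
                         total_bytes: int = 256 * 1024 * 1024,
                         chunk: int = 1024 * 1024) -> float:
    """Read up to total_bytes from path, read-only, and return MiB/s."""
    fd = os.open(path, os.O_RDONLY)  # read-only: never writes to the device
    try:
        start = time.monotonic()
        done = 0
        while done < total_bytes:
            buf = os.read(fd, min(chunk, total_bytes - done))
            if not buf:  # end of file or device
                break
            done += len(buf)
        elapsed = time.monotonic() - start
    finally:
        os.close(fd)
    if elapsed <= 0:
        return float("inf")  # too fast to time, e.g. a tiny cached file
    return done / (1024 * 1024) / elapsed


def monitor(path: str, interval_s: int = 600, samples: int = 6) -> list:
    """Sample throughput every interval_s seconds (600 s = 10 minutes)."""
    results = []
    for i in range(samples):
        results.append(read_throughput_mb_s(path))
        print(f"sample {i}: {results[-1]:.1f} MiB/s")
        if i < samples - 1:
            time.sleep(interval_s)
    return results


# Example call (placeholder device name, adjust for your system):
# monitor("/dev/sdX")
```

A sudden drop between samples would suggest the bus renegotiated to a slower mode partway through, which is the case described above.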
 
LVL 56

Expert Comment

by:andyalder
ID: 33664785
Never seen the count for "bus faults" being so high. There's no external SCSI connector on the controller so no chance anything's accidentally plugged in there, just the 6 internal drive bays. I'd take the lid off and reseat the cable, maybe it's damaged.

 
LVL 4

Author Comment

by:gozoliet
ID: 33665028
Just some thoughts from me...
Andy - you're right about the error counts.. Lots of errors all over the place, which gives the downshifting some reason.  Unfortunately it's not like 1 drive has a higher number of errors; it's pretty consistent across disks on the same port (except 4/5?? still a mystery).

dlethe - the storage bay is internal, they're hot-swappable (or hot-plug) hard drives on the front of an HP. There was a firmware release that corrected a "timeout" issue which caused the controller to downshift, but that's for 15k drives.. not mine..

I'll try to dig out another cable. Otherwise I don't *really* want to shell out some cash to try a new backplane on a whim.  The handful of threads on Google have mentioned everything from mainboard to power supply, so lots of part-swapping. ugh.

The suggestions are great.. we'll see if more come in or if I isolate better.
 
LVL 56

Assisted Solution

by:andyalder
andyalder earned 668 total points
ID: 33666610
At least you can systematically eliminate the disks if you have time to spin them up one at a time, as dlethe suggested. You can always boot the SmartStart CD and use the ADU there if you don't want to buy software. You can also eliminate the Smart Array controller, since there's an onboard non-RAID SCSI chip on the motherboard. Make sure you number the drives. PSU sounds very possible because the non-SCSI-bus-related errors are at silly levels as well.
 
LVL 4

Author Comment

by:gozoliet
ID: 33782397
Just an update.. or a non-update. Have reseated everything and am still experiencing the problem.
Don't have a spare PSU to test with at this point in time. Not quite sure how I would go about testing the hard drives individually without disturbing the RAID.. If I replug a drive onto the onboard controller and boot off something to run smartmon on that drive, can I put the hard drive back in the array afterwards and have everything work as is?
 
LVL 4

Author Closing Comment

by:gozoliet
ID: 34213460
Never really figured it out, still a problem, but your tips were good at helping me look further into it.