  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 548

Repeated SCSI drive failure with Windows 2003

Hey Guys,

I'm having problems with drives failing on reboot. I'm using Windows 2003 Enterprise Server, IBM Ultrastar 160 drives, an Adaptec 2010S RAID controller, and a SuperMicro motherboard and backplane.

In the past two weeks, we've had 3 drive failures, all on reboots. We are running a RAID 5 config, so we haven't lost any data, and we have caught it every time. Basically, are there any known issues or compatibility issues with this hardware?

Thanks. This is an EXTREMELY important question, so it's valued at 500 points.
Asked: johnstaggs
 
chicagoan Commented:
Did the drives really fail, or are they failing to come ready?
I've seen 160s and 320s in hot-swap cages not come ready from a cold start, but work fine if they're inserted into the cage after the system has powered up and been warm booted.
Sometimes you see LUN=0 BUS=0 ID=0 Bad SCSI Status – Check Condition messages.
Setting the system BIOS to do a full POST sometimes helps, especially if there is a lot of RAM.
These drives can take up to 20 seconds to spin up.
Check the auto spin-up jumper settings, though setting all the drives to auto spin up can put a severe load on the PSU.
 
johnstaggs (Author) Commented:
Well, when the system boots, the drive has no lights on whatsoever. Then, after Windows boots up, we go into SMOR (Adaptec's software) to look at the config, and the drive with no lights on is marked in red and shown as "failed".
 
johnstaggs (Author) Commented:
One thing we did notice with the Ultrastar drives is that some are different models than the others. Only one of the models we have is on the Windows HCL for Windows 2003 Server. Could this be a potential pitfall?
 
chicagoan Commented:
Are these in a hot-swap cage? Whose?

What have you done with the failed drives?
 
johnstaggs (Author) Commented:
Yes, SuperMicro.

We reformatted them and gave them another try, and they are usable again.
 
chicagoan Commented:
Which lends credence to them being OK and just not ready.
I'd get with SuperMicro and see what their suggestions are about getting the drives initialized before the OS looks at the array.
 
johnstaggs (Author) Commented:
The drive is initializing when the system boots up, because you can see it during POST. And also, if you go into SMOR (BIOS), it will show the RAID as "degraded". But maybe I'm not quite getting what you're saying.
 
chicagoan Commented:
If the drives test out OK afterward, something's going on at boot time that makes them unavailable to the array.
I'd really be suspicious of the backplane/disk enclosure here, and I'd see what the manufacturer has to say.
 
dbrunton Commented:
>> Which lends credence to them being OK and just not ready.

I'll add a couple of comments to this statement.

I'd be looking at the SCSI cable and the host adapter in this case. And possibly the power supply; it may not be capable of supplying the power required for everything.
 
johnstaggs (Author) Commented:
Hey guys, I'm going to get ahold of SuperMicro today, so bear with me. But the backplane is a good possibility, since I had already called Adaptec and they said it could be the problem. Another thing: I had a single drive in a dual Xeon box (same exact type of setup); the machine did not have a RAID controller in it (it wasn't running RAID, hence the single drive). And it died on me in about a week.

Maybe that will lead to something else. I'm in the process of setting up another machine, which has dual backplanes and, I'm sure, a different type of RAID controller.

But all suggestions are welcome, and I really appreciate the time you guys take to help me figure out the problem.
 
chicagoan Commented:
While a drive dying in another box, especially a non-redundant drive, is a pain, I think it's just anecdotal.
You said the drives from the degraded array tested OK outside the system.
Unless these drives are all from one lot and you suspect a manufacturing defect, I like the backplane/bus as the likely suspect.
 
johnstaggs (Author) Commented:
Indeed, I was able to format the failed drive and put it back into the array, so that shows it's not really the drives. There's a good chance the drives are all from the same lot; there are two different models among the drives we have (the drive specs are the same, just different models).

So I should look into the backplane/bus issue, correct? And do you guys have any suggestions on how I could go about testing it?

(BTW, that non-redundant drive that I lost was just in a test box I had set up, so it wasn't too important, thank God.) Right now I'm not setting any machine up unless it's using RAID 5 and has two hot spares. You know, I had run these drives for quite a while in a different box that had an older motherboard, and never had a single problem. Then I went to these new boxes that have a newer motherboard, and have had nothing but problems with the drives.

When I say older motherboard, I mean months, not years or anything, but they are different models.
 
chicagoan Commented:
That's one of those "zero-raid" setups through a dedicated PCI slot?
 
johnstaggs (Author) Commented:
Yes, that is correct.
 
johnstaggs (Author) Commented:
I do have the latest 2003 drivers, but I'm going to have to check on the latest BIOS. Give me a few minutes, and I'll update this and let you know.
 
johnstaggs (Author) Commented:
The BIOS shows I20 v.001.62, but the date doesn't match the date on the link. So I'm going to do an update to both of these on a new machine.
 
johnstaggs (Author) Commented:
I had the latest 2003 drivers (tried reinstalling them). The BIOS looked like the same version, but it was updated. So both of those are done.

We've got another very similar machine, and we are setting it up with RAID 5 (it has a split backplane) and two hot spares. We are going to run it for a while and see if we run into any more problems.

I'm going to go ahead and award the points to you, but if you can think of anything else to try down the road, please reply.

Thanks
