dslntadmin asked:

Servers keep failing RAID drives

We have two new servers with identical hardware. On Server 1, random drives in the RAID5 array fail every few days (port 3 twice, port 2 once, port 1 once). On Server 2, the same drive on port 3 has failed 5 times now. Each time, I use Intel Matrix Storage Manager to mark the drive as normal, and the array rebuilds in about a day and a half. Then another drive fails within a few days.

I have the vendor working on it and they're working with Intel, but so far no luck. They have a new motherboard coming for Server 1, the one with random drives failing. On Server 2 we tried replacing the constantly failing drive, but the replacement still failed within a day. I haven't yet tried destroying the RAID volume and starting from scratch.

Does anybody have any other ideas?  I really need these boxes to be stable before I can deploy them to our remote office.

Server Configurations:
*Intel S3420GP motherboards
*4 x 500GB Seagate Barracuda (ST3500418AS) hard drives running in a RAID5 configuration using on-board Intel Matrix RAID
*I think the drive cage is model number R4E9Q4E4 C4E3R4E10.    
*Windows Server 2003 R2 with basic services (File, Printer, DFS, rsync).
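
For anyone sizing up this configuration, here is a rough sketch of the arithmetic behind it: usable space on a 4 x 500GB RAID5 volume, and the average rebuild rate implied by a rebuild that takes "about a day and a half". The function names are made up for illustration, and the 36-hour figure and decimal units (1 GB = 1000 MB) are assumptions, not measurements from these servers.

```python
def raid5_usable_gb(drive_count: int, drive_gb: float) -> float:
    """RAID5 spends one drive's worth of space on parity: usable = (n - 1) * size."""
    if drive_count < 3:
        raise ValueError("RAID5 needs at least three drives")
    return (drive_count - 1) * drive_gb


def rebuild_rate_mb_per_s(drive_gb: float, rebuild_hours: float) -> float:
    """Average write rate onto the rebuilt drive, in decimal MB/s (assumed units)."""
    return drive_gb * 1000 / (rebuild_hours * 3600)


# 4 x 500GB drives, as listed in the server configuration above.
usable = raid5_usable_gb(4, 500)

# "About a day and a half" is taken as 36 hours here; that is an assumption.
rate = rebuild_rate_mb_per_s(500, 36)

print(f"Usable capacity: {usable:.0f} GB")
print(f"Average rebuild rate: {rate:.2f} MB/s")
```

Under those assumptions the rebuild averages only a few MB/s, far below what a single SATA drive can sustain sequentially, so the array spends a long time degraded after every failure.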
Justin Owens

Is this a homemade server or one obtained from a Vendor?
SnowWolf

Have you installed the latest Intel Matrix driver?
ASKER CERTIFIED SOLUTION
David

This solution is only available to members.
dslntadmin (ASKER)

Yes, this is a vendor-configured server and we are using the latest Matrix drivers. I'm starting to wish that I'd built it myself like the last two; they've been running flawlessly for two years now.
(Distributor cost for those drives actually puts them at a little under $50 each.)

... So you are trusting your business to $200 worth of unreliable disk drives and a $10 chip. Budget $1000 and get a controller with battery backup, along with disks that the vendor has qualified for 24x7x365 use.

Methinks your vendor is trying to make too much of a profit.  Those parts are not designed for heavy, enterprise use.
Justin
andyalder has a good analysis.
If you want to low ball it, use software RAID.
Otherwise, pay the money and do it right.
Don't lowball with software RAID5. Just google all the hits for data loss on software RAID5 on Win2K3 boot devices. It works just fine until you have a drive failure; then some PCs will crash and others won't boot up. Human intervention may be necessary. Considering the investment that a company has in a server, it is just stupid to cut corners on the storage farm. Do it right, or just outsource.
Agree with dlethe on not using RAID5 for the boot drive.

But a RAID1 (mirrored) boot drive with a RAID5 data drive should play nice - even if it's a little slow.

I have a Vostro 420 in exactly that configuration.
LOL, you're both joking right? Windows can't boot from software RAID 5 because it doesn't understand what RAID is until it's loaded ftdisk.sys.
If you consider Intel ICH10R as software RAID, then you can create a RAID5 boot disk.

I wouldn't recommend it, but you can do it.
We generally call that fake RAID.
We will be returning the servers. Thanks for all the help.
Didn't really solve the issue, but confirmed what I was already beginning to suspect about the quality of the hardware.
It turns out it was the Intel Matrix Storage Console 8.9 that was causing issues.  They downgraded to 8.8 and it seems to have solved the issue, no problems so far.

http://communities.intel.com/thread/5036?start=0&tstart=0