  • Status: Solved
  • Priority: Medium
  • Security: Public

Highpoint RAID 404 Crashes During Verify

Hey guys (and some gals),

We've got a serious problem. Our file server has been crashing while verifying its array. The array doesn't seem to become inconsistent or lose any data. The server just crashes almost every time - three times today!

The server used to kick disks out randomly during verification, about once every two or three weeks, but that all but stopped when I upgraded from 2-disk RAID 1 to 4-disk RAID 1/0. After the upgrade, the RAID controller didn't kick a disk, or crash while verifying, for about six months.

We moved to a new office, and the file server got physically relocated three or four times while we were settling in and getting our floor plan hashed out. I think maybe the bumps and dings started this round of crash problems, but all the cables and cards are seated properly and firmly. We use all Highpoint ATA cables and an X-Connect power supply.

Since the move, the computer crashes about 85% of the time when verifying the array, usually about 30m-1hr into the verification process. It hasn't really been kicking the disks out except for one time three weeks ago when it kicked out !three of the four! disks. Now that was a fiasco. Luckily we have a robust backup policy and things were recovered (relatively) smoothly.

It used to give me video card errors, so I replaced the video card with something older, less adventurous, and presumably more stable. Now the errors don't refer to a video card driver anymore, they just say "The system has recovered from a serious error".

I don't believe it's a processor, motherboard, or memory problem because we have never had a crash that I am aware of that wasn't during the verification process. It's not the disk drives - even the ones that get kicked out are always fine. No clicks, no excess heat, no SMART errors. All four disks have been cycled through the system over the last month because of this problem, trying to see if a specific disk was responsible.

Our system is:
P4 2.8
1 gig Crucial
Intel D865PERLK
Highpoint RocketRAID 404 Controller
4x 200 Seagate ATA
WinXP Pro SP2 fully updated
2x 200mm Antec Quiet Fans (given to show that we do have adequate cooling)
Zalman Copper 92mm CPU fan (" ")

AVG Antivirus
AllSync Scheduler
PowerClock Server
BOINC - Seti@Home
HPT Service Manager
Therapist Helper Server
WinAmp (waiting room music)
Highpoint RAID Management Console

I think the problem might be related to the fact that the HPT cards offload the XOR routine to the processor. BOINC runs Seti@Home while the system is otherwise inactive, so I wonder if the offloaded XOR work might collide with the SETI processes, but this seems a little out there.

Can anyone suggest a method to properly troubleshoot what actually causes the crash, how to stop the crashing, or, as a last resort, a known good and stable ATA RAID controller with eight channels and a reasonable price?

I can't seem to find ANY reviews of RAID controllers that have a review period of longer (DAMN COMPUTER! Just crashed again right now while verifying) than a week, and we all know that a week or two is nowhere NEAR long enough to assess the capabilities of a RAID card for long-term reliability. It's like assessing a new car model for reliability by glancing at the interior in a magazine spread.

So I guess this is a multi-pronged request - troubleshoot, fix, or suggest a replacement that is compatible and known good.

slbriggsphdAuthor Commented:
Just checked all the components - no excess heat on the CPU, northbridge, memory, video card, disk drives, or RAID controller. Nothing is more than slightly warm to the touch.
Clearly, the controller or the motherboard is going.  You should not get crashes trying to verify the array.

1.  First, look for a BIOS update for the RAID controller on the manufacturer's website.

2.  If installing the BIOS does not fix the problem, go over the RAID settings once again, as I am sure you did.

3.  If you are determined to keep this controller, move it and the drives to a different motherboard.  That might solve the problem right there; the IRQ line on the motherboard might be unstable.

4.  If that does not work, then suspect the controller card.  Of course, you know you will have to back up all the data.  The best are Promise RAID controllers, or the Highpoint 370; both are very reliable.

5.  The problem is that everyone is moving to SATA RAID controllers now, so if you are looking for a future solution, you are stuck with SATA, which means a whole new drive array, and this is expensive.
I haven't had very good experiences with Highpoint RAID controllers. I find the Promise RAID cards are much more reliable, and the 3ware ones are even better, but Highpoint seems to justify its cheapness with low quality. If a firmware update as suggested by scrathy doesn't help, get more reliable RAID cards.

slbriggsphdAuthor Commented:
Okay, I updated the RAID drivers in Windows, could not update the controller BIOS (long story), and updated the motherboard software package and BIOS.

Something interesting happened after I updated the RAID drivers. The verification failed, but instead of immediately rebooting like usual, the computer stayed on-line and gave me an error message that the second channel had failed. This is the channel I had been watching and suspected was a problem. This in hand, I moved the disk on that channel to the first channel as the slave. It has been stable since, and will verify.

However, I can't run both disks of the mirror off the 1st channel, because it halves the performance. I think the card is out of warranty, Highpoint won't do warranty work on cards bought from resellers, and NewEgg doesn't sell the card anymore.

What kind of problems am I going to have moving the array onto a new controller? I'll basically have to make disk images and start from there, won't I? Is there any chance a new card, even a Highpoint card, will recognize my existing array? I don't think so... but... well, comments anyone?
The only way another RAID controller will recognize the existing array is if it has the SAME chipset on the controller card, and the same version, in which case you just plug in the array and hope it works.  Usually this only works for mirrored RAID 1 anyway.  I think it is safe to assume that if you want to move beyond the existing controller and its problems, you will have to wipe the array and start again.

But this is easier than you think.  Just install a good old IDE drive and copy all the data in the array to it.  Remove the RAID from the system, boot from the Windows XP CD, and make the IDE drive bootable by running fixboot C: in the Recovery Console.

Once you know the system can boot from this IDE drive, it does not matter what happens to the RAID. Get a new controller, reinitialize the array, and copy all the data back. But at least use RAID 1 or RAID 10 so that you have a mirror in the future; RAID 0 and RAID 5 are very prone to failures on removal of a drive.
slbriggsphdAuthor Commented:
Yeah, we're using RAID 10 currently. It's been very robust until this current problem.

A worse problem developed last night after I left - the stripe of the mirrors broke. Until now it's been one of the mirrors that breaks, which can be easily rebuilt with a spare drive. But the stripe broke somehow, reducing the problem to the same as a broken RAID 0 - I don't know of a way to recover this! From what I understand, RAID 0 breaks are unrecoverable in most situations.

I put in two spares, booted, and the controller didn't recognize the spares as useful - at least, there was no option to rebuild. I wouldn't really expect one, having it reduced to a RAID 0 situation anyway. It looks, at least intellectually, that a more robust system would be a mirror of two stripes, as opposed to a stripe of two mirrors...

Well, good thing we back up every night. Too bad it's just the database, and not the OS and entire system, though.

Yeesh. Looks like I got some work to do. I'll get back here with my resolution for posterity and points.
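
On the stripe-of-mirrors versus mirror-of-stripes question raised above, a rough way to compare the two layouts is to count which two-disk failures each one survives. The sketch below is only a model of whole-disk failures on four disks; the disk names (A1, A2, B1, B2) and the pairing of disks into mirrors and stripes are assumptions for illustration, and it says nothing about controller faults like the one described in this thread.

```python
from itertools import combinations

# Hypothetical disk labels: A1/A2 hold one half of the data, B1/B2 the other.
DISKS = ("A1", "A2", "B1", "B2")


def raid10_survives(failed):
    """Stripe of mirrors: every mirror pair needs at least one working disk."""
    mirrors = [("A1", "A2"), ("B1", "B2")]
    return all(any(d not in failed for d in pair) for pair in mirrors)


def raid01_survives(failed):
    """Mirror of stripes: at least one stripe set must be completely intact."""
    stripes = [("A1", "B1"), ("A2", "B2")]
    return any(all(d not in failed for d in stripe) for stripe in stripes)


def count_survivable(survives, n_failed):
    """Count the failure combinations of n_failed disks that the layout survives."""
    return sum(
        1 for failed in combinations(DISKS, n_failed) if survives(set(failed))
    )


if __name__ == "__main__":
    total = len(list(combinations(DISKS, 2)))
    print("RAID 10 survives", count_survivable(raid10_survives, 2), "of", total)
    print("RAID 0+1 survives", count_survivable(raid01_survives, 2), "of", total)
```

Under this simple model the stripe of mirrors tolerates more two-disk failure combinations than the mirror of stripes, so switching layouts would not by itself be expected to make the array more robust.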
There is software you can use to recover a broken RAID 0, but a restore from a backup is usually the real way to go. I strongly recommend you change the RAID controller now.

raid reconstructor:
slbriggsphdAuthor Commented:

The array is a 1/0, which is a stripe of mirrors. When I say we're "reduced to a RAID 0 situation" I just mean that the stripe between them broke. So we have two mirrors that are no longer striped, each mirroring only half the data.

Does anyone know of a tool to rebuild a RAID 10 array? The Raid Rebuilder from GetDataBack is only for RAID 0 and RAID 5. Highpoint was supposed to email me a tool, but I haven't seen it yet, and it was supposed to be here a few hours ago.
Check the software out; it can rebuild a broken RAID 0 and therefore also a broken RAID 10.
slbriggsphdAuthor Commented:
I pulled a drive from each mirror and am running the Raid Rebuilder on them now to an image on an external HD. Let's see how this goes.