Adaptec 2120S RAID Degraded State

I have an older server with an Adaptec 2120S RAID controller. The configuration is (was) RAID 5 with 4 drives and a single hot-spare drive.

This morning it was showing two failed drives and is now running on 3 drives. We replaced the two failed drives, initialized them and set them as dedicated hot-spare drives. My problem is it will not rebuild the array and it still shows degraded. I have tried removing the hot-spare assignment and reapplying it and it still will not enter the rebuild state.

Any advice would be appreciated as we need to get 6 more months out of this server before replacement.

Thanks for your help.

SHBSystems ArchitectCommented:
RAID 5 cannot withstand 2 drive failures. Do you have a backup?
Try adding 1 drive at a time.

Designate the replacement drive as a hot spare. This can be done at POST by entering the ROM-based setup via Ctrl-A. Once there, choose the Manage Arrays menu, and press Ctrl-S to go into the Spares setup. Select the drive with your arrow keys, press the Insert key to designate the spare, and press Enter to save it. You will receive a prompt asking whether you are finished (which is presumably true, so choose "yes"). The array rebuild should then start automatically.

Check whether the rebuild is initiated and completes successfully. Then insert the 4th drive.
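The point about RAID 5 and two drive failures can be illustrated with a minimal sketch of XOR parity. This is a toy model, not how the Adaptec firmware is implemented: the 4-byte blocks, the single stripe, and the names `xor_blocks` and `rebuild` are all made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a hypothetical 4-drive RAID 5: three data blocks plus parity.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)
stripe = data + [parity]


def rebuild(stripe, missing_index):
    """Recompute the block of one missing drive from the three survivors."""
    survivors = [blk for i, blk in enumerate(stripe) if i != missing_index]
    return xor_blocks(survivors)


# With one drive lost, the missing block is fully recoverable.
recovered = rebuild(stripe, 1)

# With two drives lost (0 and 1), the survivors only yield the XOR of the
# two missing blocks, which is not enough to separate them again.
two_lost = xor_blocks([stripe[2], stripe[3]])
```

This is why a degraded RAID 5 array keeps working on the remaining drives but has no redundancy left until a rebuild finishes.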

rbrotherAuthor Commented:
I fully understand it cannot sustain another drive failure. I did set the drives up as hot spares, but the array will not start the rebuild.
Did you set the disks to "failed" before you removed them?
According to Adaptec, you will need to replace the disks one at a time, in the order that they failed.

BTW, is copyback enabled?
rbrotherAuthor Commented:
9660kel: Thank you for your response. I did not set the disks as failed before replacing them. Do you think I should remove the new disks and start over? I still have the failed drives but do not know which slots they were in. Also, I have the Adaptec Storage Manager installed and am working from the Windows desktop.
For starters, here is a link to the Adaptec reference for your card.

Starting over might get you to the goal faster.

After removing the bad disks, install and initialize just one of the new disks and make it available to the array. (This should allow the disk to participate, and the rebuild task should start.)

Wait for the re-build task to finish, and then add your new hot spare.

rbrotherAuthor Commented:
9660kel: Thank you, I'll give it a try tomorrow and let you know how I make out.
Keep us posted.
rbrotherAuthor Commented:
I forgot to ask, what is copyback?
That is a setting for writing the data back to a replaced disk after a failure. (allows you to reclaim a hot spare after it steps into the array in a failure scenario.)

I don't know if that is applicable, but it could cause trouble with this situation. (not sure how it would behave in this instance, but I like to understand the variables in play)
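As a rough illustration of the copyback description above, here is a toy model of where the disks end up with the setting on or off. It is only a sketch under assumptions: the slot numbers, the disk labels, and the `simulate` function are hypothetical, and it says nothing about how the 2120S actually behaves in this situation.

```python
def simulate(copyback):
    """Follow one failure and one replacement; return (array slots, spare)."""
    slots = {0: "disk0", 1: "disk1", 2: "disk2", 3: "disk3"}
    spare = "spare"

    # disk2 fails: the hot spare is rebuilt into its place in the array.
    slots[2] = spare
    spare = None

    # A replacement disk is inserted where disk2 used to be.
    replacement = "new"
    if copyback:
        # Data is copied back onto the replacement, and the original
        # spare returns to standby.
        slots[2] = replacement
        spare = "spare"
    else:
        # The old spare stays in the array; the replacement only becomes
        # a spare if it is designated as one.
        spare = replacement
    return slots, spare


with_copyback = simulate(copyback=True)
without_copyback = simulate(copyback=False)
```

Either way the array ends up with four members and one spare; the difference is only which physical disk plays which role.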
rbrotherAuthor Commented:
9660kel: I worked on the server again this morning, following the directions from Adaptec to attempt to get the controller back in sync. Unfortunately it still will not rebuild. I even reinserted the original drives and got them into a failed state. When I inserted the new drive in the failed slot it did not rebuild, and after setting the drive as a dedicated spare it did not rebuild either. I also tried initializing the old drive before removing it and doing the same to the new drive.

Here are a few screenshots of the Adaptec BIOS utility. Still pulling what's left of my hair out.

Well, that brings up some ugly possibilities. Is there any way to change the channel (at the cable level) the drives are talking on? Maybe different enclosures?

And how many reboots has this thing had in the course of this process?
rbrotherAuthor Commented:
I don't want to think about the ugly possibilities yet :-)

It is a single channel controller so we cannot change that and it has gone through three reboots during this process. It is still functioning on the three remaining drives (makes me real nervous).
I think I should clarify: I was referring to the SCSI IDs the drives use on the bus, not the card's "port". A wide SCSI bus has 16 IDs; the controller takes one, leaving 15 for devices (the terminator does not use an ID).

I'm suggesting that you change out any drive trays if this is a hot-swap enclosure and, if possible, change which connector on the SCSI cable the drives are attached to.

The idea being that it would be good to rule out a hardware failure that isn't a drive.
rbrotherAuthor Commented:
I'll have to check Monday if I can change how the cable is connected. If my memory is correct, the present SCSI cable has only one connector on each end and it is plugged directly into the hard drive backplane.
Then maybe try swapping a tray out from another slot, again if that is possible.

Really, it might be good to try using the BIOS-level management utility. It may not be as pretty, but it usually has less trouble with these kinds of situations.

rbrotherAuthor Commented:
I finally got a response from Adaptec, and you were correct in your earlier advice. They said I must do a full backup, remove the array, create a new array and then do a system recovery. We will be doing that this afternoon. Thanks again, I appreciate your assistance.
Windows Server 2003