Sheldon Livingston

RAID 5 degraded

I have a RAID 5 system (Intel(R) C600+/C220+ series chipset SATA RAID Controller).

I received an error that the RAID is degraded and a drive has failed.

I have the option to "Mark as normal" the failed drive.

What does "Mark as normal" mean or do?
David Favor

I suppose this is okay, but drives are cheap. I would have just replaced the failed drive.

If the same drive fails again... replace it. Replace the drive with an equivalent enterprise drive and let the RAID rebuild.
noci

There is only one problem with RAID sets... if the first drive fails, repair it. If another drive in the RAID set fails, you may end up with an irretrievably lost RAID set.
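To illustrate why a second failure is so dangerous, here is a minimal Python sketch of single-parity reconstruction, the XOR scheme RAID 5 relies on. It is deliberately simplified: real RAID 5 rotates the parity block across the member disks and works on fixed-size stripes, and the helper names `xor_blocks`, `make_stripe` and `rebuild` are hypothetical, not any controller's API.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return the data blocks followed by one XOR parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe, missing):
    """Recompute the one member whose index is in `missing` from the survivors."""
    if len(missing) != 1:
        # One parity block gives one equation: it can solve for one unknown only.
        raise ValueError("single parity can rebuild exactly one lost member")
    survivors = [block for i, block in enumerate(stripe) if i not in missing]
    return xor_blocks(survivors)


stripe = make_stripe([b"disk0data", b"disk1data", b"disk2data"])
print(rebuild(stripe, {1}))  # b'disk1data'
```

With one member missing, XOR of the survivors gives back the lost block. With two members missing, as in `rebuild(stripe, {0, 2})`, there is one equation and two unknowns, so parity alone cannot recover the data.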

(Be sure to read the whole disk after marking it "normal"; otherwise the next failure on another drive will kill all your data.)

When disks are larger than 1 TB, RAID 6 might be a wiser choice (then two drives may fail without data loss).
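The usual reasoning behind the RAID 6 advice for large disks is that a RAID 5 rebuild has to read every surviving disk end to end, and a single unrecoverable read error (URE) on a survivor can then abort the rebuild. A rough back-of-the-envelope sketch, assuming independent errors at a datasheet rate of 1 error per 10^14 bits read (a figure common for consumer drives; enterprise drives often quote 10^15). Real drives frequently do better than their spec, so treat the numbers as illustrative only; `p_clean_rebuild` is a hypothetical helper.

```python
import math


def p_clean_rebuild(surviving_disks, disk_tb, ure_per_bits=1e14):
    """Probability of reading every surviving disk end to end with no URE.

    Assumes independent errors at a constant rate of 1 per `ure_per_bits`
    bits read; this is a simplified model, not a vendor formula.
    """
    bits_read = surviving_disks * disk_tb * 1e12 * 8  # decimal TB, as vendors use
    # (1 - p) ** n computed as exp(n * log1p(-p)) to stay accurate for tiny p
    return math.exp(bits_read * math.log1p(-1.0 / ure_per_bits))
```

For example, `p_clean_rebuild(3, 2)` (a four-disk RAID 5 of 2 TB drives with one drive failed) reads 4.8 × 10^13 bits and comes out at roughly exp(-0.48), about 0.62, under these assumptions. RAID 6 can still correct such an error during a single-disk rebuild, because its second parity block covers it.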
For me, a failed drive is a failed drive, and it needs replacement. Oftentimes all disks in a RAID set have been through the same circumstances (mostly bought at the same time, used over the same lifespan, same type, possibly the same batch), so if one fails there is a fair chance that one of the others might fail in the same way fairly soon.
noci said it better than I did.

Your big consideration is if one disk is flaky... then another disk fails... then your flaky disk has a hard/unrecoverable failure... you just lost all your data.

With RAID 5, for me, any time there's even a minor drive glitch, I take David Johnson's advice... replace the drive.
ASKER CERTIFIED SOLUTION
Member_2_231077

This solution is only available to members.

Make sure you have current, verified backups and replace the drive ASAP.
I do not think any of the storage professionals on EE would recommend carrying on with that drive.

I guess they changed the phrase "mark as online" to "mark as good" over the years. In this old (2009) but apparently most recent guide, "mark as online" is mentioned in only one case:

"If multiple drives have failed or have been marked offline, there is a significant probability of data loss and the condition may not be recoverable. You can call Intel support for assistance; or you can attempt to mark all but the first failed drive as online, replace the remaining failed drive, and attempt a rebuild."

https://www.intel.com/content/dam/support/us/en/documents/motherboards/server/sb/intel_raid_basic_troubleshooting_guide_v2_0.pdf

It looks like it is used when multiple drives have failed, as a desperate attempt to recover a failed RAID array.

In your case it will bring the disk back online as good and try to rebuild the array. If that succeeds, you will have a "working RAID array" with a disk I wouldn't trust. If you are lucky, this disk will be the first to fail again. If some other disk fails first, I would be very worried about whether I could rebuild the array, and I would already be looking for the latest backup.

I would not test my luck with questionable disks. In HP servers, a "predicted failure" state alone is enough to get a disk replaced under warranty.

On one side you have the cost of server downtime plus the cost of your time and nerves while repairing it; on the other side, the cost of a disk. Actually, you need to add the cost of the disk to both sides.

Only if different disks (even new ones) fail repeatedly within a short time could the problem be somewhere other than the failed disks.
Sheldon Livingston (ASKER)

I simply replaced the drive. "Mark as normal", when checked by the user, is basically telling the controller to attempt to put this drive back into action.

My concern, which prompted this question in the first place, was what "Mark as normal" actually means. I was concerned that it meant "ignore the issue and just pretend it didn't happen".

This is not the case.  Again... marking the drive as normal means to attempt to rebuild the array with the failed drive.

FYI... I tried that and it failed again at 75%.  I replaced the drive.
Ah...

So you really did replace the disk.

Some RAID hardware/software is smart enough to automate the process of integrating a new disk, so in this case the "Mark as normal" action occurs in the background.

Other, more conservative RAID systems say... "I'll wait for you to manually mark a drive as normal"... So a manual action is required before the disk is ever considered again.

"Mark as normal" rarely means "ignore the issue and just pretend it didn't happen".

"Mark as normal" usually triggers an attempt for the RAID hardware/software to consider the disk as part of the array + integrate the disk into the array.

Note I used the words rarely + usually, because each RAID hardware/software combo is different.
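One way to picture what the asker describes (mark as normal, rebuild, failure again at 75%) is as a small state machine. This is a hypothetical Python model written for illustration only; the states and names (`DiskState`, `ArrayMember`, `mark_as_normal`, `rebuild`) are assumptions and do not come from Intel's documentation or firmware.

```python
from enum import Enum


class DiskState(Enum):
    NORMAL = "normal"
    FAILED = "failed"
    REBUILDING = "rebuilding"


class ArrayMember:
    """Hypothetical model of one RAID member's state (not Intel's firmware or API)."""

    def __init__(self):
        self.state = DiskState.NORMAL
        self.rebuild_progress = 0  # percent of stripes rewritten so far

    def fail(self):
        # The controller drops the member after an error it cannot recover from.
        self.state = DiskState.FAILED

    def mark_as_normal(self):
        # Clearing the failed flag does not make the old contents trusted:
        # the member goes back into service and a rebuild starts.
        if self.state is not DiskState.FAILED:
            raise RuntimeError("only a failed member can be marked as normal")
        self.state = DiskState.REBUILDING
        self.rebuild_progress = 0

    def rebuild(self, stripes, bad_stripes=()):
        """Rewrite every stripe; an error on any stripe fails the member again."""
        if self.state is not DiskState.REBUILDING:
            raise RuntimeError("no rebuild in progress")
        for stripe in range(stripes):
            if stripe in bad_stripes:
                self.state = DiskState.FAILED
                return False
            self.rebuild_progress = (stripe + 1) * 100 // stripes
        self.state = DiskState.NORMAL
        return True
```

Calling `rebuild(100, bad_stripes={75})` on a member that was failed and then marked as normal leaves it back in the FAILED state with `rebuild_progress == 75`, which mirrors what happened in this thread.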

Your specific RAID hardware/software documentation likely explains precisely what "Mark as normal" means.

If not, open a support ticket with your RAID hardware/software vendor for clarification.
Like others already told you:

"Mark as normal" means the controller forgets about the failed state of this disk, so it treats it as if a new drive had been inserted. (Or maybe it only clears the error state: if "mark as normal" were done before too many differences had built up in the RAID array, it would just flush buffers and continue where it left off.)
The rebuild may very well have been started only because there were changes it couldn't handle without a rebuild.
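The two behaviours described above (a full rebuild versus resyncing only what changed) can be sketched as follows. Linux md offers something similar through its optional write-intent bitmap; whether the Intel controller keeps any such change log is not stated in this thread, so `stripes_to_resync` and its `log_capacity` parameter are hypothetical.

```python
def stripes_to_resync(total_stripes, dirty_while_absent, log_capacity=None):
    """Return the stripe numbers to rewrite when a member is re-admitted.

    dirty_while_absent: stripes written while the member was out of the array.
    log_capacity: how many dirty stripes a (hypothetical) change log can hold;
    None means no change log is kept at all.
    """
    if log_capacity is not None and len(dirty_while_absent) <= log_capacity:
        # Few enough changes were tracked: only those stripes are stale.
        return sorted(dirty_while_absent)
    # No log, or it overflowed: nothing on the member can be trusted,
    # so every stripe is rebuilt from the other members.
    return list(range(total_stripes))
```

With a log that kept up, only the touched stripes are rewritten; without one, "Mark as normal" necessarily turns into a full rebuild, which is the slower path that failed at 75% here.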
How odd, you asked 'What does "Mark as normal" mean or do?' and yet didn't accept any answers that explained what it did.
I agree, andyalder... I should have marked your answer as a partial answer. I think the complete answer is that it also, after ignoring any previous errors, attempts to rebuild the array.