w_marquardt (United States of America)

asked on

RAID-1 drive failure / degraded virtual disk

I have an older Dell server that I am now responsible for.

The box is a Dell PowerEdge 2950 with a SAS 6/iR controller.

It has two drives configured as RAID-1, and one of the virtual disks is degraded. The only tasks available for that drive are Blink and Unblink; I was looking for a way to rebuild it. Under Enclosure/Physical Disks in OpenManage Server Administrator, the problem drive's state is listed as Foreign, with Failure Predicted. The client has replaced this drive twice, and each time we get the same notification.

The alert log for that drive shows the following:
SAS port report: SAS wide port 0 lost link on PHY 0.: Controller 0 (SAS 6/iR Integrated)
followed by
SAS port report: SAS wide port 0 restored link on PHY 0.: Controller 0 (SAS 6/iR Integrated)
After I log in to OpenManage Server Administrator, we get the following (each with a green checkmark):
A foreign configuration has been detected.: Controller 0
Virtual disk degraded: Virtual Disk 0
Predictive failure reported: Physical Disk 0:0:0 Controller 0, Connector 0

Predictive failure sounds like a failing or failed drive, but on two new drives in a row? That seems a little strange.

I'm not very well versed in RAID controllers. Usually when I've had an issue, I replaced the failing drive, it rebuilt itself, and life went on.

Any assistance on how to resolve this for the client would be greatly appreciated.

At this point I only have remote access to the server via RDP. I can get on-site if needed, but that probably wouldn't happen until Monday.

Thanks,

Bill
noxcho (Germany)

Do the new drives have the same firmware as the drive you replaced?
Are the replacement drives the same model as the original and as the other drive in the mirror? Some older systems require this.
w_marquardt

ASKER

I'll need to check with the client. He did the physical replacement of the drives. I'll let you know. Thanks!
On those old servers, Dell always had us pull the "dead" drive, wait a minute, then plug it back in.

It would magically reappear and, after a resync/rebuild, stay happy until the next time it happened.

The backplane firmware was a bit flaky as I recall.
Member_2_231077

The log entries are exactly what you would see if a drive powered off and on again or went offline temporarily for any other reason, e.g., being pulled and pushed back in. It will always appear as foreign in that situation because its metadata carries an old timestamp, so it is no longer in sync with the other drive.

I would try a different drive bay for the replacement. Since it is in a different slot, you will have to set it as a global hot spare before the controller rebuilds onto it.

You can reuse one of the new/"faulty" drives by clearing the foreign configuration, although it may still show as predictive failure. Clearing is done at the controller level in the menu rather than on the drive, and that scares the willies out of me. Given the option, I put the drive in a spare machine to clear it.

There's also the possibility of a bad block on the remaining disk, but that should show up in the log. If you save the log from the log menu, we could peruse it for you.
I'll check on that too.

I just checked with the client, and the drives are the exact same make and model. I've included a screenshot of the physical disks. There are two virtual disks on the system: the first uses physical disks 0 and 1, and the second uses physical disks 2 and 3.

Thanks for the ideas on this.

Bill
screenshot_physical_disks.JPG
Can you tell me what log file you'd like to see? I can upload that but I want to be sure I'm uploading the right one.

Bill
Is the client there, and can he put both rejected disks in slots 5 and 6? That would help you a lot: you could build a test array with them and see whether they are really OK.

I really don't understand how a fault on disk 1 can cause a predictive failure on a replacement disk 2, but here is Dell's explanation:
https://www.dell.com/support/article/us/en/04/sln111497/double-faults-and-punctures-in-raid-arrays?lang=en
Unfortunately, the client is quite far from the location. He's running a co-located server in a Chicago suburb and is based (and lives) in Chicago. I'll check with him on this possibility.

I've read that article from Dell before when dealing with drive punctures; I've had them happen way too frequently. Thankfully, this isn't that issue.

Thanks,

Bill
I was finally able to get over to the co-location facility and make the backups I wanted (data and image). I'll restart the server and look at the array information during startup so I can see what's going on, clear the problems, and hopefully get it back up and running.
It looks like it's a backplane issue: no matter which drive I put in that slot, it shows up as not installed. The client needs to decide what they want to do about this.
ASKER CERTIFIED SOLUTION
w_marquardt (United States of America)

Did you try a different slot?