w_marquardt
(United States) asked on

RAID-1 drive failure / degraded.

I have an older Dell server that I am now responsible for.

The box is a Dell PowerEdge 2950 with a SAS 6/iR controller.

It has two drives configured as RAID-1. One of the virtual disks is degraded, and the only tasks available for that disk are Blink and Unblink. I was looking for a way to rebuild it. Under Enclosure/Physical Disks in OpenManage Server Administrator, the problem drive's state is listed as Foreign, with Failure Predicted. The client has replaced this drive twice, and we get the same notification each time.

The alert log for that drive has the following entry:
SAS port report: SAS wide port 0 lost link on PHY 0.: Controller 0 (SAS 6/iR Intergrated)
followed by
SAS port report: SAS wide port 0 restored link on PHY 0.: Controller 0 (SAS 6/iR Intergrated)
After I log in to OpenManage Server Administrator, I get the following (each with a green checkmark):
A foreign configuration has been detected.: Control 0
Virtual disk degraded: Virtual Disk 0
Predictive failure reported: Physical Disk 0:0:0 Controller 0, Connector 0

Predictive failure sounds like a failing or failed drive, but on two new drives in a row? That seems a little strange.

I'm not very well versed in RAID controllers. Most of the time when I've had an issue, I replaced the failing drive, it rebuilt itself, and life went on.

Any assistance on how to resolve this for the client would be greatly appreciated.

At this point I only have remote access to the server via RDP. I can get to it onsite if needed but that probably wouldn't happen until Monday.

Thanks,

Bill


8/22/2022 - Mon
noxcho

Do the new drives have the same firmware as the drive you replaced?
hypercube

Are the replacement drives the same model as the original / as the other drive in there?  Some older systems require this to be the case.
w_marquardt

ASKER
I'll need to check with the client. He did the physical replacement of the drives. I'll let you know. Thanks!
Philip Elder

On those old guys Dell always had us pull the "dead" drive, wait a minute, then plug it back in again.

It would magically reappear and after a resync/rebuild be happy until the next time it happened.

The backplane firmware was a bit flaky as I recall.
andyalder

The log entries are exactly what you would see if a drive powered off and on again or went offline temporarily for any other reason, e.g., pulling it and pushing it back in. It will always appear as foreign in that situation, because its metadata carries an old timestamp and so is out of sync with the other drive.
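As a rough aid for reading a saved log, andyalder's interpretation above (a "lost link" followed by a "restored link" on the same PHY means the drive dropped offline and came back) can be checked with a short script. This is a minimal Python sketch, assuming the exported log has one event per line worded like the messages Bill quoted; the real OpenManage export format isn't shown in this thread, and `find_link_dropouts` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Matches the alert wording quoted in this thread (assumed export format), e.g.
# "SAS port report: SAS wide port 0 lost link on PHY 0.: Controller 0 (SAS 6/iR Intergrated)"
PHY_EVENT = re.compile(r"SAS wide port (\d+) (lost|restored) link on PHY (\d+)")


def find_link_dropouts(lines):
    """Count lost-then-restored link pairs per (port, phy).

    A "restored" event with no preceding "lost" event on the same
    (port, phy) is ignored; a "lost" with no later "restored" is not counted.
    """
    pending = set()             # (port, phy) pairs currently in the "lost" state
    dropouts = defaultdict(int)  # (port, phy) -> number of completed dropouts
    for line in lines:
        m = PHY_EVENT.search(line)
        if not m:
            continue  # not a SAS link event (e.g. "Virtual disk degraded")
        port, event, phy = int(m.group(1)), m.group(2), int(m.group(3))
        key = (port, phy)
        if event == "lost":
            pending.add(key)
        elif key in pending:
            pending.discard(key)
            dropouts[key] += 1
    return dict(dropouts)


if __name__ == "__main__":
    sample = [
        "SAS port report: SAS wide port 0 lost link on PHY 0.: Controller 0 (SAS 6/iR Intergrated)",
        "SAS port report: SAS wide port 0 restored link on PHY 0.: Controller 0 (SAS 6/iR Intergrated)",
    ]
    print(find_link_dropouts(sample))  # {(0, 0): 1}
```

Repeated dropouts on the same PHY, regardless of which drive is in the bay, would point toward the slot or backplane rather than the drives themselves.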

I would try a different drive bay for the replacement. Because it is in a different slot, you will have to assign it as a global hot spare before the controller rebuilds onto it.

You can reuse one of the new "faulty" drives by clearing the foreign config, although it may still show as predictive failure. That's done at the controller level in the menu rather than on the drive, and it scares the willies out of me. Given the option, I put the drives in a spare machine to clear it.

There's also the possibility of a bad block on the remaining disk, but that should show up in the log. If you save the log from the log menu, we could peruse it for you.
w_marquardt

ASKER
I'll check on that too.

I just checked with the client and the drives are the exact same make and model. I've included a screenshot of the physical disks. There are two virtual disks on the system. The first is using physical disk 0 & 1, and the second is using physical disk 2 & 3.

Thanks for the ideas on this.

Bill
screenshot_physical_disks.JPG
w_marquardt

ASKER
Can you tell me what log file you'd like to see? I can upload that but I want to be sure I'm uploading the right one.

Bill
andyalder

Is the client there, and can he put both rejected disks in slots 5 and 6? That'll help you a lot: you can build a test array with them and see whether they are really OK.

I really don't understand how a fault on disk 1 can cause a predictive failure on a replacement disk 2 but here it is from Dell...
https://www.dell.com/support/article/us/en/04/sln111497/double-faults-and-punctures-in-raid-arrays?lang=en
w_marquardt

ASKER
Unfortunately, the client is quite far from the location. He's running a co-located server in a Chicago suburb and is based (and lives) in Chicago. I'll check with him on this possibility.

I've read that article from Dell before when dealing with drive punctures; I've had them happen way too frequently. Thankfully, this isn't that issue.

Thanks,

Bill
w_marquardt

ASKER
I have finally been able to get over there (the co-location facility) and make the backups I wanted (data and image). I'll restart the server, look at the array information during startup to see what's going on, clear the problems, and hopefully get it back up and running.
w_marquardt

ASKER
It looks like it's a backplane issue. No matter what drive I put in the slot, it shows up as not installed. The client needs to decide what they want to do with this.
ASKER CERTIFIED SOLUTION
w_marquardt

andyalder

Did you try a different slot?