Three "Dead Drives" appear in a Promise RAID after removal and spraying off dust
Promise Pegasus 2 RAID with four 3 TB drives
Connected to a new Intel iMac (macOS 11.5.2) via a Thunderbolt 3-to-2 adapter and a Thunderbolt 2 cable
The user noticed a large accumulation of dust on and under the RAID. One at a time, they opened the door, removed a drive, used certified brands of cleaning spray ("Kensington Duster II", "Dust-Off Electronics Duster") to spray the dust off the drive, and then sprayed away the dust inside the drive compartment, including the drive connector socket. They worked on one drive at a time and did not remove a second drive until they were completely finished with the first. NOW the Physical Drive list of the Promise utility shows 3 DEAD drives, and there are 3 red lights on the RAID next to those drives.
How can they restore the RAID to operation?
Possible contributing factor: they assumed that the drives were hot-swappable and did not 'eject' the RAID from the iMac before starting.
Three dead drives seem unusual to me in a hot-swappable unit. Why would the drives not be recognized after being ejected, sprayed, and inserted again?
Thank you
And I do not think Promise allows you to force drives marked as dead back online.
I miss the days when drive arrays had locks on the doors to keep people from doing this...
With other RAID levels that shouldn't happen, but before removing another disk you should check the unit's utility software and look at the status of the array. It could be rebuilding, so you'd need to wait until that has finished.
ASKER
If the RAID had been unplugged from power, would they have been able to take the drives out to spray off the dust?
Once a second drive dies the array is toast.
So, to the point: pulling three drives one after the other without waiting for each drive to resync (rebuild), which would be indicated by a fast-blinking LED, would indeed kill the array.
Got a backup?
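To make that concrete, here's a minimal Python sketch of the state logic (illustrative only, not Promise's actual firmware): a 4-drive RAID 5 tolerates exactly one missing member, so a second pull before a rebuild completes takes the array offline.

# Toy RAID 5 array-state logic. A single-parity array survives one
# missing member; a second loss before the rebuild finishes is fatal.
def array_state(missing_count: int, parity_drives: int = 1) -> str:
    """Return the array state for a given number of missing members."""
    if missing_count == 0:
        return "OK"
    if missing_count <= parity_drives:
        return "DEGRADED (rebuild must finish before the next pull)"
    return "OFFLINE (data no longer reconstructable)"

# Pull three drives one after another, never waiting for a rebuild:
for pulled in range(1, 4):
    print(f"drives pulled without rebuild: {pulled} -> {array_state(pulled)}")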
ASKER
I think Promise includes a monitoring tool on the computer that might have alerted when the first drive was pulled.
If it is a RAID 5 or RAID 6, it will tolerate 1 or 2 drive failures respectively before the array crashes.
E.g., a RAID 5 of four drives will tolerate the pull of the first drive; the array will then crash when the second one gets pulled, even if the first drive was re-inserted. This is because when the first drive was pulled, the array went into degraded mode; when the drive was re-inserted, the array was still degraded, though it might initiate a rebuild by scanning the other three drives.
When the second drive is then pulled, the array crashes, as it can no longer sustain the array's data.
The pull of the third and fourth disks is of no consequence for the data that existed on drives 2, 3, and 4 just before the second drive was pulled.
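To illustrate why one missing member is recoverable and two are not, here is a toy single-stripe example in Python (simple XOR parity; real controllers rotate parity across members and work at block level):

# Toy single-stripe RAID 5: three data chunks plus one XOR parity chunk.
d1, d2, d3 = b"AAAA", b"BBBB", b"CCCC"
parity = bytes(a ^ b ^ c for a, b, c in zip(d1, d2, d3))

# One member missing: rebuild d2 by XOR-ing everything that survives.
rebuilt_d2 = bytes(a ^ c ^ p for a, c, p in zip(d1, d3, parity))
assert rebuilt_d2 == d2  # recoverable in degraded mode

# Two members missing (say d2 and d3): one XOR equation, two unknowns.
# d1 ^ parity only yields d2 ^ d3 combined; neither chunk alone can be
# recovered, which is why the second pull crashes the array.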
Access the controller to see what it reports in terms of array state and errors.
As for the RAID 6 setup of 4 drives: the array will tolerate the pull of the 1st and 2nd drives, as it will enter degraded mode, but will similarly crash as soon as the third drive is pulled.
RAID 5 and RAID 6 take time to rebuild and are unlikely to complete a rebuild in the time it takes an individual to pull a drive, blow the dust off, reinsert it, and pull the next.
A 500 GB drive in a RAID 5 setup will likely take several hours; often, the rebuild is limited to about 30% of the I/O resources to keep the array somewhat usable even in a degraded state.
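A rough back-of-the-envelope estimate (the throughput and throttle figures below are assumptions, not Pegasus measurements) shows why a rebuild can't possibly finish between two quick pulls:

# Rough rebuild-time estimate; 100 MB/s sustained rate is an assumption.
capacity_gb = 3000          # one 3 TB Pegasus member
full_speed_mb_s = 100       # assumed sustained rebuild throughput
rebuild_share = 0.30        # rebuild often throttled to ~30% of I/O

seconds = capacity_gb * 1000 / (full_speed_mb_s * rebuild_share)
print(f"estimated rebuild time: {seconds / 3600:.1f} hours")  # ~27.8 hours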
ASKER
I contacted Promise support on their website, using their trouble-report procedure.
The tech gave me these steps:
Since 3 of the drives are DEAD, we can try to force the drives online, but there is only a 50% chance of getting the data back.
If you want to try this, please follow the steps below and execute the commands at the CLI prompt one at a time.
- Log in to Terminal.
- Type promiseutil and press Return.
- Type these commands one by one, hitting Return after each:
phydrv -a online -p2
phydrv -a online -p1
shutdown -a restart
Attach the subsystem report after these steps so we can check for any other issues.
After the commands, 3 drives were online and we had a degraded RAID. A spot-check indicated that the information on the RAID looked OK. The files were photos; I opened several hundred that had been added at different times over the years and did not receive any file-system errors. The photos showed up full-size without damage.
I sent the subsystem report to Promise.
They replied and advised me to rebuild the RAID using the Promise utility. At this time, I'm waiting for the rebuild to complete.
I still have a backup, should it be needed.
Hot-swappable simply means you can remove and insert drives while the unit is on.
The thing you need to remember is that if a disk disappears from the array, it is marked as failed. If you simply remove and replace one disk at a time without doing anything else, each disk ends up inoperable in the array, which is most likely what you're seeing. Usually you need to tell the controller what the state of the array is when a disk operation occurs: if you pull a disk and then replace it, it often needs initializing so the data stored on the rest of the disks can be written to the disk you inserted, keeping the array functional. Simply removing and re-inserting is NOT the way to do it.
Also, if any data was written to the disks still in the array while one was being cleaned, the array would need to copy some of the written data across the other disks (depending on RAID type and config). A disk going missing during a write operation will also lead to data corruption in most cases.
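As a toy illustration of that stale-member problem (assumed controller behavior in Python, not the Pegasus algorithm): if the host writes to the missing disk's blocks while it is out, the change lands only in parity, and the re-inserted disk no longer matches the array until a rebuild rewrites it.

# Toy two-data-disk RAID 5 stripe: d1 and d2, plus XOR parity.
d1, d2 = b"AAAA", b"OLD!"
parity = bytes(a ^ b for a, b in zip(d1, d2))

# d2's disk is pulled; the host then writes new data to d2's blocks.
# In degraded mode the controller can only record the change in parity:
new_d2 = b"NEW!"
parity = bytes(a ^ b for a, b in zip(d1, new_d2))

# The pulled disk still physically holds b"OLD!". If it is re-inserted
# and trusted without a rebuild, reads return stale data; the member
# must instead be rewritten from d1 ^ parity:
rebuilt = bytes(a ^ p for a, p in zip(d1, parity))
print(d2, "still on the pulled disk vs", rebuilt, "after a rebuild")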