computerlarry (United States of America) asked:

Three "Dead Drives" appear in a Promise RAID after removal and spraying off dust

Promise Pegasus 2 RAID with four 3TB drives

Connected to new Intel iMac - Mac OS 11.5.2   via a Thunderbolt 3-2 adapter and a Thunderbolt 2 cable



The user noticed a large accumulation of dust on the surface of the RAID and underneath it. Working on one drive at a time, they opened the door, removed a drive, and used certified brands of cleaning spray ("Kensington Duster II", "Dust-Off Electronics Duster") to blow the dust off the drive and out of the drive compartment, including the drive connector socket. They did not remove a second drive until they were completely finished with the first. NOW the Physical Drive list of the Promise utility shows 3 DEAD drives, and there are 3 red lights on the RAID next to 3 drives.

How can they restore the RAID to operation?


Possible contributing factor: they assumed that the drives were hot-swappable and did not 'eject' the RAID from the iMac before starting.

Three dead drives seems unusual to me in a hot-swappable unit.  Why would the drives not be recognized after being ejected, sprayed, and inserted again?



Thank you

Craig Beck (United Kingdom):

Unless they have a backup, they've probably broken the array and killed the data.

Hot-swappable simply means you can remove and insert while the unit is on.

The thing you need to remember is that when a disk disappears from the array, it is marked as failed. If you simply remove and replace one disk at a time without doing anything else, each of those disks becomes inoperable in the array, which is most likely what you're seeing. Usually you need to tell the controller what the state of the array is when a disk operation occurs: if you pull a disk and then replace it, it often needs rebuilding so the data held on the remaining disks can be written back to the disk you inserted, keeping the array functional. Simply removing and re-inserting is NOT the way to do it.

Also, if any data was written to the disks still in the array while one was out being cleaned, the array would need to spread some of that written data across the other disks (depending on RAID type and configuration). In most cases this also leads to data corruption if a disk goes missing during a write operation.
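To make that concrete, here's a minimal sketch, assuming plain XOR parity as in RAID 5 (not Promise's actual firmware logic), of why a re-inserted disk can't be trusted as-is once writes have happened behind its back:

```python
# Toy XOR-parity model of one RAID 5 stripe. It shows that writes which
# land while a disk is pulled leave that disk stale, so the controller
# must rebuild it instead of trusting its old contents.

def parity(blocks):
    """XOR parity across equal-length byte blocks of one stripe."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

# Four-member RAID 5 stripe: three data blocks plus one parity block.
d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"
p = parity([d0, d1, d2])

# The disk holding d1 is pulled for cleaning; the array keeps running
# and a write updates the stripe (new d1 content, recomputed parity).
new_d1 = b"XXXX"
p = parity([d0, new_d1, d2])

# The cleaned disk returns with its old d1. Its contents no longer
# agree with the parity, so the controller must rebuild it:
rebuilt_d1 = parity([d0, d2, p])  # reconstruct from survivors plus parity
assert rebuilt_d1 == new_d1       # the rebuild recovers the current data
assert d1 != new_d1               # the re-inserted disk is stale
print("stale disk must be rebuilt, not trusted as-is")
```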
Pulling the drives without properly taking them offline first is what got them marked as dead.
And I do not think Promise allows you to force drives marked as dead back online.
I miss the days when drive arrays had locks on the doors to keep people from doing this...



What RAID level was being used? RAID 0, for example, has no redundancy, and removing a disk while the unit is running will cause exactly this kind of issue.
At other RAID levels that shouldn't happen, but before removing another disk you should check the unit's utility software and look at the status of the array. It could be rebuilding, in which case you'd need to wait until that has finished.
computerlarry (ASKER):

The user applied human logic to a computer situation.  They didn't understand RAID 5, and didn't realize that drives can't be taken out while the RAID is still connected to power.

If the RAID had been unplugged from power, would they have been able to take the drives out to spray off the dust?

RAID 5 can survive the loss of one drive while powered on.

Once a second drive dies the array is toast.

So, to the point: pulling three drives one after the other, without waiting for each drive to resync (rebuild), which would be indicated by a fast-blinking LED, would indeed kill the array.

Got a backup?
And yes... If the array was powered down, they could have done that maintenance...
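For the arithmetic behind "survives one drive, dies on the second", here's a toy illustration (XOR parity only, not the real on-disk layout):

```python
# Toy illustration of RAID 5's tolerance: XOR parity can recover
# exactly one missing member, never two.
from functools import reduce

def xor(blocks):
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

stripe = {"d0": b"1111", "d1": b"2222", "d2": b"3333"}
stripe["parity"] = xor(list(stripe.values()))

# One member missing: XOR of the survivors recovers it exactly.
survivors = [v for k, v in stripe.items() if k != "d1"]
assert xor(survivors) == stripe["d1"]

# Two members missing: parity supplies a single equation,
#   d1 XOR d2 == d0 XOR parity
# with two unknowns, so neither lost block can be recovered.
assert xor([stripe["d0"], stripe["parity"]]) == xor([stripe["d1"], stripe["d2"]])
print("one missing member: recoverable; two: gone")
```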
ASKER CERTIFIED SOLUTION
arnold (United States of America)
This solution is only available to members.
I contacted Promise, and the tech wrote on their website that there might be a chance. I have been trying to get a remote service session from them; although they said that they would contact me, Promise didn't give a time. I'm waiting today, but so far no response.

Depending on where you live, it is probably still Sunday, and in many parts of the world that is a day off, so I wouldn't expect too much help today. On Monday you'll have a better chance, so don't be too impatient.
Much depends on what the array definition (RAID level) was.
I think Promise includes a monitoring tool on the computer that might have alerted when the first drive was pulled.

If it was a RAID 5/6, it will tolerate 1/2 drive failures respectively before the array crashes.

i.e., a RAID 5 of four drives will tolerate the pull of the first drive; the array will then crash when the second one gets pulled, even if the first drive was re-inserted. This is because when the first drive was pulled, the array went into degraded mode; when the drive was re-inserted, the array was still degraded, but might initiate a rebuild by scanning the other three drives.
When the second drive is then pulled, the array crashes, as it can no longer sustain the array's data.
The pull of the third and fourth disks is of no consequence to the data that existed on drives 2, 3, and 4 just before the second drive was pulled.
Access the controller to see what it reports for the array's state and errors.

As far as the RAID 6 four-drive setup goes,
the array will tolerate the pull of the 1st and 2nd drives, entering degraded mode, but will similarly crash as soon as the third drive is pulled.
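A rough state-machine sketch of those tolerance rules (my own simplification; the controller's real logic also tracks rebuild progress, and a re-inserted member still counts as missing until its rebuild completes):

```python
# Simplified model: an array is degraded while the number of missing
# members is within the level's parity tolerance, and failed beyond it.

def array_state(level, missing):
    tolerance = {"RAID5": 1, "RAID6": 2}[level]
    if missing == 0:
        return "optimal"
    if missing <= tolerance:
        return "degraded (rebuild required)"
    return "failed (data loss)"

# Pulling drives one at a time, faster than any rebuild can complete:
for level in ("RAID5", "RAID6"):
    for missing in range(4):
        print(f"{level}: {missing} unrebuilt member(s) -> "
              f"{array_state(level, missing)}")
```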
RAID 5 and RAID 6 take time to rebuild, and a rebuild is unlikely to complete in the time it takes a person to pull a drive, blow it off, re-insert it, and pull the next one.

A 500 GB drive in a RAID 5 setup will likely take several hours; often the rebuild is limited to about 30% of resources to keep the array somewhat usable even in a degraded state.
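As a back-of-the-envelope figure for the 3 TB members in this Pegasus 2 (the 150 MB/s sustained rate below is my assumption, not a Promise spec; actual rates vary by drive and load):

```python
# Rough rebuild-time estimate for one 3 TB member, assuming an assumed
# ~150 MB/s sustained rate throttled to 30% so the array stays usable.
capacity_mb = 3000 * 1000      # one 3 TB drive, in MB
full_rate = 150                # assumed sequential MB/s
throttled = full_rate * 0.30   # 30% rebuild priority

hours = capacity_mb / throttled / 3600
print(f"~{hours:.0f} hours per member")   # roughly 19 hours
# Far longer than the minutes it takes to pull the next drive.
```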
The user seems to have gotten VERY LUCKY with this one! (Maybe because all the user did was pull drives.)

I contacted Promise support on their website, using a trouble report procedure.
The tech gave me these steps to do:

Since 3 of the drives are DEAD, we can try to force the drives online, but there is only a 50% chance of getting the data back.
 
If you want to try this, please follow the steps below and execute the commands at the CLI prompt one at a time.
 
- Log in to Terminal.
- Type promiseutil and press Return.
- Type these commands one by one and press Return after each:
 
phydrv -a online -p2
phydrv -a online -p1
 
shutdown -a restart
 
Attach the subsystem report after these steps so we can check for any other issues.

After the commands, 3 drives were online and we had a degraded RAID. A spot-check indicated that the information on the RAID looked OK. The files were photos; I opened several hundred that had been added at different times over the years and did not get any file-system errors. The photos displayed full-size without damage.
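For anyone wanting to automate that kind of spot-check, here's a small sketch. It assumes the Pillow library is installed and the array is mounted at /Volumes/Pegasus; both the path and the file extension are placeholders to adjust for the actual setup:

```python
# Walk the photo library and flag files whose structure fails a cheap
# integrity check. /Volumes/Pegasus and *.jpg are placeholder values.
from pathlib import Path
from PIL import Image

suspect = []
for path in Path("/Volumes/Pegasus").rglob("*.jpg"):
    try:
        with Image.open(path) as img:
            img.verify()   # checks file structure without a full decode
    except Exception:
        suspect.append(path)

print(f"{len(suspect)} suspect file(s)")
for p in suspect[:20]:
    print(p)
```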

I sent the subsystem report to Promise.
They replied and advised me to rebuild the RAID using the Promise utility. At this time, I'm waiting for the rebuild to complete.

I still have a backup, should it be needed.