3ware RAID: degraded RAID 5

I have a RAID 5 array in a degraded state. The current state of the array is:

server:~# tw_cli info c0

Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-5    DEGRADED       -       -       64K     465.641   OFF    OFF    

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     233.76 GB   490234752     WD-WCANY3473512    
p1     ECC-ERROR        u0     233.76 GB   490234752     WD-WCANY3522205    
p2     DEGRADED         u0     233.76 GB   490234752     WD-WCANY3473475    
p3     NOT-PRESENT      -      -           -             -
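In a port table like this one, the drives that need attention are the rows whose Status is anything other than OK. The sketch below filters them out. It assumes the column layout shown above (Port, Status, Unit, Size as two fields, Blocks, Serial). `problem_ports` is a hypothetical helper name, and empty bays (NOT-PRESENT) are deliberately skipped.

```shell
# List controller ports whose status is neither OK nor NOT-PRESENT.
# Reads `tw_cli info c0` output on stdin; assumes the column order shown in
# this thread (Port Status Unit Size GB Blocks Serial), so $NF is the serial.
# problem_ports is a hypothetical helper name, not a tw_cli feature.
problem_ports() {
  awk '$1 ~ /^p[0-9]+$/ && $2 != "OK" && $2 != "NOT-PRESENT" { print $1, $2, $NF }'
}

# Usage on the live system:
#   tw_cli info c0 | problem_ports
```

Run against the table above, it prints p1 (ECC-ERROR) and p2 (DEGRADED) together with their serial numbers, which is what you need when pulling the physical drive.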

After running these commands:

tw_cli maint remove c0 p2
tw_cli maint rescan c0
tw_cli maint rebuild c0 u0 p2
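To see how far the rebuild gets before it aborts, you can sample the unit's Status and %RCmpl (rebuild percent complete) columns while it runs. This is a sketch that assumes the `tw_cli info c0 u0` column layout shown in this thread; `unit_progress` is a hypothetical helper name, not part of tw_cli.

```shell
# Print the Status and %RCmpl columns for one unit from `tw_cli info` output.
# Assumes the column order shown in this thread:
#   Unit UnitType Status %RCmpl ...
# Member rows such as u0-0 are ignored because only an exact name match prints.
# unit_progress is a hypothetical helper, not a tw_cli feature.
unit_progress() {
  awk -v unit="${1:-u0}" '$1 == unit { print $3, $4 }'
}

# Example: sample the rebuild once a minute until interrupted with Ctrl-C.
#   while sleep 60; do date; tw_cli info c0 u0 | unit_progress u0; done
```

Fed the unit output from this thread it prints `DEGRADED* -`. Logging it with a timestamp during a rebuild records the percentage reached when the rebuild stops, which can then be matched against the kernel log entries.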

The rebuild starts but never finishes; it aborts and leaves the unit in this state:

server:~# tw_cli info c0 u0

Unit     UnitType  Status         %RCmpl  %V/I/M  Port  Stripe  Size(GB)
------------------------------------------------------------------------
u0       RAID-5    DEGRADED*      -       -       -     64K     465.641  
u0-0     DISK      DEGRADED       -       -       p2    -       232.82    
u0-1     DISK      OK             -       -       p0    -       232.82    
u0-2     DISK      WARNING        -       -       p1    -       232.82    

Any idea how I can fix this problem and restore my RAID?
Thank you
Asked by ampranti
 
Callandor commented:
Isn't the normal procedure to replace the faulty drive and then rebuild the array?  Trying to rebuild an array that has a degraded drive won't fix it if the drive has stopped working.
 
ampranti (author) commented:
Does "degraded" mean that the drive is faulty?
The drive seems to work: the rebuild starts but fails later.
 
Callandor commented:
If it fails at any point, that indicates to me that the drive may have a problem with it, though it may be intermittent.  Trying a new drive will save you a lot of time.
 
ampranti (author) commented:
When the rebuild stops, I get these errors in /var/log/messages:

May  3 17:00:56 skilla kernel: [94981.224368] 3w-9xxx: scsi0: AEN: ERROR (0x04:0x0026): Drive ECC error reported:port=1, unit=0.
May  3 17:00:56 skilla kernel: [94981.224637] 3w-9xxx: scsi0: AEN: ERROR (0x04:0x002D): Source drive error occurred:unit=0, port=1.
May  3 17:00:56 skilla kernel: [94981.224899] 3w-9xxx: scsi0: AEN: ERROR (0x04:0x0004): Rebuild failed:unit=0.
May  3 17:00:56 skilla kernel: [94981.225160] 3w-9xxx: scsi0: AEN: ERROR (0x04:0x0002): Degraded unit:unit=0, port=2.
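Read in order, these entries say that an ECC error on port 1 (the source drive the rebuild reads from) aborted the rebuild onto port 2. On a busy system it helps to pull only the 3ware AEN errors out of the log. The sketch below assumes the message format shown above; `aen_errors` is a hypothetical helper name.

```shell
# Print "controller (code) message" for each 3w-9xxx AEN ERROR entry.
# Assumes kernel log lines formatted as in this thread:
#   ... 3w-9xxx: scsiN: AEN: ERROR (0xHH:0xHHHH): message text
# Reads the files given as arguments, or stdin if none are given.
# aen_errors is a hypothetical helper name.
aen_errors() {
  sed -n 's/.*3w-9xxx: \(scsi[0-9]*\): AEN: ERROR \((0x[0-9A-Fa-f]*:0x[0-9A-Fa-f]*)\): \(.*\)$/\1 \2 \3/p' "$@"
}

# Usage:
#   aen_errors /var/log/messages
```

Lines from other drivers are dropped, so the remaining output is the controller's own account of which port failed and when.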


According to this site:
https://twiki.cern.ch/twiki/bin/view/FIOgroup/DiskPrbTw
A drive has reported an ECC-error and the disk should be replaced. This will generally lead to a RAID_TW alarm and the vendor call will follow from the standard procedure.

So I have to replace the disk.
 
ampranti (author) commented:
skilla:~# tw_cli info c0 u0    

Unit     UnitType  Status         %RCmpl  %V/I/M  Port  Stripe  Size(GB)
------------------------------------------------------------------------
u0       RAID-5    DEGRADED*      -       -       -     64K     465.641  
u0-0     DISK      DEGRADED       -       -       p2    -       232.82    
u0-1     DISK      OK             -       -       p0    -       232.82    
u0-2     DISK      WARNING        -       -       p1    -       232.82    

skilla:~# tw_cli info c0

Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-5    DEGRADED       -       -       64K     465.641   OFF    OFF    

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     233.76 GB   490234752     WD-WCANY3473512    
p1     ECC-ERROR        u0     233.76 GB   490234752     WD-WCANY3522205    
p2     DEGRADED         u0     233.76 GB   490234752     WD-WCANY3473475    
p3     NOT-PRESENT      -      -           -             -


Which disk do I have to replace, p1 or p2?
 
Callandor commented:
The one with the ECC error - p1.
 
ampranti (author) commented:
After removing p2, disk p1 changed state from WARNING to "OK":

tw_cli info c0 u0

Unit     UnitType  Status         %RCmpl  %V/I/M  Port  Stripe  Size(GB)
------------------------------------------------------------------------
u0       RAID-5    DEGRADED       -       -       -     64K     465.641  
u0-0     DISK      DEGRADED       -       -       -     -       232.82    
u0-1     DISK      OK             -       -       p0    -       232.82    
u0-2     DISK      OK             -       -       p1    -       232.82    

I hope the data will be regenerated after I replace the disk.
 
ampranti (author) commented:
Service checked disk "p2" and found that it was OK.
I put disk "p2" back, and now all disks are OK:

# tw_cli info c0

Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-5    DEGRADED       -       -       64K     465.641   OFF    OFF    

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     233.76 GB   490234752     WD-WCANY3473512    
p1     OK               u0     233.76 GB   490234752     WD-WCANY3522205    
p2     OK               -      233.76 GB   490234752     WD-WCANY3473475    
p3     NOT-PRESENT      -      -           -             -


However, if I remove disk p1 (the one with the ECC-ERROR), the "/" partition is remounted read-only:
/dev/sda5 on / type ext3 (rw,errors=remount-ro)
Of course, all services then malfunction.

Is there a way to check that, if I replace disk "p1", I will be able to regenerate its data?
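One partial check before pulling p1 is to ask the drives that must survive the swap (p0 and p2) for their SMART status through the controller. This sketch assumes smartmontools is installed and that the 9000-series controller (3w-9xxx driver) appears as /dev/twa0; `smart_health` is a hypothetical helper name. A passing SMART verdict does not guarantee that a drive will read cleanly through a full rebuild, so this is a sanity check, not proof.

```shell
# Query SMART health and attributes for drives on given 3ware controller ports.
# Assumptions: smartmontools is installed and the 3w-9xxx controller is
# /dev/twa0 (adjust the device node for your system).
# smart_health is a hypothetical helper name.
smart_health() {
  for port in "$@"; do
    echo "== port $port =="
    # -H: overall health self-assessment; -A: vendor attribute table, where
    # reallocated and pending sector counts are worth checking after ECC errors.
    smartctl -H -A -d "3ware,$port" /dev/twa0
  done
}

# Usage: check the two drives that have to carry the array while p1 is replaced.
#   smart_health 0 2
```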
 
Callandor commented:
If you are sure p0 and p2 are in good working order, replacing p1 should work.  However, you've had a series of problems that involved more than one drive, so I'm not sure everything will turn out ok.  You DO have backups, don't you?
 
ampranti (author) commented:
I can still read the disk, and I have a recent backup of everything.

If I take a backup with Clonezilla (with the system offline, imaging the whole disk), then replace both bad disks and restore the system from the image, should I be OK?


Thanks
 
Callandor commented:
Yes - if that doesn't work, that might mean the controller is no good.  Either way, a backup is how you recover from these situations.