3ware raid - degraded raid 5

I have a RAID 5 array in a degraded state. The current state of the array is:

server:~# tw_cli info c0

Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-5    DEGRADED       -       -       64K     465.641   OFF    OFF    

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     233.76 GB   490234752     WD-WCANY3473512    
p1     ECC-ERROR        u0     233.76 GB   490234752     WD-WCANY3522205    
p2     DEGRADED         u0     233.76 GB   490234752     WD-WCANY3473475    
p3     NOT-PRESENT      -      -           -             -

After using the commands:

tw_cli maint remove c0 p2
tw_cli maint rescan c0
tw_cli maint rebuild c0 u0 p2

The rebuild starts but never finishes, and it exits leaving the unit in this state:

server:~# tw_cli info c0 u0

Unit     UnitType  Status         %RCmpl  %V/I/M  Port  Stripe  Size(GB)
------------------------------------------------------------------------
u0       RAID-5    DEGRADED*      -       -       -     64K     465.641  
u0-0     DISK      DEGRADED       -       -       p2    -       232.82    
u0-1     DISK      OK             -       -       p0    -       232.82    
u0-2     DISK      WARNING        -       -       p1    -       232.82    

Any idea how I can fix this problem and restore my RAID functionality?
Thank you
Asked by: ampranti
 
Callandor Commented:
Isn't the normal procedure to replace the faulty drive and then rebuild the array?  Trying to rebuild an array that has a degraded drive won't fix it if the drive has stopped working.
 
ampranti (Author) Commented:
Does "degraded" mean that the drive is faulty?
The drive seems to work; the process starts but later fails.
 
Callandor Commented:
If it fails at any point, that indicates to me that the drive may have a problem with it, though it may be intermittent.  Trying a new drive will save you a lot of time.
 
ampranti (Author) Commented:
In /var/log/messages, when the rebuild stops, I get these errors:

May  3 17:00:56 skilla kernel: [94981.224368] 3w-9xxx: scsi0: AEN: ERROR (0x04:0x0026): Drive ECC error reported:port=1, unit=0.
May  3 17:00:56 skilla kernel: [94981.224637] 3w-9xxx: scsi0: AEN: ERROR (0x04:0x002D): Source drive error occurred:unit=0, port=1.
May  3 17:00:56 skilla kernel: [94981.224899] 3w-9xxx: scsi0: AEN: ERROR (0x04:0x0004): Rebuild failed:unit=0.
May  3 17:00:56 skilla kernel: [94981.225160] 3w-9xxx: scsi0: AEN: ERROR (0x04:0x0002): Degraded unit:unit=0, port=2.
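
To see at a glance which port the controller keeps complaining about, the AEN error lines can be counted per port. This is only a rough sketch: `aen_errors_by_port` is a made-up helper name, and it assumes the log has one AEN message per line in the format shown above.

```shell
# Count 3ware AEN error messages per port in a syslog-style file.
# Usage: aen_errors_by_port [logfile]   (defaults to /var/log/messages)
# Lines without a "port=N" field (e.g. "Rebuild failed:unit=0.") are ignored.
aen_errors_by_port() {
    grep 'AEN: ERROR' "${1:-/var/log/messages}" \
        | grep -o 'port=[0-9]*' \
        | sort \
        | uniq -c \
        | awk '{ print $2, $1 }'    # print "port=N count"
}
```

For the four messages above this should report two errors for port 1 (the ECC error and the source drive error) and one for port 2.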


According to this site:
https://twiki.cern.ch/twiki/bin/view/FIOgroup/DiskPrbTw
A drive has reported an ECC-error and the disk should be replaced. This will generally lead to a RAID_TW alarm and the vendor call will follow from the standard procedure.

So I have to replace the disk.
 
ampranti (Author) Commented:
skilla:~# tw_cli info c0 u0    

Unit     UnitType  Status         %RCmpl  %V/I/M  Port  Stripe  Size(GB)
------------------------------------------------------------------------
u0       RAID-5    DEGRADED*      -       -       -     64K     465.641  
u0-0     DISK      DEGRADED       -       -       p2    -       232.82    
u0-1     DISK      OK             -       -       p0    -       232.82    
u0-2     DISK      WARNING        -       -       p1    -       232.82    

skilla:~# tw_cli info c0

Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-5    DEGRADED       -       -       64K     465.641   OFF    OFF    

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     233.76 GB   490234752     WD-WCANY3473512    
p1     ECC-ERROR        u0     233.76 GB   490234752     WD-WCANY3522205    
p2     DEGRADED         u0     233.76 GB   490234752     WD-WCANY3473475    
p3     NOT-PRESENT      -      -           -             -


Which disk do I have to replace: p1 or p2?
 
Callandor Commented:
The one with the ECC error - p1.
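
For reference, the per-port status can be pulled out of the `tw_cli info c0` listing with a small filter. This is a sketch, not part of tw_cli: `bad_ports` is a hypothetical helper, and it assumes the column layout shown in the output above (port name first, status second).

```shell
# Print "port status" for every populated port whose status is not OK.
# Reads `tw_cli info c0`-style output on stdin; unit rows (u0 ...) and
# empty slots (NOT-PRESENT) are skipped.
bad_ports() {
    awk '$1 ~ /^p[0-9]+$/ && $2 != "OK" && $2 != "NOT-PRESENT" { print $1, $2 }'
}

# Usage (on the server itself):
#   tw_cli info c0 | bad_ports
```

On the listing above this would print p1 with ECC-ERROR and p2 with DEGRADED. The kernel log names port 1 as the source drive that made the rebuild fail, which is why p1 is the one to replace.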
 
ampranti (Author) Commented:
After removing p2, the remaining disks in the RAID changed state to "OK":

tw_cli info c0 u0

Unit     UnitType  Status         %RCmpl  %V/I/M  Port  Stripe  Size(GB)
------------------------------------------------------------------------
u0       RAID-5    DEGRADED       -       -       -     64K     465.641  
u0-0     DISK      DEGRADED       -       -       -     -       232.82    
u0-1     DISK      OK             -       -       p0    -       232.82    
u0-2     DISK      OK             -       -       p1    -       232.82    

I hope the data will be regenerated after replacing the disk.
 
ampranti (Author) Commented:
Service checked disk "p2" and found that it was OK.
I put disk "p2" back and now all disks are OK:

# tw_cli info c0

Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-5    DEGRADED       -       -       64K     465.641   OFF    OFF    

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     233.76 GB   490234752     WD-WCANY3473512    
p1     OK               u0     233.76 GB   490234752     WD-WCANY3522205    
p2     OK               -      233.76 GB   490234752     WD-WCANY3473475    
p3     NOT-PRESENT      -      -           -             -


However, if I remove disk p1 (the one with the ECC-ERROR), the "/" partition is remounted read-only:
/dev/sda5 on / type ext3 (rw,errors=remount-ro)
Of course, all services malfunction.

Can I somehow check that if I replace disk "p1" I will be able to regenerate the data for p1?
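
A side note on the mount line above: it still says `rw`. On systems where `mount` reads /etc/mtab, that file may not get updated once "/" has gone read-only, so the kernel's own table in /proc/mounts is probably the more reliable place to look. A minimal sketch, with hypothetical helper names:

```shell
# Print the mount options of "/" from a mounts table (default /proc/mounts).
# If "/" is listed more than once, the last entry wins.
root_mount_opts() {
    awk '$2 == "/" { opts = $4 } END { if (opts != "") print opts }' "${1:-/proc/mounts}"
}

# Succeeds (exit 0) when "/" is currently mounted read-only.
# Matches the standalone "ro" option only, not "errors=remount-ro".
root_is_readonly() {
    case ",$(root_mount_opts "$@")," in
        *,ro,*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Usage:
#   root_is_readonly && echo "/ is read-only"
```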
 
Callandor Commented:
If you are sure p0 and p2 are in good working order, replacing p1 should work.  However, you've had a series of problems that involved more than one drive, so I'm not sure everything will turn out ok.  You DO have backups, don't you?
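
A rough sketch of the replace-and-rebuild sequence for p1, reusing only the `tw_cli maint` commands already posted in this thread. The function names and the `TW_CLI` override are assumptions added for illustration (setting `TW_CLI=echo` gives a dry run that just prints the arguments). In the last listing p2 is not part of u0, so pulling p1 leaves the unit without enough members; take the backup first.

```shell
# Step 1: tell the controller to release the port before pulling the drive.
# Usage: release_port c0 p1
release_port() {
    "${TW_CLI:-tw_cli}" maint remove "$1" "$2"
}

# Step 2: after the new drive is physically installed, rescan the
# controller and start the rebuild onto that port.
# Usage: rebuild_onto_port c0 u0 p1
rebuild_onto_port() {
    "${TW_CLI:-tw_cli}" maint rescan "$1" &&
    "${TW_CLI:-tw_cli}" maint rebuild "$1" "$2" "$3"
}

# Dry run, prints the command arguments instead of running tw_cli:
#   TW_CLI=echo release_port c0 p1
#   TW_CLI=echo rebuild_onto_port c0 u0 p1
```

Progress can then be followed with `tw_cli info c0 u0`, watching the %RCmpl column.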
 
ampranti (Author) Commented:
I can still read the disk and have a recent backup of everything.

If I take a backup using Clonezilla (with the system offline, taking an image of the hard disk), then replace both bad disks and recover the system from the image, should I be OK?


Thanks
 
Callandor Commented:
Yes - if that doesn't work, that might mean the controller is no good.  Either way, a backup is how you recover from these situations.