  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 4533
  • Last Modified:

One or more logical drives contain a bad stripe

Hi,

I have an IBM eServer226-[8648IAS] with Win2k3 X64 SP2, and its RAID array has bad stripes. It uses a hardware RAID controller.

Below is the event log entry:

This message was generated by the ServeRAID Manager Agent.
Please do not reply to this message.

Event Description: One or more logical drives contain a bad stripe: controller 1
Event Type: Warning Event
Date: 12/24/2009
Time: 08:38:00 PM GMT+05:30

I read about this problem on the official IBM website, which says that recreating all the partitions is the only solution.

That is a really big task, as I would have to do a lot of background work for it.

Is there any other possible solution?
0
Murali
Asked:
Murali
 
MuraliAuthor Commented:
December 24, 2009 8:51:14 PM GMT+05:30

Configuration summary
---------------------------

Server name.....................<Confidential>
ServeRAID Manager agent.........7.00.15 (625)
ServeRAID Manager console.......7.00.15 (625)
Number of controllers...........1
Operating system................Windows

Configuration information for controller 1
-------------------------------------------------------
Type............................Controller
Model...........................IBM ServeRAID-6i
SCSI backend type...............Unknown
SCSI backend revision...........16
Controller FRU..................13N2195
Battery FRU.....................71P8628
Serial number...................2013B9D0
Part number.....................13N2190
Physical slot...................5
BIOS............................7.00.17
Firmware........................7.00.17
Device driver...................7.10.53
Controller status...............Optimal
Battery-backup cache............Installed
Battery temperature.............Normal
Battery charge level............100 %
Battery-backup cache size.......128 MB
Read-ahead cache mode...........Adaptive
Stripe-unit size................16 KB
Rebuild rate....................High
Hot-swap rebuild................Enabled
Copy back.......................Enabled
Data scrubbing..................Enabled
Auto-synchronization............Enabled
Unattended mode.................Disabled
BIOS-compatibility mapping......Extended
Number of arrays................1
Number of logical drives........1
Number of hot-spare drives......1
Number of ready drives..........0

Array A
--------------------
Array ID........................A
Array size......................210018 MB
Free space......................2 MB
Number of logical drives........1
Stripe order (channel/device)...2/1 2/2 2/3
Number of physical drives.......3

Logical drives in array A
--------------------------------
Logical drive...................1
Array letter....................A
RAID level......................5
Data space......................140012 MB
Parity space....................70006 MB
Stripe-unit size................16K
Date created....................06/27/2006
State...........................Okay
Write-cache mode................Enabled (write-back)
Protected by hot spare..........Yes


Physical drives in array A
--------------------------------
Type............................Hard disk drive
Vendor..........................IBM-ESXS
Model...........................ST373207
FRU part number.................39R7308
Serial number...................3KT46E2C
Firmware level..................B26C
Channel.........................2
SCSI ID.........................1
Size............................70006 MB
State...........................Online
Array letter....................A
PFA error.......................No

Type............................Hard disk drive
Vendor..........................IBM-ESXS
Model...........................GNS073C3
FRU part number.................39R7308
Serial number...................J20DW8VK
Firmware level..................JP86
Channel.........................2
SCSI ID.........................2
Size............................70006 MB
State...........................Online
Array letter....................A
PFA error.......................No

Type............................Hard disk drive
Vendor..........................IBM-ESXS
Model...........................MAW3073N
FRU part number.................39R7308
Serial number...................AR9027HT
Firmware level..................C206
Channel.........................2
SCSI ID.........................3
Size............................70006 MB
State...........................Online
Array letter....................A
PFA error.......................No


SCSI channel 1
-------------------
Number of devices...............0
Transfer speed..................Optimal
SCSI initiator ID...............7

SCSI channel 2
-------------------
Number of devices...............4
Transfer speed..................Optimal
SCSI initiator ID...............7

Type............................Hard disk drive
Vendor..........................IBM-ESXS
Model...........................PYH073C3
FRU part number.................39R7308
Serial number...................V3W7STNA
Firmware level..................RXQN
Channel.........................2
SCSI ID.........................0
Size............................70006 MB
State...........................Hot spare
PFA error.......................No

Type............................Hard disk drive
Vendor..........................IBM-ESXS
Model...........................ST373207
FRU part number.................39R7308
Serial number...................3KT46E2C
Firmware level..................B26C
Channel.........................2
SCSI ID.........................1
Size............................70006 MB
State...........................Online
Array letter....................A
PFA error.......................No

Type............................Hard disk drive
Vendor..........................IBM-ESXS
Model...........................GNS073C3
FRU part number.................39R7308
Serial number...................J20DW8VK
Firmware level..................JP86
Channel.........................2
SCSI ID.........................2
Size............................70006 MB
State...........................Online
Array letter....................A
PFA error.......................No

Type............................Hard disk drive
Vendor..........................IBM-ESXS
Model...........................MAW3073N
FRU part number.................39R7308
Serial number...................AR9027HT
Firmware level..................C206
Channel.........................2
SCSI ID.........................3
Size............................70006 MB
State...........................Online
Array letter....................A
PFA error.......................No

Type............................Enclosure management device
Vendor..........................IBM
Model...........................02R0980a
Serial number...................000
Firmware level..................1
Channel.........................2
SCSI ID.........................8


End of the configuration information for controller 1
-------------------------------------------------------


0
 
MuraliAuthor Commented:
A couple of months back one disk failed; we replaced it and rebuilt the array onto the new disk.
0
 
DavidCommented:
You have 2 unreadable blocks on different disks in the same stripe.
The way to fix it is to identify the bad stripe and write data to it (any data, it doesn't matter, because at least part of the stripe is unreadable).  

So if you were running Linux or UNIX, you would enter:

dd if=/dev/xxx of=/dev/null bs=yK

Substitute the device name of the logical array (not any particular physical disk) for xxx, and the stripe size, which is 16, for y. So if the logical disk were /dev/sda, you would enter:

dd if=/dev/sda of=/dev/null bs=16k

This would then give you an error message saying it can't read block such-and-such.  You would then use dd to write zeros to that block.
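The read scan and the zero-fill described above can also be sketched in Python. This is only an illustration under stated assumptions: the function names `find_unreadable_chunks` and `zero_chunk` are made up for this example, the path is whatever the logical drive is called on your system, and the script has not been run against a ServeRAID array.

```python
import os

CHUNK = 16 * 1024  # stripe-unit size reported by the controller (16 KB)


def find_unreadable_chunks(path, chunk=CHUNK):
    """Read `path` chunk by chunk; return byte offsets of chunks that fail to read."""
    bad = []
    # buffering=0 so every read goes straight to the device
    with open(path, "rb", buffering=0) as dev:
        size = dev.seek(0, os.SEEK_END)
        for offset in range(0, size, chunk):
            try:
                dev.seek(offset)
                dev.read(chunk)
            except OSError:
                # the OS reported an I/O error for this chunk
                bad.append(offset)
    return bad


def zero_chunk(path, offset, chunk=CHUNK):
    """Overwrite one chunk with zeros. DESTRUCTIVE: whatever was there is lost."""
    with open(path, "r+b", buffering=0) as dev:
        dev.seek(offset)
        dev.write(b"\x00" * chunk)
```

Unlike a plain dd run, the scan keeps going after the first error, so it lists every unreadable chunk in one pass. Only call `zero_chunk` on offsets you have decided to sacrifice.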

The above is a draconian technique that would result in corrupting 96KB of data.

A more elegant technique would be to run a SCSI read verify on each physical disk to determine which 2 (or possibly more) disks have bad blocks, and where, then remap the bad blocks. If the bad block numbers aren't exactly the same on those disks, then you will have ZERO data loss.
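The zero-data-loss claim follows from how RAID 5 parity works: at any given block position, the parity block is the XOR of the data blocks, so one missing block per position can always be recomputed from the others. Here is a toy Python sketch of that idea for a 3-disk stripe. The names `xor_blocks` and `repair_stripe` and the tiny 8-byte blocks are invented for the illustration; this is not the controller's actual code.

```python
from functools import reduce


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def repair_stripe(disks, unreadable):
    """Rebuild unreadable blocks from the other disks at the same block index.

    disks:      list of disks, each a list of blocks (bytes) for one stripe
    unreadable: set of (disk_index, block_index) pairs that cannot be read
    """
    repaired = [list(d) for d in disks]
    for index in range(len(disks[0])):
        bad_here = [d for d in range(len(disks)) if (d, index) in unreadable]
        if len(bad_here) > 1:
            # same block number lost on two disks: parity cannot recover it
            raise ValueError(f"block {index} is unreadable on more than one disk")
        if bad_here:
            good = [disks[d][index] for d in range(len(disks)) if d != bad_here[0]]
            repaired[bad_here[0]][index] = reduce(xor_blocks, good)
    return repaired


# Demo: two data chunks plus parity, four blocks each
data1 = [bytes([n] * 8) for n in (1, 2, 3, 4)]
data2 = [bytes([n] * 8) for n in (10, 20, 30, 40)]
parity = [xor_blocks(a, b) for a, b in zip(data1, data2)]
original = [data1, data2, parity]

damaged = [list(d) for d in original]
damaged[0][0] = None  # bad block 0 on disk 0
damaged[1][2] = None  # bad block 2 on disk 1
repaired = repair_stripe(damaged, {(0, 0), (1, 2)})
print(repaired == original)  # True
```

If both disks had lost the same block index, `repair_stripe` would raise instead, which is the one case where some data really is gone.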

You would have to plug disks into a non-RAID SCSI controller to do that as you need to have access to the physical disks through your O/S.

(Sorry, I just re-read and noticed you are not on UNIX, but the information is valuable to others, so I decided not to erase it.)
For Windows, you'll need to use something like SANTOOLS (http://www.santools.com) smartmon-ux, run the -verify command on each disk, and then use the -rb command to repair the bad block(s). Again, unless the block numbers are exactly the same, you will have zero data loss.

0
 
MuraliAuthor Commented:
You mean I need to fill the disks up completely with some data one time?
0
 
DavidCommented:
No, this test is a READ test. That is how you find the problem, unless you have the additional issue that a disk's GLIST is full (techie for no more spare good blocks for remapping). The santools software can tell you that also. If you are out of spares, then you need to image-copy the disk and then manually corrupt the ECC on the unreadable blocks, or your RAID engine will make incorrect recovery decisions.
0
 
MuraliAuthor Commented:
I am unable to understand your suggestion and the solution.

Can you explain what I have to do now to recover from this?
0
 
DavidCommented:
The reason why a full (but quasi-manual) recovery is possible is that your firmware stops when there is just a single bad block in the entire 16KB chunk for each of the disks in the stripe.

So I would fix it by going with single-block reads on the stripe, working out where the parity is for that same physical block on each disk, then either repairing it via parity recalculation and a write, or marking it good.
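To give a feel for the "work out where the parity is" step, here is a rough Python sketch that maps a physical sector number to its stripe row and parity disk. Everything about the layout is an assumption made for the example: 512-byte sectors, no reserved metadata area at the start of each disk, and a simple rotating parity placement. The real ServeRAID-6i on-disk layout may differ, so treat `locate` as a hypothetical helper, not as the controller's mapping.

```python
SECTOR = 512              # assumed sector size in bytes
STRIPE_UNIT = 16 * 1024   # stripe-unit size from the configuration dump
DISKS = 3                 # physical drives in array A


def locate(lba, disks=DISKS, unit=STRIPE_UNIT, sector=SECTOR):
    """Return (stripe_row, sector_within_unit, parity_disk) for a physical sector.

    ASSUMPTIONS: data starts at sector 0 of every disk, and parity rotates
    one disk to the left on each stripe row, starting on the last disk.
    """
    sectors_per_unit = unit // sector        # 32 sectors per 16 KB chunk
    row = lba // sectors_per_unit            # which stripe row the sector is in
    sector_in_unit = lba % sectors_per_unit  # position inside the 16 KB chunk
    parity_disk = disks - 1 - (row % disks)  # assumed rotation, see docstring
    return row, sector_in_unit, parity_disk


print(locate(33))  # (1, 1, 1)
```

With the row and the in-chunk position known, you read that same sector from every disk in the row; the disk returned as `parity_disk` holds the XOR of the others under the assumed layout.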

Somebody like me who writes RAID firmware and diags has the code to do this, and you can pretty much do it yourself with some phone time and the santools software, but you just won't find a commercial product for less than thousands of dollars that could automatically do it all. You need an experienced human to fix this one... Don't run any of those disk repair tools. They will corrupt the array.

Of course you could send it all to Ontrack and give them $5000-$10K, and they will use the same type of software I have to get it back. You are in need of one-on-one time with an experienced person to get it all back, unless you just don't mind losing some files and doing what I suggested earlier by hooking it up to a UNIX machine, or just using the santools to fix just the bad individual blocks.
0
 
MuraliAuthor Commented:
What is the complete name of santools and version?
0
 
DavidCommented:
Sorry, I know you are down, but shopping for my wife's Xmas presents has to be my priority. I will respond with less jargon in a few hours. A warning, though, and don't take offense: if the concept didn't make sense to you, then even if I post a tutorial I am pessimistic you will be successful without somebody just doing it for you. I cannot possibly walk you through all possibilities. In the interim, no matter what, you will need a working system (Windows is probably easier) with a JBOD SCSI controller that can see all the physical disks. A large enough target disk may also be required to reconstruct the RAID to. I don't know without assessing more, but perhaps get a 500GB disk, make a single partition, and load Windows on it. You're going to learn how to find bad blocks. Also download a freebie binary editor program.
0
 
MuraliAuthor Commented:
I am really sorry... It is not very urgent. However, we have backed up everything... Have a happy Christmas... Have a joyful day.
0
 
MuraliAuthor Commented:
Are you back?
0
