DaveQuance
asked on
RAID Rebuild Issue: bad block table is full; unable to log block
I've found plenty of info talking about people having this problem, but I haven't found anyone saying what to do about it. While this is not a critical server (used mostly for labs), it does have enough on it that I would *definitely* like to avoid rebuilding it. There won't be any actual data loss if I have to, though; just lost time.
I believe I have logical bad blocks from a hard drive failing in a RAID 1 of my RAID 10. Similar situations from all my searching also reference it as a "punctured" array (but that also seems to be a vendor-specific term). The array won't rebuild.
Slot 1 started reporting predictive failures. We purchased a brand-new drive of the same model (ST3500320NS), shut down the server, replaced the drive, and booted it back up (non-production server). The rebuild reached 90% and started throwing unrecoverable media errors (slot 0 [the remaining drive of the RAID 1] and slot 1 [the new drive] increment their media error counts at about the same rate when this happens). We returned the drive for another new one; same issue at 90%. All cables have been reseated, just in case. A chkdsk with /F or /R hangs at the exact same file count in stage 4 of 5 every time (I waited 1.5 hours with no movement).
MRM Errors:
- Controller ID: 0 Unrecoverable medium error during rebuild: PD 0 location 0x34c67621
- Controller ID: 0 Bad block table is full; unable to log block: PD = 0:1, Block = 0x34c67621
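For what it's worth, the failing block address lines up with the ~90% stall. On a 500 GB drive with 512-byte sectors, LBA 0x34c67621 sits about 90.6% of the way into the disk. A quick back-of-the-envelope check (assuming the standard IDEMA sector count for a 500 GB drive; verify against the actual drive):

```python
# Sanity check: does the failing LBA line up with the ~90% rebuild stall?
# Assumes 512-byte sectors and the standard IDEMA capacity for a 500 GB
# drive (976,773,168 LBAs) -- verify against the actual drive.
FAILING_LBA = 0x34C67621       # block address from the MRM errors above
TOTAL_LBAS = 976_773_168       # ST3500320NS: 500,107,862,016 bytes / 512

print(f"LBA {FAILING_LBA:,} = byte offset {FAILING_LBA * 512:,}")
print(f"Position on disk: {FAILING_LBA / TOTAL_LBAS:.1%}")  # ~90.6%
```

That's consistent with the rebuild hitting the same damaged region on the source drive (slot 0) on every pass, rather than a problem with either replacement drive.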
Equipment:
Server: Cisco UCS C200 M1
Controller: Intel ICH10R (integrated)
Disk Drives: 2x ST3500320NS (Seagate SATA II, 7200 RPM, 500 GB, 32 MB buffer)
RAID Configuration: RAID 10
Slots 0 and 1 hold the mirror set in question (the drives referenced in the errors above)
Operating System (driver/MRM upgraded *after* the issue started):
Operating System: Microsoft Windows Server 2008 R2 SP1 (patched as recently as 3 weeks ago)
Controller Driver: 15.0.2.2013.04.14 (previous 13.x)
Server Software: LSI MegaRaid Monitor 13.04.02.00 (previous 8.5.x)
Firmware (latest HUU from Cisco, applied after the problem started):
Current BIOS: C200.1.4.3k.0 (build date 07/17/2013; previous 1.4.3x, unsure but probably j)
CIMC: 1.4.3u (previous 1.4.3j)
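While the array sits degraded, the raw SMART counters on both remaining drives are worth watching. A minimal sketch, assuming smartmontools is installed and the disks are visible to smartctl as /dev/sda and /dev/sdb (the device names are a guess; behind a RAID BIOS you may need smartctl's -d option and different names):

```python
# Spot-check the SMART counters that matter for a degraded mirror.
# Assumes smartmontools is installed; device names below are assumptions.
import subprocess

ATTRS = ("Reallocated_Sector_Ct", "Current_Pending_Sector",
         "Offline_Uncorrectable")

def smart_counters(device):
    out = subprocess.run(["smartctl", "-A", device],
                         capture_output=True, text=True).stdout
    counters = {}
    for line in out.splitlines():
        for attr in ATTRS:
            if attr in line:
                counters[attr] = line.split()[-1]  # RAW_VALUE column
    return counters

for dev in ("/dev/sda", "/dev/sdb"):
    print(dev, smart_counters(dev))
```

A climbing Current_Pending_Sector count on the slot 0 drive would be the early warning that a forced rebuild is about to come due.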
ASKER
I'll likely just let it sit as-is until the other drive fails or causes a problem significant enough to force a rebuild, then replace it and rebuild. Since either event results in a rebuild, and a sudden failure isn't a big problem here, there's no use forcing the work now.
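In the meantime, a watcher over an exported MRM event log could flag new medium-error or bad-block events so the degraded mirror doesn't deteriorate silently. This is only a sketch: the log path is hypothetical and the line format is inferred from the two messages quoted above, so both will need adjusting for a real MRM export:

```python
# Scan an exported MegaRAID Monitor log for medium-error / bad-block events.
# LOG_PATH is hypothetical; the regex is inferred from the two MRM messages
# quoted in the question -- adjust both for a real export.
import re

LOG_PATH = r"C:\MegaRAID\events.log"  # hypothetical export location
PATTERN = re.compile(
    r"(Unrecoverable medium error|Bad block table is full)"
    r".*?(?:location|Block =) (0x[0-9a-fA-F]+)")

with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        match = PATTERN.search(line)
        if match:
            kind, block = match.groups()
            print(f"{kind} at LBA {int(block, 16):,}")
```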