RAID rebuild issue: "Bad block table is full; unable to log block"

I've found plenty of info from people having this problem, but I haven't found anyone saying what to do about it. While this is not a critical server (used mostly for labs), it has enough on it that I would *definitely* like to avoid rebuilding it. There won't be any actual data loss if I have to, though, just lost time.

I believe I have logical bad blocks from a hard drive failing in one of the RAID 1 mirror sets of my RAID 10. Similar situations in my searching also refer to this as a "punctured" array (though that seems to be a vendor-specific term). The array won't rebuild.

Slot 1 started reporting predictive failures. We purchased a brand-new drive of the same model (ST3500320NS), shut down the server, replaced the drive, and booted it back up (non-production server). The rebuild reached 90% and started throwing unrecoverable media errors (slot 0 [the remaining drive of the RAID 1] and slot 1 [the new drive] increment their media error counts at about the same rate when this happens). We returned the drive for another new one; same issue at 90%. All cables have been reseated, just in case. A chkdsk with /F or /R hangs at the exact same file count in stage 4 of 5 every time (I waited 1.5 hours with no movement).

MRM errors (a quick tally sketch follows the list):
- Controller ID: 0 Unrecoverable medium error during rebuild: PD 0 location 0x34c67621
- Controller ID: 0 Bad block table is full; unable to log block: PD = 0:1, Block = 0x34c67621
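
For reference, a minimal sketch that tallies these events per physical drive from a plain-text export of the MSM/MRM event log; the export path is hypothetical, and the regexes assume the exported lines look like the two messages above:

```
import re
from collections import Counter

# Hypothetical path to a plain-text export of the MegaRAID Monitor event log.
LOG_PATH = r"C:\temp\mrm_events.txt"

# Patterns are assumptions based on the two messages quoted above;
# other event formats in the log may differ.
PATTERNS = {
    "medium_error": re.compile(r"Unrecoverable medium error.*?PD\s+(\S+)"),
    "bbt_full": re.compile(r"Bad block table is full.*?PD\s*=\s*(\S+),"),
}

def tally_events(path):
    """Count occurrences of each event type per physical drive."""
    counts = {name: Counter() for name in PATTERNS}
    with open(path, errors="replace") as log:
        for line in log:
            for name, pattern in PATTERNS.items():
                match = pattern.search(line)
                if match:
                    counts[name][match.group(1)] += 1
    return counts

if __name__ == "__main__":
    for event, per_drive in tally_events(LOG_PATH).items():
        for drive, hits in sorted(per_drive.items()):
            print(f"{event}: PD {drive} -> {hits} occurrence(s)")
```

Running it against a fresh export after each rebuild attempt shows whether the counts for PD 0 (the surviving drive) keep climbing.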

Equipment:

Server: Cisco UCS C200 M1
Controller: Intel ICH10R (integrated)
Disk Drives: 2x ST3500320NS (Seagate SATA II, 7200 RPM, 500 GB, 32 MB buffer)
RAID Configuration: RAID 10

Slots 0 and 1 hold the mirror set in question (the drives referenced in the errors above).


Operating System - Driver/MRM upgraded *after* the issue started:
Operating System: Microsoft Windows Server 2008 R2 SP1 (patched as recently as 3 weeks ago)
Controller Driver: 15.0.2.2013.04.14 (previous 13.x)
Server Software: LSI MegaRaid Monitor 13.04.02.00 (previous 8.5.x)

Firmware - latest HUU from Cisco applied after the problem started:
Current BIOS: C200.1.4.3k.0 (Build Date: 07/17/2013), (previous 1.4.3x, unsure but probably j)
CIMC: 1.4.3u (previous 1.4.3j)
Asked by DaveQuance

David (President) commented:
If you want all of your data, you need to call in a pro; there's no way I can walk you through a recovery.

But to explain the situation, the surviving disk has unreadable blocks as well. You do have data loss.  Nothing can be done to get what the controller can't read from the disk.

Replace the "surviving" disk as well; its data can't be read, so it is also screwed up. Then build the RAID 1 and restore from backup.
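
If you do restore from backup, it's worth verifying the restored tree against the backup copy before putting the box back to work. A minimal sketch, assuming both trees are reachable from the server (the two paths are hypothetical):

```
import hashlib
import os

def hash_tree(root):
    """Return {relative_path: sha256 hex digest} for every file under root."""
    digests = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            h = hashlib.sha256()
            with open(full, "rb") as fh:
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    h.update(chunk)
            digests[rel] = h.hexdigest()
    return digests

if __name__ == "__main__":
    # Hypothetical paths: the backup copy and the freshly restored volume.
    backup = hash_tree(r"D:\backups\labdata")
    restored = hash_tree(r"E:\labdata")
    for rel, digest in backup.items():
        if restored.get(rel) != digest:
            print("MISMATCH or missing after restore:", rel)
```

Anything flagged here either failed to copy or differs from the backup, so it can be re-copied before the server goes back into use.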

Root cause can be anything from bad luck to crappy power, but in the grand scheme of things, with a $2.00 embedded RAID controller and cheap disks that cost about $35 in bulk, this is what you need to expect.
 
DaveQuance (Author) commented:
Understandable. It still boots and all the lab VMs work fine (it was a server we got at a MASSIVE bargain, and it has served nicely for extra servers in larger lab environments). It has also made a cheap backup server.

I'll likely just let it sit as-is until the other drive fails or causes a problem significant enough to force a rebuild, then replace it and rebuild. Since either event results in a rebuild, and a sudden failure isn't a big problem, there's no use in forcing the work now.
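
In the meantime, a quick way to keep an eye on what Windows reports for the drives is to poll WMI. A minimal sketch, assuming Python is installed on the box (wmic ships with Server 2008 R2, though behind the RAID controller it may only show the virtual disk rather than the individual members):

```
import subprocess

def disk_status():
    """Ask WMI (via wmic, built into Server 2008 R2) what it reports for the drives."""
    result = subprocess.run(
        ["wmic", "diskdrive", "get", "Model,Status"],
        capture_output=True, text=True, check=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]

if __name__ == "__main__":
    for line in disk_status():
        print(line)
    # A Status other than "OK" (e.g. "Pred Fail") is the cue to schedule the
    # swap and rebuild instead of waiting for a hard failure.
```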