DriveReady SeekComplete Error

I have a server that gives me the following errors every day:

Jun 24 04:16:14 mail kernel: hde: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Jun 24 04:16:14 mail kernel: hde: dma_intr: error=0x40 { UncorrectableError }, LBAsect=140597862, sector=140517472
Jun 24 04:16:14 mail kernel: end_request: I/O error, dev 21:05 (hde), sector 140517472
Jun 24 04:16:19 mail kernel: hde: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Jun 24 04:16:19 mail kernel: hde: dma_intr: error=0x40 { UncorrectableError }, LBAsect=140597869, sector=140517480
Jun 24 04:16:19 mail kernel: end_request: I/O error, dev 21:05 (hde), sector 140517480
Jun 24 04:16:25 mail kernel: hde: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Jun 24 04:16:25 mail kernel: hde: dma_intr: error=0x40 { UncorrectableError }, LBAsect=140597862, sector=140517472
Jun 24 04:16:25 mail kernel: end_request: I/O error, dev 21:05 (hde), sector 140517472
Jun 24 04:16:30 mail kernel: hde: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Jun 24 04:16:30 mail kernel: hde: dma_intr: error=0x40 { UncorrectableError }, LBAsect=140597862, sector=140517472
Jun 24 04:16:30 mail kernel: end_request: I/O error, dev 21:05 (hde), sector 140517472
Jun 24 04:16:36 mail kernel: hde: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Jun 24 04:16:36 mail kernel: hde: dma_intr: error=0x40 { UncorrectableError }, LBAsect=140597862, sector=140517472
Jun 24 04:16:36 mail kernel: end_request: I/O error, dev 21:05 (hde), sector 140517472

This looks like a bad sector, but I can't have the server down long enough for an e2fsck -cc.
Can I use badblocks with a list or something to mark those sectors bad and be done with it?
And would I use the actual sector number or the LBAsect number?

Asked by Scott Silva, Network Administrator
jlevie commented:
For what it's worth... It's been my experience that the appearance of one or more bad sectors on an IDE drive will shortly be followed by complete failure of the drive. I'd highly recommend a full backup of the file systems on that drive and replacement of the drive while it is still mostly working. And if this is a heavily used server you'll need to do your backup in single user mode (if it is a system disk) or with users and user applications locked out if it is a data drive to get a sane and usable backup.

It might be inconvenient to have the system down while this occurs, but it'll be a lot more inconvenient to have the drive fail and not have a backup that you can restore from.

If this is a mission-critical server, then you should really look at a RAID configuration to protect yourself from a failed drive. At the least you could use a pair of drives and mirror them with soft RAID, and preferably use a RAID controller in RAID 5 mode with a hot spare (4 drives total).
 
bryanjones commented:
The reason for this is that your drive is going bad. You can repair the drive for a while with fsck -y /dev/hde
 
Scott Silva (Network Administrator, author) commented:
As you can see, it is only 2 sectors. I just want to lock out these 2 sectors, but can't have the server down for the several hours that a fsck would take.
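
One way to check the "only 2 sectors" claim against a longer log is to pull the distinct LBAsect/sector pairs out of the dma_intr lines. The following is a minimal sketch, assuming the log lines keep the format shown in the question; the function name and the sample list are illustrative, not part of any tool.

```python
import re

# Matches the "LBAsect=..., sector=..." part of the dma_intr error lines
# in the format shown in the question.
PATTERN = re.compile(r"LBAsect=(\d+), sector=(\d+)")


def distinct_bad_sectors(log_lines):
    """Return the sorted, de-duplicated (LBAsect, sector) pairs in the log."""
    pairs = set()
    for line in log_lines:
        match = PATTERN.search(line)
        if match:
            pairs.add((int(match.group(1)), int(match.group(2))))
    return sorted(pairs)


# A few lines copied from the question, used here as sample input.
sample = [
    "Jun 24 04:16:14 mail kernel: hde: dma_intr: error=0x40 { UncorrectableError }, LBAsect=140597862, sector=140517472",
    "Jun 24 04:16:19 mail kernel: hde: dma_intr: error=0x40 { UncorrectableError }, LBAsect=140597869, sector=140517480",
    "Jun 24 04:16:25 mail kernel: hde: dma_intr: error=0x40 { UncorrectableError }, LBAsect=140597862, sector=140517472",
]

for lba, sector in distinct_bad_sectors(sample):
    print(lba, sector)
```

Running the same function over the full day's log would show whether new sectors are appearing over time, which is the usual sign of a drive that is getting worse.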
 
ahoffmann commented:
There is no reliable method without downtime (unless you're on RAID, or you can hot-plug a new disk).
 
bryanjones commented:
Even though it is only two sectors, the drive is still bad. I had the same issue as well.
 
Scott Silva (Network Administrator, author) commented:
I will have a new drive on order by close of business today.

Thank you.
 
jlevie commented:
Good move... If you don't have a means of backing up to other media or another system, you can connect the new drive to this box and do a disk-to-disk transfer. The preferred method would be to use dump/restore, but there could be a problem when dump encounters the bad blocks.

What I've done in the past when faced with something like this is to do a read check on every file (e.g. cp /path-to/file /dev/null), deleting any file that has bad blocks. As long as none of the bad blocks on the disk are part of the directory structure, dump should not then have a problem.
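
The read check described above can be scripted so that every file is read once and the failures are collected in a list. This is a rough sketch rather than a tested recovery tool: the function name and chunk size are arbitrary choices, and it only reports files instead of deleting them. Files that were read recently may be served from the page cache rather than from the disk, so a clean result is not a guarantee.

```python
import os


def find_unreadable_files(root, chunk_size=1024 * 1024):
    """Read every regular file under root and return the paths that fail.

    Any OSError counts as a failure, so permission problems are reported
    alongside genuine I/O errors; run as root to avoid the former.
    """
    unreadable = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks, FIFOs, sockets and device nodes.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as handle:
                    # Read and discard the data, like cp file /dev/null.
                    while handle.read(chunk_size):
                        pass
            except OSError:
                unreadable.append(path)
    return unreadable
```

Calling find_unreadable_files on the mount point of the affected partition returns the candidates to restore from backup or delete before running dump.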

As a last resort, if the bad blocks interfere with dump, you could try using e2fsck to map out those blocks. Since I'm rather paranoid when futzing with a disk in that condition, I'd use other means (tar, cp, cpio) to replicate everything that I can onto the new disk before running e2fsck. That minimizes any potential loss if e2fsck can't fix the drive and it winds up being unreadable.
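
On the original question of which number to use: e2fsck -l takes a file of block numbers expressed in filesystem blocks, not raw sector numbers, so either value from the log has to be converted first. The sketch below shows only the arithmetic, under two loudly stated assumptions: that the "sector" value in the end_request lines is relative to the start of the partition (dev 21:05), and that the filesystem block size is 4096 bytes (tune2fs -l on the partition reports the real value). The kernel may also report the first sector of the failed request rather than the exact failing sector, so treat the result as a starting point, not a definitive list.

```python
SECTOR_SIZE = 512  # bytes per sector as counted in the kernel log


def sector_to_fs_block(partition_sector, block_size=4096):
    """Convert a partition-relative 512-byte sector to a filesystem block.

    block_size=4096 is an assumption; check the real value before use.
    """
    return (partition_sector * SECTOR_SIZE) // block_size


# The two "sector" values from the end_request lines in the question.
blocks = sorted({sector_to_fs_block(s) for s in (140517472, 140517480)})

# One block number per line is the format a bad-blocks list file uses.
for block in blocks:
    print(block)
```

With a 1024-byte block size the same call would be sector_to_fs_block(sector, 1024), which gives different block numbers, so the block size has to be confirmed before any list is handed to e2fsck.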
 
Scott Silva (Network Administrator, author) commented:
I'm just wondering why this error shows up at the same time every day.
It must be triggered by some cron job. More digging to do.
The server is out of town, and I can't get to it until Friday. Maybe I can make a link to another directory on another disk and move the data. I can do that remotely.


"The more I work with computers, the more I realize where the term 'Boot' came from"
Question has a verified solution.
