SQL Server torn pages

Posted on 2005-04-12
Medium Priority
Last Modified: 2008-01-09

I’m receiving an 823 error message indicating I’m having a torn page.  What I’m, supposed to do is recover the database.  I’ve done that and I continue to get the error.  If my understanding of a torn page is correct, the error occurred on the write of the data and would not be detected until the page is actually read again. If the page is not read for a while the error could have indeed occurred a while, more than the last backup, ago.  I’ve gone back a month of restores and the problem still persists.  

This seems suspicious to me.  If I understand how the users of the database actually use the data, I don’t think the problem could have festered that long without being read. But then again the user may not be using the data how I think they are using it.  Anyway, after a restore I do a DBCC CHECKDB  and CHECKTABLE WITH PHYSICAL_ONLY  and nothing is found.  The doc say torn pages should be detected.  

Question 1.  How accurate is it to say that DBCC CHECKDB will detect torn pages.  Is the online doc correct?

The database seems to run fine until I attempt to do the following from a nightly maintenance job.

delete from incoming where date_time < (getdate() - 45)

I receive the 823 error when the job executes.  If I issue

Select * from incoming where date_time < (getdate() - 45)

It returns rows and makes me scratch my head in wonderment because it appears to there is no torn page on a select * but there is on a delete.

Question 2.  Could someone explain why this may be the case?

Question 3.  If this is an active physical intermittent device problem where would I go to detect an actual device issue?

Question 4. Since I’ve exhausted my month of nightly backups and the problem still occurs, could a way around this problem be to

1 turn torn page detection off,
2 create a temp table
3 copy into the temp table all the data that is more recent then 45 days.
4 drop and recreate the old table
5 move the data from the temp table to the new table.

I realize some data may be lost, but I think I will have been able to retain some of the most recent data which will be used in users report generation.

Thanks for your time and effort.

Dan H.
Question by:dhully
  • 3
  • 2
  • 2
  • +1
LVL 15

Expert Comment

ID: 13760859
Hi dhully,

Please can you post the full error message?
LVL 15

Assisted Solution

mcmonap earned 800 total points
ID: 13760930

Author Comment

ID: 13761127
error messages is as follows.

I/O error (torn page) detected during read at offset 0x000009162f2000 in file 'D:\Websense Reporter\wslogdb50.mdf'..

followed by

Error while redoing logged operation in database 'wslogdb50'. Error at log record ID (295960:94705:141)..

Concerto's Cloud Advisory Services

Want to avoid the missteps to gaining all the benefits of the cloud? Learn more about the different assessment options from our Cloud Advisory team.


Author Comment

ID: 13761288
We do have compaq equipment and cacheing is turned on SP3a applied.  Will work on getting cahcing turned off.

 Currently I had the strange notion that the torn page may have been on an index built on the date time column of the incoming table and not the table itself.  Just a strange though I had in a reduced sleep state.  Running the job with the index dropped. Takes a while with over 135 million rows.  
LVL 27

Expert Comment

ID: 13762502
A torn page indicates an incomplete I/O operation, maybe caused by power failures or bit operations. If torn page detection is ON, then a bit is flipped for each 512-byte sector of an 8KB page when the page is written to disk.

(Maybe the reason why the SELECT does not cause the error and the DELETE does - a SELECT does not write to the disk)

If the bit is in the wrong state when the page is later read by SQL Server, the page was written incorrectly. Disks perform I/0 operations in 512-byte sectors (16 sectors to a page). If the system crashes or a defective disk driver can cause a glitch between the time the operating system writes the first 512-byte sector to disk and the completion of the 8KB I/0 operation. This corrupts the data page, and makes your entire database corrupt. There is no way to fix a torn page except by restoring to a known good backup (because it is physically inconsistent).

LVL 25

Expert Comment

ID: 13763179
If the problem is an index, though, you should be able to drop the index and rebuild.
LVL 25

Accepted Solution

jrb1 earned 600 total points
ID: 13763221
I also read this:

Torn page in non-clustered indexes will be deleted when the index is
rebuilt under any of the repair options. Any other kind of torn page can
only be fixed with REPAIR_ALLOW_DATA_LOSS.

You should move to new hardware and restore from a backup before considering
running repair (as your last resort).
LVL 27

Assisted Solution

ptjcb earned 600 total points
ID: 13765845
Before running DBCC CHECKTABLE with the REPAIR_ALLOW_DATA_LOSS, I agree with jrb1, move the data.

Run DBCC CHECKDB to determine all inconsistencies.
RUN DBCC CHECKCATALOG to see if there are any larger issues related to the inconsistency.
RUN DBCC CHECKTABLE to determine the inconsistency on the specific table
Create another table with the same structure and add _NEW
Run a SELECT statement with the WHERE clause to determine what can be read.
INSERT SELECT the data into the _New table
RUN DBCC CHECKTABLE to verify the table is error free
Attempt to bring the remainder from a previous backup
Backup the db
Rename the original table _OLD
Rename the copy to drop the _NEW suffix
Drop _OLD
Test the App
Run DBCC CHECKDB to ensure consistency errors have been taken care of
If this does not solve the problem, then run DBCC CHECKDB with REPAIR_ALLOW_DATA_LOSS, but expect to lose the data originally indicated in the DBCC CHECKDB statement plus the potential for any linked data.

Author Comment

ID: 13771522
As of this moment we are back and running.  

1.      I used a backup of the previous night before the torn page message.  
2.      Drop and rebuilt the index on the date_column of incoming table.
3.      Ran the delete job multiple times deleting fewer rows of data at the each time as opposed to all the rows that needed to be deleted at the same time.

I realize its bad practice to make two changes at the same time but the users and management were getting restless.  (Can’t blame them.)

So it would appear there was a bad index or the controller cache was getting flooded.

Thanks for your time and effort


Featured Post

How to Use the Help Bell

Need to boost the visibility of your question for solutions? Use the Experts Exchange Help Bell to confirm priority levels and contact subject-matter experts for question attention.  Check out this how-to article for more information.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Backups and Disaster RecoveryIn this post, we’ll look at strategies for backups and disaster recovery.
MSSQL DB-maintenance also needs implementation of multiple activities. However, unprecedented errors can hamper the database management. In that case, deploying Stellar SQL Database Toolkit ensures fast and accurate database and backup repair as wel…
In this video, Percona Solution Engineer Dimitri Vanoverbeke discusses why you want to use at least three nodes in a database cluster. To discuss how Percona Consulting can help with your design and architecture needs for your database and infras…
In this video, Percona Solutions Engineer Barrett Chambers discusses some of the basic syntax differences between MySQL and MongoDB. To learn more check out our webinar on MongoDB administration for MySQL DBA: https://www.percona.com/resources/we…
Suggested Courses

864 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question