Database Corruption Error: A Bad Page Link in a B-Tree - HELP!

Today a couple of users complained that some customers were getting "Delivery Status Notification" (Delay) messages. I looked at the Application log in Event Viewer and found that it was completely FULL of errors. The errors are Source: ESE, Category: Database Corruption, Event ID: 447. The description is as follows:

Information Store (3832) First Storage Group: A bad page link (error -338) has been detected in a B-Tree (ObjectID: 18604, PgnoRoot: 3962053) of database E:\exchsrvr\mdbdata\priv1.edb (3962053 => 3962060, 2781903).
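Because the Application log was full of these events, one way to see whether they all point at the same B-tree is to pull the fields out of each description and count them. Below is a minimal Python sketch. It assumes the descriptions have been copied out of Event Viewer as plain text worded exactly like the sample above. The names `parse_event_447`, `tally_trees`, and `BadPageLink` are hypothetical. The three trailing page numbers are kept as an uninterpreted tuple because the event text does not say what each one means.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

# Hypothetical pattern for the ESE event 447 description quoted above.
# Assumes the wording matches that sample exactly.
EVENT_447 = re.compile(
    r"A bad page link \(error (?P<error>-?\d+)\) has been detected in a B-Tree "
    r"\(ObjectID: (?P<object_id>\d+), PgnoRoot: (?P<pgno_root>\d+)\) "
    r"of database (?P<database>.+?) "
    r"\((?P<p1>\d+) => (?P<p2>\d+), (?P<p3>\d+)\)\."
)


@dataclass(frozen=True)
class BadPageLink:
    error: int                   # ESE error code, e.g. -338
    object_id: int               # ObjectID of the damaged B-tree
    pgno_root: int               # root page number of that B-tree
    database: str                # path to the .edb file
    pages: Tuple[int, int, int]  # trailing page numbers, left uninterpreted


def parse_event_447(text: str) -> Optional[BadPageLink]:
    """Return the parsed fields, or None if text is not an event 447 description."""
    m = EVENT_447.search(text)
    if m is None:
        return None
    return BadPageLink(
        error=int(m["error"]),
        object_id=int(m["object_id"]),
        pgno_root=int(m["pgno_root"]),
        database=m["database"],
        pages=(int(m["p1"]), int(m["p2"]), int(m["p3"])),
    )


def tally_trees(descriptions: Iterable[str]) -> Counter:
    """Count events per (database, ObjectID, PgnoRoot), skipping unrelated text."""
    counts: Counter = Counter()
    for text in descriptions:
        event = parse_event_447(text)
        if event is not None:
            counts[(event.database, event.object_id, event.pgno_root)] += 1
    return counts
```

If nearly every event lands on a single (database, ObjectID, PgnoRoot) key, that suggests the same damaged tree is being reported over and over rather than many separate corruptions. It narrows down where the damage is, but it does not by itself tell you whether the cause is hardware.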

I clicked the link at the bottom of the event, and Microsoft says:

"Explanation
A corrupted page link was detected in a B-Tree.
Available space in the Exchange Server Information Store database is in the form of a list of pages that can be used to store new data. The available space is called a space tree. The space tree is held as a binary tree that is searched whenever a block of new data needs to be added to the database.

During an online defragmentation, a problem was discovered in this tree.
This is often caused by a hardware failure or anti-virus scanning of the database file directory.

User Action
If you receive these errors in your application log, it is suggested that you restore from an online backup as soon as you can. A bad page link error signifies logical corruption at the Jet level in the database, and it is not safe to continue using the database. NOTE: The restore must be performed using a backup taken before this error occurred. In a worst-case scenario, a hard repair followed by an isinteg -fix and then an ExMerge of the database may be required if no good backup exists. If you need help with a hard repair followed by an isinteg -fix and then an ExMerge, or if the problem persists, contact Microsoft Product Support Services."

I am not sure what to do about this, and I'm wondering if anyone can tell me what is going on. It sounds like a physically bad hard drive, or maybe a bad sector?

Thanks.
Jeff
Asked by jbobst

Exchange_Admin commented:
Check out this link:
http://support.microsoft.com/kb/810190/en-us

You may have a SCSI controller that is acting up, or you may have a hard drive going bad.
In my experience these types of errors are related to hardware 99% of the time. If you restore or repair the databases without fixing the hardware issue, the problem will most likely come back.

Check the system log to see if there are any controller or hard drive errors reported.

jbobst (author) commented:
There are no disk-related errors in the log. The last time I restarted this server was on May 24th, and the hard drive controller reported the RAID 1 arrays as "optimal". The server has four SATA hard drives configured as two RAID 1 mirrors of two drives each. Are there any tools you can think of offhand that would diagnose my hard drives or the SATA RAID controller?

The server in question is an HP ProLiant ML150; I'll check the HP site and see if they have any tools. The strange thing is that I have a nearly identical server for my domain controller/file server. It differs only in having a single processor, a little less memory, and larger hard drives than the Exchange server in question here. I say strange because about two weeks ago, during a reboot, my domain controller reported that one of the RAID 1 member drives was bad. I ordered a replacement hard drive and downloaded a utility from HP that I had to burn to a CD and boot the server from; it was supposed to run diagnostics on the server. I ran all the diagnostics from the CD, and it reported that everything was fine. When I rebooted the server, the SATA RAID BIOS then reported that the hard drives were good again!?! So I just left everything alone and didn't replace the hard drive. Keep in mind that this was all on my domain controller, not the Exchange server this question is about. Both servers are exactly the same age (a little over a year old), so maybe the hard drives are starting to go bad??? However, if that's truly the case, I have no idea which drives are the bad ones. If the server is reporting everything as optimal, I just don't know what else to do...

Jeff
Exchange_Admin commented:
Keep in mind that Exchange generates a lot of disk I/O, which can expose a marginal drive or controller that a lighter workload would not.

"Are there any tools that you can think of off hand that would diagnose my hard drives or the SATA RAID controller?"
I would suggest that you check with the hard drive and controller manufacturers.
In my experience, it took the manufacturer's own utilities to detect the issue correctly. Standard Windows diagnostics often did not find a problem.
jbobst (author) commented:
Exchange Admin,

Sorry for the delay in following up with this thread...

I ran a server diagnostic tool that HP suggested, and the server passed all the tests. It was a quick test, though, and there is no way it physically scanned each hard drive, so I don't put much stock in it. The RAID controller card reports that the RAID arrays and hard drives are all functioning "optimally", so I really don't know what else to do to check whether any of my hard drives are physically damaged.

I was able to test the repair procedure that Microsoft recommends on one of my backup test servers. I have a server identical to my production server, and I restored the production server image to it (we have a Symantec product, LiveState Recovery, that creates a complete image of the server each night and writes it to an external hard drive, sort of like Ghost). Anyway, I ran the repair utility that Microsoft recommends as a last resort for this error, and it fixed the problem on the test server. I let the test server run for a few days, and the error never came back. However, since this test server's purpose in life is to sit in a closet waiting for the day the production server dies, it is practically brand new, and its hard drives have only a few hours of use on them. So the odds are that its drives are physically in perfect shape, which may be why the error hasn't returned.

So my plan is to come in on a weekend, after our production server goes through a complete backup and imaging process, and try the fix Microsoft recommends. If the disks are fine and it's just database corruption, hopefully that will fix it. If the fix reports success but the error message comes back, I suppose I'll have to start replacing the RAID disks one at a time, rebuilding the arrays, until the error goes away.

Any other thoughts?

Thanks.
Jeff
jbobst (author) commented:
I followed KB article 810190, and it seemed to fix my Exchange server last night. All of my backups had been taken while the error was already occurring, so I couldn't restore from any of them. I ran the eseutil repairs along with the isinteg fix, and everything seems to be working fine now. No errors.

Thanks to everyone for their help.
Jeff