Solved

Database Corruption in MS Exchange 2007 Databases...

Posted on 2009-02-15
Last Modified: 2012-05-06
Hi,

We are running a two-node clustered mailbox configuration on Windows 2003 EE x64 SP2 with MS Exchange 2007 SP1. Over the last week, several of our databases have become corrupted, and we are having serious email downtime. The error that comes up most often in the Application Log is:

Event Type:      Error
Event Source:      ESE
Event Category:      Database Corruption
Event ID:      467
Date:            2/16/2009
Time:            10:57:43 AM
User:            N/A
Computer:      DWTCEXCMB2
Description:
MSExchangeIS (3560) Third Storage Group: Database U:\Third Storage Group\Mailbox Database 8.edb: Index Content Indexing Property Store Deleted Items index of table I-3-1 is corrupted (0).

For more information, click http://www.microsoft.com/contentredirect.asp.


We have 5 Storage Groups with more than 16 databases - total approximate size of Exchange DBs is about 1.4TB (yes it is huuuuge!).

We are repairing these databases with ESEUtil and we are able to recover them but it takes several hours and after that they get corrupted again or other databases get corrupted.

Any advice on where we should be looking to track down the cause of this error? There are hundreds of these messages in the Event Viewer.

Question by:DWTCIT
 
LVL 17

Expert Comment

by:Suraj
ID: 23647545
Try following this Microsoft article:
http://support.microsoft.com/kb/329817

There are two ways to resolve this issue:
1) Run ISINTEG, which will help remove logical corruption from the database.

2) Create a new store, move all the mailboxes to that store, and then discard the old store.
I suggest this because moving a mailbox to a different store leaves the corruption behind,
so the moved mailboxes should be clean, without the corruption.

-x

 

Author Comment

by:DWTCIT
ID: 23647574
Dear x-sam,

Thanks for your reply. We have tried both options separately and have had corruptions in same (and other databases) after these tasks were completed successfully.

Thanks for your help.
 
LVL 17

Expert Comment

by:Suraj
ID: 23647727

Perfect. So is the issue resolved, or do you still have a problem?

Author Comment

by:DWTCIT
ID: 23647795
Hi x-sam,

No, the problem is still there. The corruptions keep coming back after we completed the tasks you mentioned. We tried these about two nights ago and the database was fine (clean), but the errors returned and the DB is now corrupted again :-(

Any other ideas?
 
LVL 9

Expert Comment

by:abdulzis
ID: 23647831
Sounds like a hardware (disk) or third party issue.

Do you have any file level anti-virus scanning the database and log files?
Look in the System log for disk-related errors.
Get a health check done by your disk vendor to see whether the disks are functioning properly.
 
LVL 17

Expert Comment

by:Suraj
ID: 23647934
Check the exclusions on the antivirus: it should exclude the exchsrvr, inetsrv, and inetpub folders.
Check whether you are getting any 327 errors or others such as 447. Also try moving the database to a different drive.
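
To see at a glance which event IDs dominate, a text export of the Application log can be tallied by source and ID. This is only a minimal sketch, assuming the log has been saved as plain text in the same `Event Source:` / `Event ID:` layout as the entry quoted in the question. `SAMPLE` and `tally_events` are hypothetical names, and the sample entries are made up for illustration.

```python
import re
from collections import Counter

# Hypothetical sample in the same field layout as the event quoted in the question.
SAMPLE = """\
Event Type:      Error
Event Source:      ESE
Event Category:      Database Corruption
Event ID:      467
Description:
(description text)

Event Type:      Error
Event Source:      EXCDO
Event Category:      General
Event ID:      8199
Description:
(description text)

Event Type:      Error
Event Source:      ESE
Event Category:      Database Corruption
Event ID:      467
Description:
(description text)
"""


def tally_events(text):
    """Count events per (source, event ID) pair in a plain-text log export."""
    counts = Counter()
    source = None
    for line in text.splitlines():
        match = re.match(r"Event Source:\s*(\S+)", line)
        if match:
            # Remember the source until the matching "Event ID:" line appears.
            source = match.group(1)
            continue
        match = re.match(r"Event ID:\s*(\d+)", line)
        if match and source is not None:
            counts[(source, int(match.group(1)))] += 1
            source = None
    return counts


# Print the most frequent (source, ID) pairs first.
for (source, event_id), n in tally_events(SAMPLE).most_common():
    print(f"{source} {event_id}: {n}")
```

Running the same tally on the real export should show whether ESE 467 is the only corruption-related ID or whether others appear alongside it.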
 
LVL 10

Expert Comment

by:kevala
ID: 23653765
ISINTEG typically does not resolve ESE corruption errors like this one, because the corruption is at the physical level (the database pages, which are 8 KB in Exchange 2007), and that can only be handled by ESEUTIL.

If the problem is repeating or spreading, it is most likely a hardware problem, as previously mentioned. These errors happen when a bad checksum is detected in one or more of the database pages. That is, a checksum is written to each page and to its spot on the disk, and the two should match. If the hardware mis-writes this or reads it improperly, the page is considered to have a bad checksum, i.e. to be corrupted. You get several warnings because we make 16 to 18 read attempts and then give a final ESE error.
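
The per-page checksum idea described above can be illustrated with a toy model. This is a sketch only: it uses CRC32 from Python's `zlib` as a stand-in and a deliberately tiny page size, and it is not the checksum algorithm or page layout that ESE actually uses. The names `write_pages` and `find_bad_pages` are hypothetical.

```python
import zlib

PAGE_SIZE = 64  # toy size for the demo only; real database pages are far larger


def write_pages(data, page_size=PAGE_SIZE):
    """Split data into pages and store a CRC32 checksum alongside each page."""
    pages = []
    for offset in range(0, len(data), page_size):
        payload = data[offset:offset + page_size]
        pages.append((zlib.crc32(payload), payload))
    return pages


def find_bad_pages(pages):
    """Return the indexes of pages whose stored checksum no longer matches."""
    return [
        index
        for index, (stored, payload) in enumerate(pages)
        if zlib.crc32(payload) != stored
    ]


pages = write_pages(bytes(range(256)))  # four 64-byte pages

# Simulate a hardware mis-write: flip one bit in page 2 without
# updating the checksum that was stored with it.
stored, payload = pages[2]
pages[2] = (stored, bytes([payload[0] ^ 0x01]) + payload[1:])

print(find_bad_pages(pages))
```

The point of the sketch is that the database engine itself did nothing wrong: the data changed underneath it after the checksum was computed, which is why repeated errors of this kind point at disks, controllers, memory, or something else touching the files.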

If you can avoid repairing the same databases over and over, it is in your best interest because you will eventually have a useless database, unless you are doing the offline defrag after each repair, which effectively gives you a new database. I would try to restore from backup if possible when the corruption occurs. So basically, get a good backup of each database (after repairing, or restoring from a previous good backup), then if the corruption occurs, restore from backup and let it play through logs. This is much better than going through ESEUTIL /P, then ISINTEG -FIX, then ESEUTIL /D every time, on every database.

However, managing the databases like this is only a temporary fix. You need to run some serious hardware diagnostics to find out what keeps causing the bad checksums, or the misread checksums, in the databases. The common causes are memory, bad disks, firmware updates, etc.

Also, as mentioned, antivirus file-level scanning can cause this, but it is a less common cause than hardware. This will keep happening over and over until you find the hardware problem and/or eliminate the antivirus.
 

Author Comment

by:DWTCIT
ID: 23657841
Dear Kevala,

Your comments were very useful. We have cross-checked the storage and storage controller logs and found no I/O-related errors, which is why we are looking for logical problems in the database.

Another strange happening is that once a particular database gets corrupted (Engineering DB) it goes offline and brings down all other databases as well. After this, Engineering does not come online without a recovery/restore but sometimes the other ones do. If this happens a few times, then the other databases get corrupted as well and require a recover/restore.

I have found the following error in the Engineering DB :

Event Type:      Error
Event Source:      EXCDO
Event Category:      General
Event ID:      8199
Date:            2/17/2009
Time:            8:02:11 AM
User:            N/A
Computer:      DWTCEXCMB2
Description:
Calendaring agent failed in message save notification with error 0x800703eb on user.name@domain.com: /Calendar/Emergency Procedures and Major Incident Planning.EML.


I found that this particular user (user.name) has one calendar occurrence which exceeded the 1300 limit mentioned in :

http://support.microsoft.com/kb/943371/

For the time being, I am restoring the Engineering DB and this user is in that database. I will try to bring up the DB and then remove the 1300 calendar occurrence and see if the above error disappears. Right now it is a case of eliminating the obvious errors.

Will keep you posted.

Thanks for your help.
 
LVL 17

Expert Comment

by:Suraj
ID: 23658632
Error 8199 comes up only if some mailboxes have more than 1,300 recurring appointments; EXCDO has a limit of 1,300 recurring appointments. To resolve this problem in Exchange Server 2007 Service Pack 1, install Update Rollup 6 for Exchange Server 2007 Service Pack 1. It was already resolved by Rollup 1, but you should still upgrade to Rollup 6, which has just been released.
 
LVL 10

Expert Comment

by:kevala
ID: 23662565
The reason one database going offline affects all the others is that information about all the stores is kept in the log files. So when you try to bring one of the stores online, it does a soft recovery (a replay of the logs), and even though you are trying to mount "mailbox store 1", it will fail with an error referencing "mailbox store 3".

If this happens again:
1. Move out the corrupted database.edb file
2. Open a command prompt and switch to the Exchange "bin" folder
3. Run the following:    "Eseutil /r Exx /i"

NOTE:  Exx is the prefix of the log for the storage group. Like E00, E01, etc.
NOTE2:  the "/i" in the eseutil command tells eseutil to ignore missing database files

This way you can at least try to get the other stores up while you recover the corrupted one.
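
If this recovery step ends up in a runbook, the command line can be assembled from the storage group's log prefix so that the placeholder "Exx" is never typed literally. A minimal sketch, assuming the prefix is the letter E followed by two hexadecimal digits (that pattern is an assumption made here for validation only); `soft_recovery_command` is a hypothetical helper, and it only builds the argument list without running anything.

```python
import re


def soft_recovery_command(log_prefix):
    """Build the 'eseutil /r <prefix> /i' argument list for a storage group.

    The prefix check (E plus two hex digits) is an assumption used only
    to reject obvious mistakes such as the literal placeholder 'Exx'.
    """
    if not re.fullmatch(r"E[0-9A-F]{2}", log_prefix, flags=re.IGNORECASE):
        raise ValueError(f"unexpected log prefix: {log_prefix!r}")
    # /r = soft recovery (log replay), /i = ignore missing database files
    return ["eseutil", "/r", log_prefix.upper(), "/i"]


print(" ".join(soft_recovery_command("E01")))
```

The resulting list could be handed to `subprocess.run` from the Exchange "bin" folder, but that part is left out here on purpose, since the command should only be run deliberately and by hand.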

However, there is still something going on, either hardware or third-party software related.
I would seriously consider removing ALL third-party applications and calling the hardware vendor in for some assistance.

NOTE3: If you remove AV, just make sure you have the mail scanning/filtering through another server or host first.
 

Author Comment

by:DWTCIT
ID: 23680738
Okay - thank you for all your replies.

After we had consulted with Microsoft for more than 30 hours, they recommended that we put the database in recovery mode and create a fresh dial-tone database. We then allowed the users to connect to the fresh database (with no e-mails, calendars, etc.) and began merging the corrupted database into the new one. This process took several hours, and the corrupted data in the DB was left behind.

Essentially all the errors have disappeared from the event logs on the MB server, but now we are facing some new challenges. The biggest one is that some users cannot see any of their rules or calendar entries in their newly created mailbox, although their e-mails have been restored. A related issue is that when they receive a calendar appointment, they are unable to open it or dismiss it. The error message is "Cannot turn off the reminder. You may be reminded again".

Any ideas?
 
LVL 17

Accepted Solution

by:
Suraj earned 2000 total points
ID: 23688777
Ask those users to create a new Outlook profile and test again.
OR
I think you will have to run the ISINTEG command on the database; this will remove the logical corruption in it.

Can you tell me whom you worked with at Microsoft ([his name]), if you don't mind?