Database Corruption in MS Exchange 2007 Databases...

DWTCIT asked (Last Modified: 2012-05-06):
Hi,

We are running a two-node clustered mailbox configuration on Windows 2003 EE x64 SP2 with MS Exchange 2007 SP1. In the last week, several of our databases have been getting corrupted, and we are having serious downtime on our email. The error that comes up most often in the Application log is:

Event Type:      Error
Event Source:      ESE
Event Category:      Database Corruption
Event ID:      467
Date:            2/16/2009
Time:            10:57:43 AM
User:            N/A
Computer:      DWTCEXCMB2
Description:
MSExchangeIS (3560) Third Storage Group: Database U:\Third Storage Group\Mailbox Database 8.edb: Index Content Indexing Property Store Deleted Items index of table I-3-1 is corrupted (0).

For more information, click http://www.microsoft.com/contentredirect.asp.


We have 5 storage groups with more than 16 databases; the total size of the Exchange databases is approximately 1.4 TB (yes, it is huuuuge!).

We are repairing these databases with ESEUtil and are able to recover them, but it takes several hours, and afterwards they get corrupted again or other databases get corrupted.

Any advice on where we should look to track down this error? There are hundreds of these messages in the Event Viewer.


Suraj, Senior System Engineer (Certified Expert), commented:
Try following this Microsoft article:
http://support.microsoft.com/kb/329817

There are two ways to resolve this issue:
1) Run ISINTEG, which will help remove logical corruption from the database.

2) Create a new store, move all the mailboxes to that store, and then discard the old store.
The reason I suggest this is that when you move a mailbox to a different store, the corruption is left behind, so the mailboxes that are moved are clean, without corruption.

-x

Author commented:
Dear x-sam,

Thanks for your reply. We have tried both options separately, and the same (and other) databases became corrupted again after these tasks were completed successfully.

Thanks for your help.

Suraj, Senior System Engineer (Certified Expert), commented:

Perfect. So is the issue resolved, or do you still have a problem?

Author commented:
Hi x-sam,

No, the problem is still there; the corruption keeps coming back after we completed the tasks you mentioned. We tried these about two nights ago and the database was fine (clean), but the errors returned and the database is now corrupted again :-(

Any other ideas?

Commented:
Sounds like a hardware (disk) or third-party issue.

Do you have any file-level antivirus scanning the database and log files?
Look in the System log for disk-related errors.
Get a health check done by your disk vendor to see whether the disks are functioning properly.

Suraj, Senior System Engineer (Certified Expert), commented:
Check the exclusions on the antivirus; it should exclude the Exchsrvr folder, inetsrv and inetpub.
Check whether you are getting event 327 or other errors such as 447. Try moving the database to a different drive.
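One way to confirm the exclusions actually cover the database and log files is to test each path against the configured exclusion folders. The sketch below is a minimal illustration under stated assumptions: the exclusion list and the log-folder path are hypothetical; only the database path comes from the event log in the question. It compares whole path components, case-insensitively as Windows does, so that a sibling folder such as `Third Storage Group Logs` is not wrongly treated as covered by `Third Storage Group` (a plain string-prefix check would make that mistake).

```python
from pathlib import PureWindowsPath


def is_excluded(path: str, exclusions: list[str]) -> bool:
    """True if path equals, or sits under, any excluded folder.

    Paths are lower-cased first because Windows paths are case-insensitive.
    """
    target = PureWindowsPath(path.lower())
    for folder in exclusions:
        excluded = PureWindowsPath(folder.lower())
        # Compare whole components via .parents, not raw string prefixes.
        if target == excluded or excluded in target.parents:
            return True
    return False


def uncovered(paths: list[str], exclusions: list[str]) -> list[str]:
    """Return the paths that no exclusion folder covers."""
    return [p for p in paths if not is_excluded(p, exclusions)]


# Hypothetical AV exclusion list; replace with your scanner's real settings.
EXCLUSIONS = [r"U:\Third Storage Group"]
PATHS = [
    r"U:\Third Storage Group\Mailbox Database 8.edb",  # from the event log
    r"U:\Third Storage Group Logs\E02.log",            # hypothetical log path
]

if __name__ == "__main__":
    for p in uncovered(PATHS, EXCLUSIONS):
        print("NOT excluded from scanning:", p)
```

Run against the real exclusion list, anything printed is a file the scanner may still be touching.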

Commented:
ISINTEG typically does not resolve ESE corruption errors like this one, because the corruption is at the physical level (the 4 KB pages), which only ESEUTIL can handle.

If the problem is repeating or spreading, it is most likely, as previously mentioned, a hardware problem. These errors happen when a bad checksum is detected in one or more of the database pages. A checksum is written into each 4 KB page when the page is written to its spot on the disk, and it should match the page contents when the page is read back. If the hardware mis-writes the page or reads it improperly, the checksum will not match and the page is considered corrupted. You get several warnings because we make 16 to 18 read attempts before giving a final ESE error.
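To make the checksum-and-retry behavior concrete, here is a minimal sketch. It is an illustration only: CRC-32 via `zlib.crc32` is a stand-in, not ESE's actual page-checksum format, and the 4-byte header layout is an assumption. What it shows is the logic described above: a checksum stored with each 4 KB page, a recomputation on every read, a warning per failed attempt, and a final error once all retries are exhausted.

```python
import zlib

PAGE_SIZE = 4096         # 4 KB database page, as described above
MAX_READ_ATTEMPTS = 16   # the thread cites 16 to 18 read attempts


def write_page(payload: bytes) -> bytes:
    """Build a page: 4-byte checksum followed by the zero-padded payload.

    CRC-32 is a stand-in here; ESE uses its own checksum format.
    """
    body = payload.ljust(PAGE_SIZE - 4, b"\x00")[: PAGE_SIZE - 4]
    return zlib.crc32(body).to_bytes(4, "little") + body


def page_is_valid(page: bytes) -> bool:
    """Recompute the checksum and compare it with the stored one."""
    stored = int.from_bytes(page[:4], "little")
    return zlib.crc32(page[4:]) == stored


def read_page(read_from_disk) -> bytes:
    """Retry a read whose checksum fails; give up after MAX_READ_ATTEMPTS."""
    for attempt in range(1, MAX_READ_ATTEMPTS + 1):
        page = read_from_disk()
        if page_is_valid(page):
            return page
        print(f"warning: checksum mismatch, read attempt {attempt}")
    raise OSError(f"page still corrupt after {MAX_READ_ATTEMPTS} reads")
```

A page that was mis-written fails on every attempt and ends in the final error, while a transient misread succeeds on a retry. That is why a steady stream of these errors points at the write path (disks, controller, memory, firmware) rather than at a one-off glitch.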

Avoid repairing the same databases over and over if you can: eventually you will end up with a useless database, unless you run an offline defrag after each repair, which effectively gives you a new database. I would try to restore from backup when the corruption occurs, if possible. So basically, get a good backup of each database (after repairing, or by restoring from a previous good backup); then, if the corruption recurs, restore from that backup and let it replay the logs. This is much better than going through ESEUTIL /P, then ISINTEG -FIX, then ESEUTIL /D every time, on every database.
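The decision described above can be sketched as a small planning helper that only builds the steps and runs nothing. This is a hedged illustration: the server name and database path are taken from the event logs in this thread, and the `eseutil /p`, `isinteg -s <server> -fix -test alltests` and `eseutil /d` command lines reflect commonly documented syntax for these tools, so verify them with `eseutil /?` and `isinteg /?` on your own server before use. ISINTEG may also prompt you to pick the database interactively.

```python
# Values taken from the thread's event logs; adjust for your own database.
SERVER = "DWTCEXCMB2"
EDB_PATH = r"U:\Third Storage Group\Mailbox Database 8.edb"


def repair_cycle(server: str, edb_path: str) -> list[str]:
    """The repair sequence the thread advises against repeating.

    The database must be dismounted before any of these steps.
    """
    return [
        f'eseutil /p "{edb_path}"',                  # hard repair; may discard damaged pages
        f"isinteg -s {server} -fix -test alltests",  # logical (store-level) checks and fixes
        f'eseutil /d "{edb_path}"',                  # offline defrag; rewrites a fresh file
    ]


def recovery_plan(have_good_backup: bool, server: str, edb_path: str) -> list[str]:
    """Prefer restore plus log replay; fall back to the full repair cycle."""
    if have_good_backup:
        return [
            "restore the database from the last good backup",
            "mount it and let Exchange replay the transaction logs",
        ]
    return repair_cycle(server, edb_path)


if __name__ == "__main__":
    for step in recovery_plan(False, SERVER, EDB_PATH):
        print(step)
```

Keeping a known-good backup is what lets the first branch be taken, and that branch avoids the cumulative damage of repeated hard repairs.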

However, managing the databases like this is only a temporary fix, you need to run some serious hardware diagnostics to find out what keeps causing the bad checksums, or the misread checksums in the databases. The common causes are memory, bad disks, firmware updates, etc.

Also, as mentioned, file-level antivirus scanning can cause this, but it is less common than hardware. This will keep happening over and over until you find the hardware problem and/or eliminate the antivirus.

Author commented:
Dear Kevala,

Your comments were very useful. We have cross-checked the storage and storage controller logs and found no I/O-related errors, which is why we are looking for logical problems in the database.

Another strange thing is that once a particular database (Engineering DB) gets corrupted, it goes offline and brings down all the other databases as well. After this, Engineering does not come online without a recovery/restore, but sometimes the other ones do. If this happens a few times, the other databases get corrupted as well and require a recovery/restore.

I have found the following error in the Engineering DB :

Event Type:      Error
Event Source:      EXCDO
Event Category:      General
Event ID:      8199
Date:            2/17/2009
Time:            8:02:11 AM
User:            N/A
Computer:      DWTCEXCMB2
Description:
Calendaring agent failed in message save notification with error 0x800703eb on user.name@domain.com: /Calendar/Emergency Procedures and Major Incident Planning.EML.


I found that this particular user (user.name) has one recurring calendar item that exceeds the 1,300 limit mentioned in:

http://support.microsoft.com/kb/943371/

For the time being, I am restoring the Engineering DB, which contains this user's mailbox. I will try to bring up the DB, then remove the recurring item that exceeds the 1,300 limit and see whether the above error disappears. Right now it is a case of eliminating the obvious errors.

Will keep you posted.

Thanks for your help.

Suraj, Senior System Engineer (Certified Expert), commented:
Error 8199 occurs only if some mailboxes have more than 1,300 recurring appointments; EXCDO has a limit of 1,300 recurring appointments. To resolve this problem in Exchange Server 2007 Service Pack 1, install Update Rollup 6 for Exchange Server 2007 Service Pack 1. It was already resolved in Rollup 1, but upgrade to Rollup 6 anyway, since it has just been released.

Commented:
One database going offline affects all of them because information about all the stores in a storage group is kept in the same log files. So when you try to bring one of the stores online, it does a soft recovery (a replay of the logs), and even though you are trying to mount "mailbox store 1", it will error/fail referencing "mailbox store 3".
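A toy model helps show why a shared log stream ties the stores together. This is not real ESE code: `LogRecord` and `soft_recovery` are hypothetical names, and the `ignore_missing` flag is only loosely analogous to eseutil's `/i` switch. Each log record belongs to one database, recovery replays every record in generation order, and a record for a database that is not available stops the whole replay.

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    generation: int  # log file sequence number
    database: str    # database in the storage group this record changes


def soft_recovery(logs: list[LogRecord], present: set[str],
                  ignore_missing: bool = False) -> list[LogRecord]:
    """Replay shared storage-group logs in order (toy model, not real ESE).

    Every record must be applied to its database. A record for a database
    that is not present stops recovery, unless ignore_missing is set
    (loosely analogous to eseutil's /i switch).
    """
    replayed = []
    for rec in sorted(logs, key=lambda r: r.generation):
        if rec.database not in present:
            if ignore_missing:
                continue  # skip changes for the unavailable database
            raise RuntimeError(
                f"generation {rec.generation}: {rec.database} is missing")
        replayed.append(rec)
    return replayed
```

With logs touching "mailbox store 1" and "mailbox store 3", replaying with only store 1 available fails on a store 3 record, matching the symptom described above. Setting `ignore_missing=True` lets store 1's changes be replayed on their own.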

If this happens again:
1. Move out the corrupted database.edb file
2. Open a command prompt and switch to the Exchange "bin" folder
3. Run the following: "Eseutil /r Exx /i"

NOTE:  Exx is the prefix of the log for the storage group. Like E00, E01, etc.
NOTE2:  the "/i" in the eseutil command tells eseutil to ignore missing database files
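The three steps above can be wrapped in a small helper so the storage group's log prefix is checked before anything runs. This is a sketch under stated assumptions: the bin path is the assumed default Exchange 2007 install location, and the prefix check only accepts the `E` plus two characters pattern shown in the notes (hex digits are allowed to be lenient). Like step 2, it runs eseutil with the bin folder as the working directory. If your log files live elsewhere, check `eseutil /?` on the server for the switches that point it at the log folder rather than assuming any here. It defaults to a dry run that only returns the command.

```python
import re
import subprocess
from pathlib import PureWindowsPath

# Assumed default Exchange 2007 install location; adjust for your server.
EXCHANGE_BIN = r"C:\Program Files\Microsoft\Exchange Server\Bin"


def soft_recovery_command(log_prefix: str, bin_dir: str = EXCHANGE_BIN) -> list[str]:
    """Build 'eseutil /r <prefix> /i' using the full path to eseutil.exe."""
    # Assumption: prefixes follow the E00, E01, ... pattern from the notes.
    if not re.fullmatch(r"E[0-9A-Fa-f]{2}", log_prefix):
        raise ValueError(f"unexpected log prefix: {log_prefix!r}")
    eseutil = str(PureWindowsPath(bin_dir) / "eseutil.exe")
    # /r = soft recovery (log replay); /i = ignore missing database files
    return [eseutil, "/r", log_prefix, "/i"]


def run_soft_recovery(log_prefix: str, dry_run: bool = True) -> list[str]:
    """Return the command; execute it only when dry_run is False."""
    cmd = soft_recovery_command(log_prefix)
    if not dry_run:  # only meaningful on the Exchange server itself
        subprocess.run(cmd, cwd=EXCHANGE_BIN, check=True)
    return cmd


if __name__ == "__main__":
    print(run_soft_recovery("E02"))  # dry run: prints the command only
```

Remember to move the corrupted .edb file out first (step 1); otherwise there is no missing database for `/i` to skip.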

This way you can at least try to get the other stores up while you recover the corrupted one.

However, there is still something going on: hardware or third-party software related.
I would seriously consider the thought of removing ALL 3rd party applications, and calling the hardware vendor in for some assistance.

NOTE3: If you remove AV, just make sure you have the mail scanning/filtering through another server or host first.

Author commented:
Okay - thank you for all your replies.

After more than 30 hours of consulting with Microsoft, we followed their recommendation: we put the database in recovery mode and created a fresh dial-tone database. We then allowed the users to connect to the fresh database (with no e-mail, calendars, etc.) and began merging the corrupted database with the new one. This process took several hours, and the corrupted data in the database was left behind.

Essentially, all the errors have disappeared from the event logs on the mailbox server, but now we are facing some new challenges. The biggest one is that some users cannot see any of their rules or calendar entries in their newly created mailboxes, although their e-mail has been restored. A related issue is that when they receive a calendar appointment, they are unable to open or dismiss it. The error message is "Cannot turn off the reminder. You may be reminded again".

Any ideas?
Senior System Engineer (Certified Expert) commented:
Ask those users to create a new Outlook profile and test again.
OR
I guess you will have to run the ISINTEG command on the database; this will remove the logical corruption in it.

Can you tell me who you worked with at Microsoft [his name], if you don't mind?
