MS Exchange Server - Corrupt data

cam-e
I have an Exchange 2010 server running on an x64 2008 R2 box, which recently had a failed hard drive in a RAID 5 array. After replacing the drive, the virtual disk is showing errors, with OpenManage stating "The Virtual Disk has bad blocks. For more details, see the Virtual Disk Bad Block Management section in the Online Help."

This is the virtual disk that has the Exchange database on it. One user's mailbox kept disconnecting, and after trying to reload the profile on the individual PC, the following errors came up in the log:

Log Name:      Application
Source:        ESE
Event ID:      481
Description:
Information Store (1996) Mailbox Database 0133899319: An attempt to read from the file "D:\Program Files\Microsoft\Exchange\V14\Mailbox\Mailbox Database 0133899319\Mailbox Database 0133899319.edb" at offset 50459377664 (0x0000000bbf9d0000) for 32768 (0x00008000) bytes failed after 0 seconds with system error 1 (0x00000001): "Incorrect function. ".  The read operation will fail with error -1022 (0xfffffc02).  If this error persists then the file may be damaged and may need to be restored from a previous backup.


Log Name:      Application
Source:        MSExchangeIS
Event ID:      1159
Description:
Database error Disk IO error occurred in function JTAB_BASE::EcSeek while accessing the database "Mailbox Database 0133899319".

Log Name:      Application
Source:        ExchangeStoreDB
Event ID:      203
Description:
At '8/3/2011 11:21:10 AM' database copy 'Mailbox Database 0133899319' on this server appears to have an I/O error that it may be able to repair.  To help identify the failure, consult the Event log on the server for other storage and "ExchangeStoreDb" events. Service recovery was attempted by failover to another copy. The failover was unsuccessful in restoring the service because of the following error: 'There is only one copy of this mailbox database (Mailbox Database 0133899319). Automatic recovery is not available..

The Exchange backups are reported as successful, but on closer inspection they report that they failed to back up the EDB file. Currently, the EDB file is 80GB and the compressed backup is around 30GB.
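To get a feel for where the failing read sits inside the database file, the offset from the ESE event 481 above can be decoded. This is a minimal sketch: the helper names `parse_failed_read` and `describe` are made up for illustration, and the file size is an assumption taken from the "80GB" figure above, so the resulting fraction is only approximate.

```python
import re

# Text copied from the ESE event 481 quoted above.
EVENT_TEXT = (
    "at offset 50459377664 (0x0000000bbf9d0000) for 32768 (0x00008000) bytes "
    "failed after 0 seconds with system error 1 (0x00000001)"
)

def parse_failed_read(text):
    """Return (offset, length) in bytes from an ESE 481-style message."""
    m = re.search(
        r"at offset (\d+) \(0x([0-9a-fA-F]+)\) for (\d+) \(0x([0-9a-fA-F]+)\) bytes",
        text,
    )
    if m is None:
        raise ValueError("no offset/length found in event text")
    offset, length = int(m.group(1)), int(m.group(3))
    # The event prints each value in decimal and in hex; check that they agree.
    if offset != int(m.group(2), 16) or length != int(m.group(4), 16):
        raise ValueError("decimal and hex values disagree")
    return offset, length

def describe(offset, length, assumed_file_size):
    """Summarise where the failed read sits in the file."""
    return {
        "offset_gib": offset / 2**30,
        "block_index": offset // length,      # index in units of the read size
        "aligned": offset % length == 0,      # does the read start on a block boundary?
        "fraction_of_file": offset / assumed_file_size,
    }

offset, length = parse_failed_read(EVENT_TEXT)
# Assumption: the EDB is roughly 80 GB, as stated in the post.
info = describe(offset, length, assumed_file_size=80 * 10**9)
print(info)
```

If the same offset keeps appearing in later events, that points at one damaged region rather than damage scattered across the file.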

I'm not sure what to do first here, and I always thought you should not run chkdsk on a volume with Exchange on it?

Many thanks.

Commented:
Hi,

To be honest, you may need to liaise with Dell. It may be that your RAID or HDD firmware is out of date, hence the corruption. If you had a degraded array and a replacement drive is installed, your array will rebuild and should be fine. However, once you do have a stable subsystem, Dell will not assist with your Exchange issue.

I would recommend focussing on your server drive subsystem first: install the latest version of OpenManage, which will highlight out-of-date firmware. Check the Dell website and get your server up to date at the hardware level too.

Once all that is sorted, you may need to repair your Exchange database, depending on how bad the damage is. Finally, CHKDSK is a no-no at this stage.

Hope this helps.

Author

Commented:
Thanks for the reply.

The server is all up to date with OpenManage, all drivers, etc. Would running the repair utilities be of any use? I could export all mailboxes to PSTs first. There are around 35 or so.
Seelan Naidoo, Microsoft Systems Admin

Commented:
Please post the server hardware details.

E.g.: Dell PE server, RAID 5 setup, hot spares if any, how many disks are in the RAID, firmware versions of the disks.

How was the disk replaced? A straight hot swap?

RAID 5 status: it might still be rebuilding.

Author

Commented:
Sure,

Dell R710 2 Xeon's, 8GB, Server 2008 R2 x64

PERC H700 Integrated
-Firmware Version 12.10.1-0001
-Driver Version 4.31.01.64
-Storport Driver Version 6.1.7601.17577

Virtual Disk 0 -Raid 1- 2 disks
Virtual Disk 1 - Raid 5 - 4 disks (

It was hot swapped over a week ago. I can't be certain when the Exchange problem first started.

Error in Open Manage on Virtual Disk 1 - "The Virtual Disk has bad blocks. For more details, see the Virtual Disk Bad Block Management section in the Online Help."

Commented:
During a RAID 5 rebuild, you should not see any issues other than a performance loss, unless the firmware is out of date, in which case there will usually be a notice about the issue on the Dell website.

The last time I saw an issue like this, it was due to out-of-date firmware: when a disk died, it took the whole array offline. After reimporting the foreign array, the OS reported that it needed to perform a CHKDSK, which it did, and everything was okay.

You haven't mentioned the firmware of your hard drives either?

I would liaise with Dell, assuming you have Pro Support who will help you on the RAID and hard drive aspect. Do not just run any repair utilities, as you could find yourself with a non-bootable system. If you don't know what you're doing there is a chance you could break it.

Author

Commented:
The old disks are HS10, the new disk is EH02

Commented:
Get your drives up to date; firmware is available on Dell's website.

Author

Commented:
Updated all the HS10 drives to HS11. One of them, however, wasn't detected. I'm still trying to find the right firmware for the EH02 drive, part number ST3300657SS-H, which I can't find on the Seagate or Dell websites.

Author

Commented:
Sorry - after a reboot all the HS10's are now showing HS11.

Author

Commented:
Also, this is from the documentation regarding the current disk state:


RAID Level Virtual Disk: RAID 5
State: Ready
Scenario: One bad block on two physical disks at the same location.
Result: The controller cannot regenerate data from peer disks. This results in a virtual disk bad block.
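The scenario quoted above follows from how single-parity RAID works: each stripe stores one XOR parity block, so one unreadable block per stripe can be regenerated, but two cannot. The following is a simplified sketch of that idea only. It assumes a 4-disk layout (3 data blocks plus 1 parity block) to mirror the array in this thread, and the function names are illustrative; it does not model how the PERC controller actually lays out or rotates parity.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def rebuild_stripe(stripe):
    """Return a complete copy of the stripe; None marks an unreadable block."""
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        # Same location unreadable on two disks: single parity is not enough.
        raise ValueError("two or more blocks unreadable; parity cannot regenerate them")
    repaired = list(stripe)
    if missing:
        survivors = [block for block in stripe if block is not None]
        # XOR of all surviving blocks (data and parity) yields the lost block.
        repaired[missing[0]] = xor_blocks(survivors)
    return repaired

# Illustrative stripe: three data blocks plus their parity block.
data = [b"mail", b"box ", b"data"]
stripe = data + [xor_blocks(data)]

one_lost = [stripe[0], None, stripe[2], stripe[3]]
print(rebuild_stripe(one_lost) == stripe)

two_lost = [None, None, stripe[2], stripe[3]]
try:
    rebuild_stripe(two_lost)
except ValueError as exc:
    print("cannot rebuild:", exc)
```

This is why a second bad block that surfaces during or after a rebuild leaves the virtual disk with a block that no amount of rebuilding can recover.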

Will speak with Dell but assuming this is the case, what is the best way to restore the EDB?

Commented:
Are these databases on servers that are DAG members? If yes, you can switch over and activate a passive copy of the DB, after making sure that the passive copy is not having any disk issues.

Author

Commented:
No, they aren't, unfortunately.

Commented:
If you're able to do so, try to export all mailboxes to PSTs, just in case. Don't forget your public folders too.

Author

Commented:
Exporting them all now. Do you think trying to defrag it could be an option?

Have also sent all the hardware logs to Dell for inspection.
Seelan Naidoo, Microsoft Systems Admin

Commented:
most questions answered since my last post..

Commented:
Defrag won't help in this situation. You have an underlying disk problem, not a fragmentation problem.

Commented:
Agree with Netflo, and I would definitely NOT do a defrag or a repair on the EDB until you get the underlying disk issue resolved, or else you will more than likely just do more damage to the Exchange database.

After exporting the mailboxes and public folders to PST, if some of them failed to export all data, I would suggest copying that database to an alternate location (a different machine or an external HD) so that you can use it to recover the remaining data if the RAID array update and fix-up doesn't take.
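An ordinary file copy typically aborts at the first unreadable block, so one way to take such a copy from a damaged volume is to read block by block and record the ranges that fail. This is a minimal sketch, not a tested recovery tool: `salvage_copy` is a hypothetical helper, the 32 KiB block size is an assumption based on the read size in the ESE event earlier in the thread, and a copy with zero-filled gaps is not a consistent database. It is only raw material for later repair or extraction attempts, and the database should be dismounted first.

```python
import os

BLOCK = 32 * 1024  # assumption: matches the 32768-byte read size in the ESE event

def salvage_copy(src, dst, block=BLOCK):
    """Copy src to dst block by block, zero-filling blocks that cannot be read.

    Returns a list of (offset, length) tuples for the unreadable ranges.
    """
    bad_ranges = []
    size = os.path.getsize(src)
    with open(src, "rb", buffering=0) as fin, open(dst, "wb") as fout:
        offset = 0
        while offset < size:
            want = min(block, size - offset)
            try:
                fin.seek(offset)
                data = fin.read(want)
            except OSError:
                data = b""  # treat an I/O error as an unreadable block
            if len(data) != want:
                bad_ranges.append((offset, want))
                data = b"\x00" * want  # keep later offsets aligned with the source
            fout.write(data)
            offset += want
    return bad_ranges
```

Called as `salvage_copy(source_path, destination_path)` with paths of your own choosing, the returned list shows which byte ranges were lost, which can then be compared with the offsets reported in the event log.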

Author

Commented:
Dell replaced the RAID controller and the backplane enclosure. The disk still has bad blocks, which I'm pretty sure is where some of the EDB file is. I dismounted the database and tried to copy it to the NAS. It got to 60% (40GB) and failed there. I was able to remount the database fine, but Exchange is still complaining about the read errors. When I exported the mailboxes to PST files, it only complained about one user mailbox with skipped messages.

Commented:
OK so sounds like you are making some progress but things are still on edge eh?

You could attempt to repair the bad block, but I would say the chances are pretty high that you may lose that database upon repair. So, some things to think about:

1. I assume new email is still flowing in and out?

2. If yes to #1, then even though you exported the majority of the email, you now risk losing the data and actions that occurred after the export to PST, so IMO you have a few options:

A. Kick everyone off the system and stop inbound mail-flow at a pre-determined time and do another export.  

* Upon completion, fix the bad block issue via chkdsk and see if you get lucky.
** If all goes well, the DB will be intact and you can run New-MailboxRepairRequest: http://technet.microsoft.com/en-us/library/ff625226.aspx
** If the database is trashed but the disk is now without issue, you can do a dial-tone recovery of the current DB: dismount the old DB, move or delete the appropriate files, and then remount the database. Exchange will tell you that the DB files don't exist and that if you continue it will create new files. Say yes, and a blank new DB will be created. Users can start to send and receive again, and you can then import the PST files to recover the historical data.

B. While the above will work, if by chance you have another disk system, I would create a new store on that alternate disk subsystem and then migrate the users into it. This would be much more seamless to the users overall. That said, while it is not a necessity, I would still consider cutting off user access as well as inbound traffic until you have completed the migration, just in case something else happens. That way you have a good fallback with the PSTs.

Author

Commented:
I have re-run the Dell diagnostics tool and sent the results to them for further analysis to make sure.

I did the export to PSTs just in case something further went wrong. I definitely think that cutting the mail flow and doing another export to PST at the weekend is the way to go. This box isn't on SP1 for Exchange, so I'm assuming I would still use isinteg in place of this?

Commented:
No. While ISInteg shipped with Exchange 2010 originally, it doesn't work, so that makes things a bit more difficult.

So my recommendation would be to do the delta export, then make sure you are up to date on SP1 and also install the latest rollups. RU4-v2 is the latest one.

Question: Are you running a BlackBerry server? If so, you will need to make sure you get the updates from RIM that coincide with SP1 RU3, or else it will cause other issues.

Author

Commented:
Ah, thanks for the heads-up on that one. Will patch that first and let you know how I get on.

Thank you all.

Author

Commented:
Apologies for the delay!

I was able to fix the disk using chkdsk, but I am still unable to upgrade to SP1.

In trying to upgrade to SP1, I get these errors:


Error:
Setup needs to contact the Active Directory schema master but this computer is not in the same Active Directory site as the schema master (Default-First-Site-Name).
Click here for help... http://go.microsoft.com/fwlink/?linkid=30939&l=en&v=ExBPA.14&id=2376fec1-b9ce-44db-beb6-cb9ac4788988

Error:
Setup encountered a problem while validating the state of Active Directory: Exchange organization-level objects have not been created, and setup cannot create them because the local computer is not in the same domain and site as the schema master.  Run setup with the /prepareAD parameter on a computer in the domain tcs and site Default-First-Site-Name, and wait for replication to complete.
Click here for help... http://technet.microsoft.com/en-US/library/ms.exch.err.default(EXCHG.141).aspx?v=14.1.218.11&e=ms.exch.err.Ex28883C&l=0&cl=cp


I have run dcdiag and get no errors, so I'm not sure what is wrong. Replication throughout the domain appears to be working.

Author

Commented:
Transferring the schema master role to the EX2010 box allowed us to install SP1.

Commented:
Thanks for the update
