Solved

SBS2003 Exchange Database Corrupt and Log files filling hard drive.

Posted on 2011-09-07
Last Modified: 2012-05-12
Hi,
I have recently taken over an SBS2003 server and have come across a big problem with its Exchange database.

They use Veritas Backup Exec 10 for SBS, but for some time they have only been backing up at brick level. The log files in MDBDATA have built up until they are nearly filling the hard drive.
I ran a full backup on the database, but it failed to remove the log files and threw up the following errors:
___________________________________________________
Event Type:      Information
Event Source:      ESE
Event Category:      Logging/Recovery
Event ID:      225
Date:            05/09/2011
Time:            22:59:35
User:            N/A
Computer:      SERVER2003
Description:
Information Store (6168) First Storage Group: No log files can be truncated.  

For more information, click http://www.microsoft.com/contentredirect.asp.

___________________________________________________
Event Type:      Error
Event Source:      ESE
Event Category:      Logging/Recovery
Event ID:      217
Date:            05/09/2011
Time:            22:26:50
User:            N/A
Computer:      SERVER2003
Description:
Information Store (6168) First Storage Group: Error (-1018) during backup of a database (file C:\Program Files\Exchsrvr\mdbdata\priv1.edb). The database will be unable to restore.

For more information, click http://www.microsoft.com/contentredirect.asp.

___________________________________________________
Event Type:      Error
Event Source:      ESE
Event Category:      Logging/Recovery
Event ID:      474
Date:            05/09/2011
Time:            22:26:46
User:            N/A
Computer:      SERVER2003
Description:
Information Store (6168) First Storage Group: The database page read from the file "C:\Program Files\Exchsrvr\mdbdata\priv1.edb" at offset 9787383808 (0x00000002475fa000) (database page 2389497 (0x2475F9)) for 4096 (0x00001000) bytes failed verification due to a page checksum mismatch.  The expected checksum was 8080038126035932732 (0x70220fdda1fc163c) and the actual checksum was 3166078850964650706 (0x2bf02bf0ba9ffad2).  The read operation will fail with error -1018 (0xfffffc06).  If this condition persists then please restore the database from a previous backup.  This problem is likely due to faulty hardware. Please contact your hardware vendor for further assistance diagnosing the problem.

For more information, click http://www.microsoft.com/contentredirect.asp.
___________________________________________________
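The offset and page number reported in event 474 can be cross-checked against each other. The sketch below uses only the numbers logged above; the relationship it encodes (page number = offset / page size - 1) is inferred from those logged values rather than taken from any documentation, and the function name is made up for illustration.

```python
# Values copied from event 474 above.
PAGE_SIZE = 4096            # "for 4096 (0x00001000) bytes"
OFFSET = 9787383808         # "at offset 9787383808 (0x00000002475fa000)"
LOGGED_PAGE = 2389497       # "database page 2389497 (0x2475F9)"


def page_number_from_offset(offset: int, page_size: int = PAGE_SIZE) -> int:
    """Map a byte offset in the .edb file to the page number the event reports.

    Assumption: page = offset / page_size - 1, which is the relationship
    that holds for the numbers logged in this particular event.
    """
    if offset % page_size != 0:
        raise ValueError("offset is not aligned to a page boundary")
    return offset // page_size - 1


if __name__ == "__main__":
    page = page_number_from_offset(OFFSET)
    print(page, hex(page), page == LOGGED_PAGE)
```

If several 474 events are logged, running the same check on each one shows whether the failures cluster on a few pages or are scattered across the file.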
I also tried using NTBACKUP, but that didn't work either. It stopped about 20% into the backup procedure.

I have looked into using ESEUTIL to do a repair, defrag and then ISINTEG but I have a few concerns.

Firstly, the space on the drive is too low. I believe you need about 20% of the database size (the database is 47 GB) to run the ESEUTIL repair, then at least 110% of the database size for the defrag.
Secondly, the time it will take. Again, I read it can be 1 hour per 1 GB. This means nearly two days for this database, which is downtime they can ill afford.
Finally, will ESEUTIL/ISINTEG work for this issue? Can I simply copy the database files and, if the process fails, put them back? Then at least I can start from scratch.
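The figures quoted above (about 20% of the database size for the repair, at least 110% for the defrag, roughly 1 hour per GB) can be turned into a quick estimate. This is only a sketch built on those rules of thumb as stated in the post; the function name and the free-space value in the example are hypothetical.

```python
def estimate_offline_maintenance(db_size_gb: float, free_gb: float,
                                 repair_fraction: float = 0.20,
                                 defrag_fraction: float = 1.10,
                                 hours_per_gb: float = 1.0) -> dict:
    """Estimate free space and time needed, using the rules of thumb
    quoted in the post (they are not guarantees)."""
    repair_needed = round(db_size_gb * repair_fraction, 1)
    defrag_needed = round(db_size_gb * defrag_fraction, 1)
    return {
        "repair_space_gb": repair_needed,
        "defrag_space_gb": defrag_needed,
        "hours": round(db_size_gb * hours_per_gb, 1),
        "enough_for_repair": free_gb >= repair_needed,
        "enough_for_defrag": free_gb >= defrag_needed,
    }


if __name__ == "__main__":
    # 47 GB is the database size from the post; 5 GB free is a made-up value.
    print(estimate_offline_maintenance(db_size_gb=47, free_gb=5))
```

For the 47 GB database this gives roughly 9.4 GB for the repair, 51.7 GB for the defrag, and about 47 hours at the quoted rate, which matches the "nearly two days" estimate.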

I have thought about doing the following:

Install a second, temporary Exchange server and move the mailboxes into a new mailbox store on that server. Delete the original corrupt database on the SBS2003, create a new one, then move the mailboxes back to the newly created mailbox store on the SBS2003.

or,

Export all mailboxes into PST files. Create a new mailbox store on the SBS2003 and import them back in. This could be a very long process.

or,

Can a copy of the database be put on a different, non-Exchange machine which has more space, and the ESEUTIL/ISINTEG run there? This would allow me to test the process (duration and success) and keep the original database running during the test.

or,

Is there a way to remove the log files without affecting the database, then I can deal with the corruption with more HD space?

Any advice would be gratefully received.

Kind Regards,

Colin
Question by:VitalNetworkSolutions
17 Comments
 
LVL 8

Expert Comment

by:TalkExchange
ID: 36495344
There certainly seems to be a corruption issue.

First you need to check whether the transaction logs are growing too fast.

http://exchangeserverinfo.com/2007/06/23/fast-growing-databases--transaction-logs-on-exchange-2000--2003.aspx

Then there is a different approach, as mentioned in

http://forums.devshed.com/mail-server-help-111/exchange-2007-transaction-logs-filling-up-hard-drive-625543.html


Dismount the database and use eseutil to make sure the database is in a clean shutdown state.

Then move the log files to a temp location such as a different HDD, remove the checkpoint (.chk) file, and mount the DB.
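The clean-shutdown check above is normally done by dumping the database header with eseutil /mh and reading the State line. The sketch below only parses text that has already been captured from that command; it assumes the header dump contains a line of the form "State: Clean Shutdown" or "State: Dirty Shutdown", and the function names are made up for illustration.

```python
import re
from typing import Optional


def database_state(mh_output: str) -> Optional[str]:
    """Return the text after 'State:' in captured 'eseutil /mh' output.

    Assumption: the header dump contains a line such as
    'State: Clean Shutdown'. Returns None if no such line is found.
    """
    match = re.search(r"^\s*State:\s*(.+?)\s*$", mh_output, re.MULTILINE)
    return match.group(1) if match else None


def is_clean_shutdown(mh_output: str) -> bool:
    """True only when the State line reports a clean shutdown."""
    state = database_state(mh_output)
    return state is not None and state.lower() == "clean shutdown"


if __name__ == "__main__":
    # Hypothetical excerpt of a header dump, for illustration only.
    sample = "  DB Signature: (elided)\n         State: Dirty Shutdown\n"
    print(database_state(sample), is_clean_shutdown(sample))
```

Only move the logs aside when the state is reported as clean; a dirty shutdown means the database still needs those logs.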

worth a try...
 
LVL 17

Expert Comment

by:lucid8
ID: 36504551
If you are getting -1018 errors you have a very serious issue. Please read this: http://support.microsoft.com/?kbid=314917

1. How long has this been going on?
2. Has anything changed recently, i.e. more memory, disk, firmware?
3. How many users are involved and how big is the current database?

STEP 1: You may want to consider stopping all mail flow and then exporting each user's mailbox to PST via Outlook or possibly ExMerge. Note that ExMerge has a 2GB PST limit because it only supports the ANSI PST format, and past 2GB the file just corrupts, so IMO it is better to export each mailbox via Outlook using the Unicode PST format, which is the default from Outlook 2003 onwards. This will give you a backup for each user right away so as to minimize the potential for data loss.

STEP 2:  Take the database offline ( all logs will be committed during the dismount) and backup/copy the Exchange EDB to an alternate location. This will at bare minimum preserve the current database so that if things get worse you can always use a 3rd party tool (do a Google Search for "export offline exchange mailbox database" )  to extract the data or repair a copy of the DB after fixing the cause and hopefully you will not experience much data-loss (more below)

STEP 2.a: once you have a backup of the EDB, STM and Log files to an alternate location, i.e. NOT ON THE SAME SERVER you can remount the database and temporarily turn on circular logging to truncate the logs http://support.microsoft.com/kb/314605 and then once the logs are cleaned up turn circular logging off.

STEP 3: How long has this been going on? -1018s are a very bad thing, and in essence your database becomes more and more corrupt the longer you run. In short, this can be caused by bad disk system hardware such as the controller, cables, or disks. Firmware updates can cause this issue as well, and at times memory gone bad or newly added memory can have the same effect; once I even saw a motherboard degrade to the point that it caused -1018s. So again, HAS ANYTHING CHANGED RECENTLY OR AT ABOUT THE TIME the -1018s started to appear? If so, look into that item and correct it, i.e. roll back to the previous item, replace the item, etc.

STEP 4: Once you figure out the issue and correct it, you can then execute a repair of the database. Hopefully you will not have a major loss, but if it's been happening for some time you will more than likely experience data loss, hence the reason for the PST backup above.

NOTE: If you don't know how long it's been going on and NOTHING has recently changed, it could take a very long time to figure out what the core issue is. If that's the case then IMO you are better off building a new server on DIFFERENT hardware and migrating the mailboxes to that new server, so that you are sure the -1018s are not going to arise again; otherwise you are playing Russian roulette with your database staying on that server.

FYI, if you have not completely solved the issue that caused the -1018 in the first place, putting additional heavy I/O stress on your disk subsystem, such as a backup or a repair, can do more harm than good to the database. Until you know what caused the issue, I would be very careful about putting it under a heavy load.

You can certainly look at restoring from backup, however A: that doesn't solve the cause of the issue B: you will lose data and C: depending on when this started your backups are all suspect at this point, so IMHO not a good option

If you are successful in solving the root cause of the issue on the current server, you will then need to do the following:

1. Run ESEUTIL /P http://technet.microsoft.com/en-us/library/aa998231(EXCHG.80).aspx
2. Then run ESEUTIL /D http://technet.microsoft.com/en-us/library/aa997972(EXCHG.80).aspx
3. Then run isinteg -s ServerName -fix -test alltests

More about ISINTEG
http://technet.microsoft.com/en-us/library/bb125144(EXCHG.80).aspx
http://support.microsoft.com/?kbid=301460

Again if you DO NOT know the cause of the 1018 I would NOT run any of the above commands on your system since they could very well cause more damage and instead I would migrate the mailboxes to a new server ASAP.

 

Author Comment

by:VitalNetworkSolutions
ID: 36532621
Hi lucid8,
Sorry for the delay in answering. I exported the mailboxes using ExMerge, and to ensure the PST files did not get larger than 2 GB I exported by date ranges. I checked the ExMerge log files and everything exported successfully.
I think I know what the problem was: they had problems with the UPS, router, and AV software. I have resolved these problems and everything seems to be running OK.
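Exporting by date range means working out a set of non-overlapping windows that together cover the whole mailbox history. The sketch below only plans those windows; the date selection itself is still entered in ExMerge by hand. The function name and the window length are assumptions, and a fixed window length does not guarantee that any given PST stays under 2 GB.

```python
from datetime import date, timedelta
from typing import List, Tuple


def date_windows(start: date, end: date,
                 days_per_window: int) -> List[Tuple[date, date]]:
    """Split [start, end] into consecutive, non-overlapping inclusive windows."""
    if days_per_window < 1:
        raise ValueError("days_per_window must be at least 1")
    windows = []
    current = start
    while current <= end:
        # Each window is inclusive, so it spans days_per_window calendar days.
        window_end = min(current + timedelta(days=days_per_window - 1), end)
        windows.append((current, window_end))
        current = window_end + timedelta(days=1)
    return windows


if __name__ == "__main__":
    # Hypothetical range: one export pass per 180 days.
    for first, last in date_windows(date(2009, 1, 1), date(2011, 9, 7), 180):
        print(first.isoformat(), "to", last.isoformat())
```

If a window still produces a PST close to the limit, that window can be split again with a smaller length.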

My intention is to do the following;

1. Stop Mail Flow
2. Export the last emails from the last few days
3. Dismount Mailbox Store
4. My preference is to delete the database and log files. These are the priv1.edb, priv1.stm and all of the log files in the mdbdata folder. Keep the public store as this is OK.
5. Create a new mailbox store
6. Recreate all user mailboxes via AD Exchange tasks
7. Mount Database
8. Import All PST file using exmerge
9. Perform FULL backup using backup Exec.

I realise this is a long process; however, if I use ESEUTIL and ISINTEG I could lose data, and I will be importing the data anyway.

Does this sound like a possible procedure and is there anything I have missed?

Thanks,
Col
 
LVL 17

Expert Comment

by:lucid8
ID: 36532801
Hello and no problem I figured you were busy!  Ok so;

A. Have you determined the cause of the -1018? If not, and you create a new DB, you will more than likely face this issue again, so I can't stress enough how important it is to ensure you have the cause resolved before moving on.

B. Assuming you have done A, then see my comments below

1. Stop Mail Flow L8: Agreed
2. Export the last emails from the last few days  L8: Agreed
3. Dismount Mailbox Store  L8: Agreed

4. My preference is to delete the database and log files. These are the priv1.edb, priv1.stm and all of the log files in the mdbdata folder.  L8: I would recommend that you copy files off to a secondary location in case you need them in the future and then delete them.

Keep the public store as this is OK. L8: OK

5. Create a new mailbox store L8: Actually do not do that. Instead go into ESM and mount the old Exchange database you dismounted in step 3. Exchange will squawk, saying that the database files are missing and that if you continue it will create a blank database. Say Yes and a new blank database of the same name will be created.
 
6. Recreate all user mailboxes via AD Exchange tasks L8: Nope, you don't need to do that since the users and mailbox information is still within AD and when you re-mounted the store and created blank databases in step 5 they were automatically linked to the users, i.e. they have new fresh blank mailboxes and users can begin to send and receive items.

7. Import All PST file using ExMerge  L8: Agreed and you can do this while they are working

9. Perform FULL backup using backup Exec. L8: Agreed

Let me know if you have any questions
 

Author Comment

by:VitalNetworkSolutions
ID: 36553310
Hi Lucid,

Thanks for the above steps. They worked perfectly. Fantastic advice, which has definitely saved me from great heartache.
I am currently importing the last few PST files, after which I will do a full backup. Halfway through, I decided to run a full backup to see the log files that were generated by the import purged from the system, and to confirm the database was OK (at least up to that point).

Hope you don't mind, but one last question.
The majority of Outlook clients run in cached mode. When the users open Outlook, will the cached local mailbox send its messages to the server, creating duplicate messages on the server?
If so, is there a workaround? I am currently connected to the Exchange server remotely, so I'm not in the offices. Walking around the offices turning off cached mode is unfortunately not an option. Also, remote workers with laptops will not be available to do this.

Any advice on this would be greatly appreciated.

Thanks and kind regards,
Col
 

Author Comment

by:VitalNetworkSolutions
ID: 36554085
Hi,
It didn't work. Everything imported successfully, but when I ran a full backup it failed with Event ID 474 and did not remove the log files.
I dismounted the database, left the log files that were generated during the import in place, and have started an eseutil /p. I was able to gain about 30 GB of space on the hard drive (the database is approx. 45 GB).
I think I may have to try and gain more space for the defrag, or will the log files be removed during the repair?

Regards,
Col
 
LVL 17

Expert Comment

by:lucid8
ID: 36554244

1. Regarding the cached mode question: each client will have an OST, and there is a signature within that OST that is linked to the original store. When they start Outlook it will see that there is a new store, complain, and ask whether they want to connect to the server or work from the offline cache. If they connect to the server, the old OST is overwritten with all the data from the server. The only reason to select the offline cache is if you want to export all that information to PST at each user's machine, but since the export from the online store worked without issue there is no reason to do so.

2. I assume you mean that step 9 the backup didn't work?  

3. Why are you doing a repair against the new database?  

4. Repair is kind of a misnomer IMO when people think of ESEutil /P because it sounds like it will fix up any problems without data loss but /P is a last ditch effort that you take when you have corruption in the database and no good backup to recover from since /P will remove damaged pages as it finds them in the DB and can cause small to massive data loss.  Now hopefully you don't have a -1018 in this new database and if you do then you have a hardware based issue as I mentioned above and unfortunately doing a /P against a fresh database with a -1018 is probably just going to do more damage so I hope that you don't have a -1018.

5. Now if you don't have a -1018 and just did a /P because of a misunderstanding, then it probably won't hurt anything; however, you should do a /D to defrag thereafter.

6. Neither /P nor /D will remove the logs, however I should point out that one of the other drawbacks to doing a /P is that existing logs will no longer be able to be replayed into the /P database however, assuming that you dismounted the database without issue prior to running the /P it will have committed all of the outstanding/uncommitted  logs

7. At this point you can delete the logs manually, but just in case your public folder DB is using the same path I would: A: dismount all the databases, mailbox and public folder. B: move the log files to an alternate location, or rename the log path if it only contains the logs and not the databases, i.e. if the log path is C:\exchsrvr\logs you can rename it to C:\exchsrvr\OLD_logs and then create a new C:\exchsrvr\logs directory. C: start up the mailbox and public folder databases and you should be good to go. Once they are all mounted you can delete the OLD_logs path. D: try a new backup.
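Option B above (moving the log files to an alternate location) could be sketched as follows. This is an illustration only: it assumes all databases in the storage group are already dismounted, that the logs follow an E00*.log naming pattern (that prefix is an assumption and should be checked against the actual folder), and the paths in the example are hypothetical. It moves only files matching the pattern and leaves everything else, including the .edb and .stm files, in place.

```python
import shutil
from pathlib import Path
from typing import List


def move_logs(log_dir: Path, dest_dir: Path,
              pattern: str = "E00*.log") -> List[str]:
    """Move transaction log files matching `pattern` out of log_dir.

    Assumptions: databases are dismounted first, and 'E00*.log' matches
    the storage group's logs. Returns the names of the files moved.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for log_file in sorted(log_dir.glob(pattern)):
        shutil.move(str(log_file), str(dest_dir / log_file.name))
        moved.append(log_file.name)
    return moved


if __name__ == "__main__":
    # Hypothetical paths; the destination should be on a different drive.
    names = move_logs(Path(r"C:\exchsrvr\logs"), Path(r"E:\OLD_logs"))
    print(f"moved {len(names)} log files")
```

Moving rather than deleting keeps the logs available until the databases have mounted and a backup has completed, which is the same safety margin the rename approach gives.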
 

Author Comment

by:VitalNetworkSolutions
ID: 36554412
Hi Lucid,
After the backup failed (yes, with a -1018), I panicked.
I dismounted the database and then remounted it. Then I looked in ESM and the mailbox store had no mailboxes in it.
It was at this point that I started a repair. It is at the 'deleting unicode fixup table' step.

Have I knee-jerked?

Help, Col

PS. I still have all the PST files, and the backup of the EDB and STM files from before the import and the repair.

 
 
LVL 17

Expert Comment

by:lucid8
ID: 36554447
ok so just to clarify;

1. you received a -1018 on the NEW database during the backup process?

2. If yes to #1, and if you lost access to the mailboxes, then you have some type of major hardware issue. I would suggest that you look at resolving it ASAP by following the recommendation above of reversing any hardware changes that recently took place, or perhaps firmware updates on controllers etc.; otherwise, as stated above, you will just keep going in circles. As mentioned above, if nothing has changed you are best to get a new server in place ASAP, because trying to track down the cause is like trying to find a needle in a haystack: it could be the motherboard, memory, disk controller, disk cabling, hard drive, firmware on the controller or hard drive, config settings on the controllers, etc. That said, most of these issues I have experienced were due to a problem within the disk subsystem, so you may want to look in your event logs to see if there are any telling events.

3. The good news is that you have the PST files and hopefully they are not sitting on the same server that has issues since the system is suspect, however if so copy them ASAP to a safe location.

4. Can you open the PST files and see the data within?

5. Also keep in mind that if the Outlook clients did not connect to the new server DB and they are configured with OST's you can do one more thing to cover the risk of data loss, i.e. once logged into a users desktop, disable the network connection or unplug the LAN cable and  open Outlook at each users desktop and it will open in offline mode and display all the information within the OST.  Export all the information to PST on the users desktop so that you have a secondary and local PST backup.

6. Once you bring the repaired or new server online you will have the ExMerge PSTs to import, and the desktop PSTs as a backup if a user claims anything is missing.
 

Author Comment

by:VitalNetworkSolutions
ID: 36554493
Hi Lucid8,

Yes, the PSTs are on a USB external hard drive. Also, I can open the PSTs in an Outlook client.
I was thinking the same as you. If the users claim any data loss I will import the data from the PST files.
I guess I did knee-jerk, so I'll have to wait until the repair is complete.

Please can you have a look at the plan below and confirm I am taking the right steps:

1. Wait until the repair is complete. I'm guessing that on a 50 GB database it could take about 9-10 hours.
2. Run eseutil /d <database path and name> (I don't know how to estimate the time for this; if a lot of data was lost, it should be very short).
3. Run isinteg -s <servername> -fix -test alltests (after research, I think about 9 hours).
4. Do I remount the database manually after the isinteg?
5. Back up the database.
6. Speak to my client on Monday morning and tell them they need to purchase a new server so I can move the database to a different machine.

At which point can I remove the log files? Can I remove them after the repair? This would help with the space required during the defrag.

Thanks very much for the help with this.
I just need to be patient now. Then I can see what damage has been done.
Col
 
LVL 17

Accepted Solution

by:
lucid8 earned 500 total points
ID: 36554615
1. Do you know if any changes were recently done on this system in terms of hardware or firmware upgrades? If so see about rolling them back or talk to the vendor about possible issues, else you need to plan to get to a new server ASAP

2. Yes, normally your plan would be fine IF you had the hardware issue that's causing the -1018s under control/repaired, but you are in a precarious position since you have the -1018 on this brand new database, and more than likely it's because the backup stressed the disk subsystem.

3. If you continue to run the /P and /D you are just going to put that database under more stress, which may very well damage things further and cause data loss. Therefore I would put a new server in place ASAP; however, if that is not possible right now I would do the following:

A. Stop the /P

B. Copy that new/damaged database to a safe location in case you need to extract data via RSG or a 3rd party utility. NOTE: You won't need the logs, since once you start a /P the old logs are useless.

C. Get rid of the logs and the current new DB that has the -1018.

D.  Mount the database from within ESM and let it create a new .EDB

E. Stop your backups. They are doing you no good, i.e. they can't finish, and doing a backup actually puts the disk system under heavy I/O, which will cause more -1018s. It is best to avoid stressing that system at all costs until you get the issue resolved or move to new hardware.

F. Import the PST's again

G. Go to each desktop and export the offline cache OST to PST as described above, so that you have a secondary backup of the user data.

H. Connect to the users mailboxes to validate that you can see the data

I. Talk to the client ASAP, get a new system in place, and migrate the data to the new server, else they will experience data loss.
 

Author Comment

by:VitalNetworkSolutions
ID: 36554653
Hi Lucid,

I do not know of any changes to hardware/firmware.

However, that sounds like a plan. I will stop the /p and follow your steps.
I'll let you know how it goes. Obviously I will award the points to you for all your help.
Thanks,
Col
 
LVL 17

Expert Comment

by:lucid8
ID: 36554680
OK, let me know how things go and I will answer ASAP, however about to take my daughter out for daddy/daughter day so may not answer as fast since I will only have my phone

The good news of course is that you have exported the Exchange data from the old EDB and also have the opportunity to make PSTs from the users' offline caches. The only other issue is that this is an SBS server, so the other data on these disks is at risk as well; therefore you may want to consider making an offline copy of all of the data as a safety net.

Once you know all the data is safe, there are a few other things you may want to check on as well however if something is on the edge it may be a good idea to let things be and just get a new system.  That said you could;

1. Check the disk subsystem vendor's site to see what the latest firmware is versus what you have, and then contact them to see if they have any known issues. It could be that they do, and if so you can attempt to upgrade it; however, there are risks involved if the upgrade goes bad.

2. Turn the system off and check all connections, i.e. re-seat controller cards, cables, memory chips etc. But again, if something goes wrong you could end up in a worse position, so weigh the risk before diving in, eh?

 

Author Closing Comment

by:VitalNetworkSolutions
ID: 36554696
Hi Lucid8,
I will update you on how it ends.
I will definitely be suggesting a new server to my client; this isn't a good way to spend my weekends.

Hope you and your daughter have a good day out.

Thanks for all your help.
Col
 
LVL 17

Expert Comment

by:lucid8
ID: 36554756
thanks, yeah not a good way to spend your weekend and hope all goes well!
 

Author Comment

by:VitalNetworkSolutions
ID: 36556244
Hi Lucid8,
Imported all of the messages to the database. Again, a long night, trying to grab sleep in between ExMerge setups when I could.
It seemed to go quicker this time, so I decided to run a full backup across it using Backup Exec.
It completed successfully, purged the log files, and regained 80 GB of disk space.
Before the backup I tested a client machine to ensure it was linking to the database. I had to delete and recreate the Outlook profile, as every time I tried to link to it the IMAP timed out.

I have a few of the errors below for about 5 users in the event log. However, I think these will be resolved when I recreate the link to their accounts. I'll test this shortly.

I realise the -1018 error will more than likely come back, so I intend to go ahead with the new server. I just need to work out the best way to do this now.

Thanks,
Col
__________________________________________________________________________
Event Type:      Error
Event Source:      Server ActiveSync
Event Category:      None
Event ID:      3005
Date:            18/09/2011
Time:            10:59:46
User:            HANIXEUROPE\sasao
Computer:      HANIX2003
Description:
Unexpected Exchange mailbox Server error: Server: [hanix2003.hanixeurope.com.local] User: [psmith@hanixeurope.com] HTTP status code: [409]. Verify that the Exchange mailbox Server is working correctly.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
__________________________________________________________________________
 
LVL 17

Expert Comment

by:lucid8
ID: 36556733
Agreed on all counts and thanks for the update