Solved

Replay of Transaction Logs stops and crashes MAD.EXE

Posted on 2006-07-15
Last Modified: 2008-01-09
Good morning
We are in the process of moving from 5.5 to Exch 2003.  To make a long story short, the 71GB priv.edb IS became unusable due to a RAID 5 error.  Our backup is quite old, about 5 weeks, but we do have all the transaction logs between that backup and the crash.  There are about 2,900 of them as it's an active server.  I added new storage to the server to hold all these transaction logs for recovery, as the pre-crash location didn't have enough space.  I then made a new partition to hold the restored IS and mounted it as the same drive letter as the failed RAID array.  I restored the backup using Backup Exec 9.1.  The logfiles start to play, but at a certain point about 75% of the way through I get a Dr. Watson of MAD.EXE.  I created a recovery server per MS instructions for restoring a single mailbox and did the same thing, thinking that perhaps the original server, being old and underpowered, did not have enough power to complete the whole recovery with so many log files.  The recovery process crashed MAD.EXE in the exact same spot.

I restored to the original server again but this time moved out all logs just prior to the one that was replaying during the crash.  This time it crashed MAD.EXE but at a different logfile.  I did use the eseutil utility to look at the header of what I thought was the corrupt logfile (from the original restore) and it read it in fine.

I then restored to the recovery server again but this time left the log file directory blank, so that BackupExec would restore the full backup of the database to its consistent state as of that backup and not play any other logfiles back.  This worked fine and the IS opened with no problem.  However, it's missing 5 weeks of mail of course.

Has any expert run into issues like this before where a good backup cannot be brought current due to issues with the transaction logs?  All the logs are there as I ran MS's script in the DR white paper to check for the sequence.  Is it possible that there's some sort of corruption in the IS from the backup that is preventing full replay of all the logs?   Would having the transaction log directory be a different path from the backup have any effect (I adjusted the path in RegEdit before the restore)?  Is there any utility I can run on an unmounted IS PRIOR to beginning the replay of transaction logs that might assess (and fix?) corruption in the restored IS?
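The sequence check mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not the script from the Microsoft DR white paper. It assumes the naming convention visible later in this thread (`edb` followed by a hexadecimal generation number and `.log`), and the function names are hypothetical.

```python
import re

# Assumed naming convention, taken from file names quoted in this thread:
# "edb" + hexadecimal generation + ".log" (for example edb11DC4.log).
LOG_NAME = re.compile(r"^edb([0-9A-Fa-f]+)\.log$", re.IGNORECASE)


def find_sequence_gaps(file_names):
    """Return the generation numbers missing between the lowest and highest log."""
    generations = set()
    for name in file_names:
        match = LOG_NAME.match(name)
        if match:  # edb.log, edb.chk and unrelated files are skipped
            generations.add(int(match.group(1), 16))
    if not generations:
        return []
    expected = range(min(generations), max(generations) + 1)
    return [g for g in expected if g not in generations]


def log_name(generation):
    """Format a generation number back into a log file name."""
    return "edb%05X.log" % generation
```

To run it against a real directory, pass `os.listdir(path)` as `file_names` and print `log_name(g)` for each gap. An empty result only shows that no file is missing from the range. It says nothing about whether the contents of each log are intact.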

There is one error in the event viewer after the replay starts where it says something about an I/O size discrepancy in the IS.  This comes up during the transaction log replay but it continues on fine for quite a while after that up until the crash of MAD.EXE, which occurs after perhaps 2200 log files have been replayed (the I/O size error shows up after about the 20th log file has replayed).    

Tried to call Microsoft but support is discontinued for 5.5.   Ideally, I need to get the production 5.5 server running so I can move the final 20 or so mailboxes off it and onto the Exchange 2003 server, even if it means that those moved mailboxes are missing data.  Once those mailboxes are moved I then hope to be able to use Exmerge to get the missing data back and merged into the mailboxes on Exch 2003.  

Please help me if you can.  I will make it worth your while points wise and will send an Amazon GC to whoever can help me through this....This is an urgent issue.
0
Question by:emilysam
 
LVL 9

Expert Comment

by:Exchgen
ID: 17114627
I will try to help you as much as possible.

Please provide the details requested below:

1. Do we have a Dr. Watson log and user.dmp file created? Please send me the Dr. Watson log file.
2. Did you make sure that the OS, Exchange SP and hotfixes on the new server match those on the old crashed server?
3. Do you have all log files from the day the database was created up to now?
4. Looking for a 3rd party solution, check this out
    Ontrack® PowerControls™
    http://www.ontrack.com/powercontrols/

Please give me more input so that I can provide you with accurate tips.

Raghu
0
 

Author Comment

by:emilysam
ID: 17114661
1.  Not sure if I have a Watson log and/or .dmp.  I will look and let you know
2.  I am using 2 different servers.  One is the actual production server that is unchanged from the crash with the exception of the IS not being able to mount from the failed RAID array.  That server is NT4 sp6a.  The 2nd server is a recovery server running win2k server.  It had been suggested that perhaps the 1st server, called EXCHMAIL, which is the production 5.5 server, was not powerful enough to run through a log replay of 2,900 log files.  So, we built a recovery server with Win2k that is substantially more powerful.  That server is called EXCH55RECO.  Curiously, that newer more powerful server runs MUCH more slowly while replaying transaction logs.  It may be due to the fact that the IS on that box is stored on our iSCSI SAN versus a direct attached Fibre Channel RAID array on the original production server called EXCHMAIL.  So, to make a long story short, I highly doubt that the OS version / level is causing this as we are restoring to the original unchanged production server in the first case.
3.  We do not have all log files from day one.  We've been running exchange for over 6 years now.  This particular IS was created about 3 1/2 years ago when we upgraded hardware.   We did have an issue with this IS back in Dec 2005 and restored from a cold/offline backup with logfile replay.  The database has been running fine ever since.
4.  I will take a look at those tools.  

I'd really like to have someone in the know comment on this situation and answer the specific questions I posed previously.   I've already done a lot of troubleshooting and I need to know the answers to the questions listed in order to determine my next step.  So, here they are again:

A.  Has any expert run into issues like this before where a good backup cannot be brought current due to issues with the transaction logs?  All the logs are there as I ran MS's script in the DR white paper to check for the sequence.  

B.  Is it possible that there's some sort of corruption in the IS from the backup that is preventing full replay of all the logs?   Would having the transaction log directory be a different path from the backup have any effect (I adjusted the path in RegEdit before the restore)?  

C.  Is there any utility I can run on an unmounted IS PRIOR to beginning the replay of transaction logs that might assess (and fix?) corruption in the restored IS

Thanks for your help so far.
0
 
LVL 9

Expert Comment

by:Exchgen
ID: 17114718
You mentioned in your first post that you did not replay the 2900 logs and the IS came up fine... So it's quite possible (per your theory) that the IS from backup is bad.

If you still have the recovered IS (restored from backup, log files not played), try running eseutil /g /v /x priv.edb >c:\integcheck.txt

Search through the text file and look for errors...

If you find "ORPHANED LONG VALUES" the database is fine, but if you find "ORPHANED CORRUPT VALUES" your theory of a bad database is plausible.
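Searching a large integrity report by eye is error-prone, so a small filter can help. The sketch below is only an illustration: the keyword list is an assumption, not an authoritative list of strings that eseutil prints, and `find_flagged_lines` is a hypothetical helper name.

```python
def find_flagged_lines(lines, keywords=("error", "corrupt", "orphan")):
    """Return (line_number, text) pairs for lines containing any keyword.

    The default keywords are guesses at what is worth reviewing; adjust
    them to whatever actually appears in your report.
    """
    lowered = [k.lower() for k in keywords]
    hits = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        if any(k in text.lower() for k in lowered):
            hits.append((number, text))
    return hits
```

Typical use would be to open the redirected report (`c:\integcheck.txt` in the command above), pass the file object as `lines`, and print each hit with its line number so it can be found again in the full report.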

You also mentioned that this mostly happens after about 2200 log files have been played. Try the steps below:

1. Have all MDBDATA folders empty.
2. Restore from backup, do not select start services after backup.
3. Check under HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\MSExchangeDS or MSExchangeIS\Restore in Progress, check the high log and low log value.
4. Check for edb.chk file.
5. Increase logging on SA and IS (HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\MSExchangeDS or MSExchangeIS\diagnostics) (HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\MSExchangeDS or MSExchangeSA\diagnostics) set all parameters to "7".
6. Start IS.

If you have made sure that this is how you restored, and you do not have any corrupt log files, everything should start fine.

Your theory that the OS makes no difference is wrong. Try eseutil /mh on the database from NT and from 2000 or 2003: the database carries the OS signature with the service pack level you have... I have worked on tons of DR problems and this is a common mistake a lot of professionals make.

Please provide more input. I am willing to go to any length to help.

Raghu
0
 

Author Comment

by:emilysam
ID: 17114767
Excellent.  Thanks.  I will try what you propose.  My first issue right now is to get my backup server running again.  The backup device just went "offline" for no apparent reason. I can't catch a break I guess.  Also, to clarify,

3. Check under HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\MSExchangeDS or MSExchangeIS\Restore
in Progress, check the high log and low log value.

You just want me to record these values, correct? Do I need to change any values here?

4. Check for edb.chk file.

Do you want me to remove this or just check that it is there?

Also, with respect to the O/S issue, I guess I was saying that in this case I didn't think that O/S differences was the cause because the log replay crashed in exactly the same place and in the same manner on both the NT (orig production) and Win2k (recovery server) systems.  Nonetheless, you sound like you know what you're talking about so I will follow your advice.  

Also, I do have a Dr Watson log of the MAD.exe crash during log replay from last night.  Do you want me to email it to you?


0
 
LVL 9

Expert Comment

by:Exchgen
ID: 17114824
Nice...

Check the high and low log values and make sure that all log files within that range are present...

It's possible to fool the IS by modifying the key or just deleting it. If you delete it, and you feel that, say, the first 2100 log files will not error out on you, place only those files in the log directory. Also make sure you rename the last file in the sequence (say 2100) to edb.log.

If you intend to remove the Restore in Progress key, delete the EDB.CHK file as well so that the IS does not look for it while recovering.
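The truncation described above can be planned on paper before any file is touched. The following is a dry-run sketch only: it moves and renames nothing, it assumes the `edb<hex>.log` naming used in this thread, and `plan_truncation` is a hypothetical helper rather than part of any Exchange tool.

```python
import re

LOG_NAME = re.compile(r"^edb([0-9A-Fa-f]+)\.log$", re.IGNORECASE)


def plan_truncation(file_names, last_generation_hex):
    """Plan, without touching any file, how to end the log chain early.

    Logs below the cutoff are kept, the log at the cutoff is listed for
    renaming to edb.log, and everything else is listed to be set aside.
    """
    cutoff = int(last_generation_hex, 16)
    keep, set_aside, rename = [], [], None
    for name in sorted(file_names, key=str.lower):
        match = LOG_NAME.match(name)
        if not match:
            # The existing edb.log, edb.chk and so on are left for manual handling.
            set_aside.append(name)
            continue
        generation = int(match.group(1), 16)
        if generation < cutoff:
            keep.append(name)
        elif generation == cutoff:
            rename = (name, "edb.log")
        else:
            set_aside.append(name)
    return {"keep": keep, "rename": rename, "set_aside": set_aside}
```

Reviewing the returned plan first, and copying the whole log directory somewhere safe before acting on it, keeps the original files available if the attempt has to be repeated with a different cutoff.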

I'm not too sure about providing an email address in the forum... maybe you can upload it somewhere?

Raghu
0
 

Author Comment

by:emilysam
ID: 17114890
Ok. No prob on email, I understand. I will place it on a website for you.  You can get it at:
http://www.microbia.com/drwatson.txt  

Remember, this dump file represents the crash that occurred when replaying the logfiles on the original production server called EXCHMAIL running NT4.  For that restore, I did NOT include the complete log file set.  I had removed everything from 11DC3.log onward as that was the log that was replayed just prior to the replay crash on multiple tries.

I am running the eseutil job you described against the IS that was restored successfully from the backup with no log files replayed with the exception of the ones that were restored by Backup Exec (10 or so).  I have already opened that IS and extracted some .pst data with ExMerge.  That is being done on a 3rd server that was configured for recovery.  It is running NT4 Server with same sp as the production server.  That 3rd server is called EXCHANGERECOVER for future reference.

On the 2nd server described above, running Win2k, and called EXCH55RECO, I restored the IS and included all the log files up to the one that had been crashing (11DC3.log), just as I did with the production server.  As you'll recall, when I did this with the production server called EXCHMAIL, it resulted in the MAD.EXE error (dump file of that is on the website above).  The log replay on EXCH55RECO has been running all night and is still running now.  So far, it looks like it has gotten past the point where MAD.exe crashed on the production server called EXCHMAIL:

EXCHMAIL Event Viewer:

First this at 10:47:51 pm:   (312) The database engine is replaying log file L:\exchsrvr\MDBDATA\edb1186A.log
Then Dr Watson of MAD.EXE...
and then at 10:48:21, Event ID 1081:  Unable to recover the database because error 0x000006be occurred after a restore operation.
and then at 10:48:21pm Event ID 5000: Unable to initialize the Microsoft Exchange Information Store service. Error 0x6be.

Now on EXCH55RECO (Win2k, currently still running):
7/15/06, 12:54pm   (1032) The database engine is replaying log file E:\exchsrvr\MDBDATA\edb11D56.log.

So it seems that the EXCH55RECO has gotten past the logfile where EXCHMAIL crashed last night.  In both cases, I had removed the logfiles from 11DC3.log onward as that was where it had crashed multiple times previously.   Since the highest log right now on EXCH55RECO is 11DC2.log and it just finished 11D56.log, it will hopefully finish soon and we'll have more data and info to go on.
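Because the log names are hexadecimal counters, the distance between the log being replayed and the highest log is a simple subtraction. A small sketch using the two names quoted above (the helper name is hypothetical):

```python
def logs_remaining(current_hex, highest_hex):
    """Count the log generations still to replay after the current one."""
    return int(highest_hex, 16) - int(current_hex, 16)


# edb11D56.log has just been replayed; edb11DC2.log is the highest log present.
remaining = logs_remaining("11D56", "11DC2")
print(remaining)  # 108
```

So at that point roughly a hundred log files were still ahead of the replay.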

Thanks again for your prompt assistance.  I'm really appreciative.


0
 
LVL 9

Expert Comment

by:Exchgen
ID: 17115010
Great going!!!

Look for an event ESE97 with ID 109...

Check for the log file mentioned in the event...

If the restore fails this time, remove the log files from that point on and try starting the IS again...

Do not forget to delete the Restore in Progress key and the chk file...

Raghu
0
 
LVL 9

Expert Comment

by:Exchgen
ID: 17115022
Sorry, I forgot to mention this KB:

XADM: Exchange Information Store Does Not Start After Online Restore
http://support.microsoft.com/kb/322314/en-us

Raghu
0
 

Author Comment

by:emilysam
ID: 17115172
Ok, so the log replay completed successfully on EXCH55RECO, which is the Win2k Box.  However, there are some new error messages now:

MSExchangeIS (1032) The database engine is initiating index cleanup of database 'F:\exchsrvr\MDBDATA\PRIV.EDB' as a result of an NT version upgrade from 4.0.1381 SP6 to 5.0.2195 SP3.

MSExchangeIS (1032) Database 'F:\exchsrvr\MDBDATA\PRIV.EDB': The secondary index '*T668f+Q6749+S3001+Q6748 409' of table 'Folders' is corrupt. Please defragment the database to rebuild the index.

Unable to recover the database because error 0xfffffa7a occurred after a restore operation.

Unable to initialize the Microsoft Exchange Information Store service. Error 0xfffffa7a.

I presume this is somewhat expected due to the change in operating system?    Shall I go ahead and defrag as suggested?  I have plenty of disk so that should not be a problem.  I'm going to start copying the priv.edb so that I can rollback if the defrag messes things up.  BTW: The restore is in progress on the original production server EXCHMAIL.
0
 
LVL 9

Expert Comment

by:Exchgen
ID: 17115233
Great!!!

A defrag should help you! In case you run into road blocks, also run ISINTEG on the database to rule out any logical corruption...

Do check the state of the database. It has to show "consistent", as this is required for the defrag to run successfully! Run eseutil /mh on the database to dump its header and get the "STATE".

I hope you have a peaceful Sunday at least!!! :)

Raghu

0
 
LVL 9

Expert Comment

by:Exchgen
ID: 17115245
Check KB224977.

It covers most of the errors that you are getting!

It has an elaborate resolution section... I feel this will do the job...

Raghu
0
 

Author Comment

by:emilysam
ID: 17115284
Thanks for that article.  That one is definitely spot on.  I will follow those instructions and post more info later on other open issues:

a.  potential corruption in the IS from the backup.  The eseutil is running on that now on a second recovery server
b.  possibilities for completing replay of transactions to the point of failure.  Right now, I'm up to July 5 on the EXCH55RECO server which should be mountable once we defrag / isinteg -patch per the KB article above.  Ideally, I'd like to recover up to July 10th when the IS crashed.
c.  Status of recovery without replay of the production server.  The restore job is running now with all MDB directories cleared.  

Thanks for your help on this.  Please stick with me on this one.  I will reward you accordingly.

Thanks
0
 

Author Comment

by:emilysam
ID: 17115397
Ok, I ran through the KB article and got the IS running successfully on the EXCH55RECO Win2k Server.  This IS is about 5 days prior to the system crash.  We've got 4 of the 5 weeks of mail back!!!!  Once the restore of the production server completes I'll let you know if that mounts.  If it does, I'll move the mailboxes that were left on that box over to their new home on the Exchange 2003 server.  Then I'll work on getting that remaining 5 days of missing email back.  

0
 
LVL 9

Expert Comment

by:Exchgen
ID: 17115492
Great going!!!

I am happy we have made good progress!!

Keep your thoughts focused and all will be fine!!!

Raghu
0
 

Author Comment

by:emilysam
ID: 17125386
Just posting an update on progress here.  I attempted to restore the IS to its precrash state one more time.  This time, I was very careful about making sure that all transaction logs were available from the time of the backup until the time of the crash.  I also updated the Restore In Progress key so that the highest log # was set to the last hex log I had pre-crash.  The very last hex log prior to crash had been renamed to edb.log so the one just prior to that was the one I entered in as the High Log number.  I then restored from backup and attempted to start the IS.  The log replay started up fine but once again crashed at the exact same place and generated a Dr. Watson of MAD.exe.  So, the original problem still exists.  I ran the same process once again on a recovery server running Windows 2000 server but that time removed the logs from just prior to the MAD.exe crash up until the end.  This time log replay finished and the IS opened up.  So, it seems that there's either a transaction in the affected log which is killing the IS or can't be processed for replay, or the log file itself is corrupted. The last logfile that was replaying when MAD.exe crashed was edb11DC4.log.  I have included the header of that and 11DC5.log below.  They appear to be fine and uncorrupted.  Each is 5120K as expected.  So, the real remaining question is this:  If a transaction log goes bad for whatever reason, what are the steps for recovery?  Is there any way to remove a logfile out of sequence and continue replaying for recovery?

Logfile replaying when MAD.exe crashed edb11DC4.log
E:\exchsrvr\MDBDATA.bad.failed.0710.recovery>eseutil /ml edb11DC4.log

Microsoft(R) Windows NT(TM) Server Database Utilities
Version 5.5
Copyright (C) Microsoft Corporation 1991-1999.  All Rights Reserved.

Initiating FILE DUMP mode...
      Log file: edb11DC4.log

      lGeneration (73156)
      Checkpoint NOT AVAILABLE
      creation time:7/5/2006 14:53:19
      prev gen time:7/5/2006 14:39:57
      Format LGVersion (6.1503.0)
      Engine LGVersion (6.1503.0)
      Signature: Create time:12/21/2001 10:3:2 Rand:24842 Computer:
      Env SystemPath:C:\exchsrvr\MDBDATA\
      Env LogFilePath:E:\exchsrvr\MDBDATA\
      Env Log Sec size:512
      Env db page size:0
      Env (Session, Opentbl, VerPage, Cursors, LogBufs, LogFile, Buffers)
          (    182,   27300,    4860,    9100,      84,   10240,  208896)
      1 F:\exchsrvr\MDBDATA\PRIV.EDB
         dbtime: 2092449716 (0,0)
          objid: 1757511
      Signature: Create time:12/21/2001 10:3:9 Rand:67798 Computer:
DatabaseSizeMax: 0
      Last Attach (72875,45,308)  Last Consistent (72875,44,164)
      2 F:\exchsrvr\MDBDATA\PUB.EDB
         dbtime: 30369234 (0,0)
          objid: 21504
      Signature: Create time:12/21/2001 10:3:11 Rand:58314 Computer:
DatabaseSizeMax: 0
      Last Attach (72875,45,412)  Last Consistent (72875,44,202)

Operation completed successfully in 0.31 seconds.

Next logfile in sequence, perhaps the one that caused MAD.exe to crash when it opened, edb11DC5.log
E:\exchsrvr\MDBDATA.bad.failed.0710.recovery>eseutil /ml edb11DC5.log

Microsoft(R) Windows NT(TM) Server Database Utilities
Version 5.5
Copyright (C) Microsoft Corporation 1991-1999.  All Rights Reserved.

Initiating FILE DUMP mode...
      Log file: edb11DC5.log

      lGeneration (73157)
      Checkpoint NOT AVAILABLE
      creation time:7/5/2006 15:4:50
      prev gen time:7/5/2006 14:53:19
      Format LGVersion (6.1503.0)
      Engine LGVersion (6.1503.0)
      Signature: Create time:12/21/2001 10:3:2 Rand:24842 Computer:
      Env SystemPath:C:\exchsrvr\MDBDATA\
      Env LogFilePath:E:\exchsrvr\MDBDATA\
      Env Log Sec size:512
      Env db page size:0
      Env (Session, Opentbl, VerPage, Cursors, LogBufs, LogFile, Buffers)
          (    182,   27300,    4860,    9100,      84,   10240,  208896)
      1 F:\exchsrvr\MDBDATA\PRIV.EDB
         dbtime: 2092474496 (0,0)
          objid: 1757548
      Signature: Create time:12/21/2001 10:3:9 Rand:67798 Computer:
DatabaseSizeMax: 0
      Last Attach (72875,45,308)  Last Consistent (72875,44,164)
      2 F:\exchsrvr\MDBDATA\PUB.EDB
         dbtime: 30369667 (0,0)
          objid: 21504
      Signature: Create time:12/21/2001 10:3:11 Rand:58314 Computer:
DatabaseSizeMax: 0
      Last Attach (72875,45,412)  Last Consistent (72875,44,202)

Operation completed successfully in 0.15 seconds.
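The header dumps above can also be checked mechanically. The sketch below assumes the dump layout matches the text pasted in this post (the field labels are copied from it) and uses hypothetical helper names. It extracts the generation and timestamps, and confirms that the hexadecimal digits in the file name equal `lGeneration`.

```python
import re


def parse_log_header(dump_text):
    """Extract file name, generation and timestamps from an `eseutil /ml` dump."""
    patterns = {
        "file": r"Log file:\s*(\S+)",
        "generation": r"lGeneration \((\d+)\)",
        "created": r"creation time:\s*(.+)",
        "prev_created": r"prev gen time:\s*(.+)",
    }
    header = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, dump_text)
        header[key] = match.group(1).strip() if match else None
    if header["generation"] is not None:
        header["generation"] = int(header["generation"])
    return header


def name_matches_generation(header):
    """True when the hex digits in the file name equal lGeneration."""
    match = re.match(r"edb([0-9A-Fa-f]+)\.log$", header["file"] or "", re.IGNORECASE)
    return bool(match) and int(match.group(1), 16) == header["generation"]


# Excerpt copied from the edb11DC4.log dump above.
SAMPLE = """\
      Log file: edb11DC4.log

      lGeneration (73156)
      Checkpoint NOT AVAILABLE
      creation time:7/5/2006 14:53:19
      prev gen time:7/5/2006 14:39:57
"""
header = parse_log_header(SAMPLE)
print(header["generation"], name_matches_generation(header))  # 73156 True
```

In the two dumps above, the "prev gen time" of edb11DC5.log (7/5/2006 14:53:19) equals the creation time of edb11DC4.log, so the headers at least agree with each other. A consistent header does not prove that the records inside the log are undamaged.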

0
 
LVL 9

Expert Comment

by:Exchgen
ID: 17125599
The point is that eseutil /ml only dumps the header of the log file. It does not show whether there is any corruption in the rest of the log file...


I was hoping you would ask this question, so here you go:

XADM: How to Detect Header Damage in Databases, Log Files, Patch Files, and Checkpoint Files
http://support.microsoft.com/?kbid=253325

Raghu
0
 

Author Comment

by:emilysam
ID: 17126051
Ok.  I ran the utility and here are the results.  From the KB253325 article, it seems as if the logfiles are good, as page 0 is listed first?  I'm concerned about the logfiles showing the pages in an incorrect order.  They are all over the map.  Does this indicate anything to you?  Am I using the correct switch (/s) with esefile?  The KB article says that for 5.5 logfiles /s should be used so that's what I'm using...

Here's log edb11DC4.log
E:\exchsrvr\MDBDATA.bad.failed.0710.recovery>esefile /s edb11DC4.log

Microsoft(R) Exchange Server(TM) Database Utilities
Copyright (C) Microsoft Corporation 1999.  All Rights Reserved.


Checksumming

          0    10   20   30   40   50   60   70   80   90  100
          |----|----|----|----|----|----|----|----|----|----|
          ERROR: page 0 checksum failed ( 0xcc566a0 / 0x19203 )
ERROR: page 16 returned page -1
ERROR: page 1 returned page 639000833
ERROR: page 31 returned page 2049
ERROR: page 16 checksum failed ( 0x32ce2ba8 / 0xffffffff )
ERROR: page 1 checksum failed ( 0xb57ab39d / 0x219f492 )
ERROR: page 47 returned page 75005040
ERROR: page 31 checksum failed ( 0xb2f34a95 / 0x59000000 )
ERROR: page 17 returned page -43614799
ERROR: page 5 returned page -285546513
ERROR: page 47 checksum failed ( 0xcf7c5b2d / 0x802 )
ERROR: page 64 returned page 0
ERROR: page 32 returned page 1128005760
ERROR: page 17 checksum failed ( 0xa6849e21 / 0x91a9e491 )
ERROR: page 80 returned page 533502976
ERROR: page 5 checksum failed ( 0xb7a0d80d / 0xfbe8effb )
ERROR: page 48 returned page 169607168
ERROR: page 64 checksum failed ( 0xdea3482b / 0x0 )
ERROR: page 95 returned page 0

Here is logfile edb11DC5.log:
E:\exchsrvr\MDBDATA.bad.failed.0710.recovery>esefile /s edb11DC5.log

Microsoft(R) Exchange Server(TM) Database Utilities
Copyright (C) Microsoft Corporation 1999.  All Rights Reserved.


Checksumming

          0    10   20   30   40   50   60   70   80   90  100
          |----|----|----|----|----|----|----|----|----|----|
          ERROR: page 0 checksum failed ( 0xa2f7c3c / 0x1f503 )
ERROR: page 15 returned page 1818323298
ERROR: page 31 returned page -1863323645
ERROR: page 47 returned page 436684474
ERROR: page 63 returned page 11861760
ERROR: page 1 returned page 643952897
ERROR: page 15 checksum failed ( 0xe868393a / 0x6f726369 )
ERROR: page 31 checksum failed ( 0x7f2df12d / 0x94acc7cd )
ERROR: page 47 checksum failed ( 0xf5a2f7ea / 0x0 )
ERROR: page 63 checksum failed ( 0xab0c56fe / 0x25300c )
ERROR: page 1 checksum failed ( 0x5da0cce1 / 0x21a0622 )
ERROR: page 16 returned page 609095680
ERROR: page 32 returned page 1350127883

And here is logfile edb11DC3.log, which seems to be the last logfile that was replayed successfully prior to the crash:
E:\exchsrvr\MDBDATA.bad.failed.0710.recovery>esefile /s edb11DC3.log

Microsoft(R) Exchange Server(TM) Database Utilities
Copyright (C) Microsoft Corporation 1999.  All Rights Reserved.


Checksumming

          0    10   20   30   40   50   60   70   80   90  100
          |----|----|----|----|----|----|----|----|----|----|
          ERROR: page 0 checksum failed ( 0xb686d479 / 0x13703 )
ERROR: page 15 returned page 1571243893
ERROR: page 31 returned page 7405826
ERROR: page 47 returned page 210699
ERROR: page 63 returned page 546335337
ERROR: page 1 returned page 58719232
ERROR: page 15 checksum failed ( 0xdeffc041 / 0xcc0530c5 )
ERROR: page 31 checksum failed ( 0x4242d217 / 0x7084e4 )
ERROR: page 47 checksum failed ( 0x21c2e4f1 / 0x30e21 )
ERROR: page 63 checksum failed ( 0x7cb184c4 / 0x6a745004 )

0
 
LVL 9

Expert Comment

by:Exchgen
ID: 17126187
From the headers it looks like the log files are fine... Possibly your earlier theory is right and a transaction in the log file is causing the Dr. Watson.

What are the files versions of store.exe and mad.exe on the production and recovery server?

Raghu
0
 

Author Comment

by:emilysam
ID: 17126207
Recovery Server
Store.exe:  5.5.2653.23
MAD.exe:  5.5.2653.23

Production
Store.exe  5.5.2653.23
MAD.exe   5.5.2653.23

Same on both.
0
 

Author Comment

by:emilysam
ID: 17126229
Also, is it ever possible to bypass a transaction log?  If indeed some transaction in that log is affecting the replay capability, is there some way of bypassing it?   I've heard it suggested that I could rename the subsequent log files to basically remove the offending one.  Sounds dangerous to me but I'm very quickly running out of options it seems.

I.e. if 11DC4 is offending then
11DC5 ---> 11DC4
11DC6 ---> 11DC5
11DC7 ---> 11DC6
etc...



0
 
LVL 9

Accepted Solution

by:
Exchgen earned 500 total points
ID: 17126449
As far as I know, it's almost impossible to bypass log files...

We cannot bypass log files because the log files carry a sequence and so do the database files.

If you run eseutil /mh on the database file, it tells you which log file is required next, and Exchange will not skip one or more logs in between where the sequence is broken.

It's a chain of log files. You cannot break the chain in the middle and expect it to continue further...

What we did to recover was to cut the chain short, ending it before the problem log...
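The chain idea can be illustrated with a toy model. This is not how ESE is implemented. It simply assumes that each log must carry the next generation number and that its "prev gen time" must equal the creation time of the log before it, which are the two header fields shown by eseutil /ml earlier in this thread. The function name is hypothetical.

```python
def first_break(headers):
    """Return the index of the first log that does not follow its predecessor, or None."""
    for i in range(1, len(headers)):
        prev, cur = headers[i - 1], headers[i]
        follows = (
            cur["generation"] == prev["generation"] + 1
            and cur["prev_created"] == prev["created"]
        )
        if not follows:
            return i
    return None


# Values copied from the eseutil /ml dumps posted earlier in this thread.
# The creation time of edb11DC3.log is inferred from the "prev gen time"
# recorded in edb11DC4.log; its own "prev gen time" was not posted.
chain = [
    {"generation": 0x11DC3, "created": "7/5/2006 14:39:57", "prev_created": None},
    {"generation": 0x11DC4, "created": "7/5/2006 14:53:19", "prev_created": "7/5/2006 14:39:57"},
    {"generation": 0x11DC5, "created": "7/5/2006 15:4:50", "prev_created": "7/5/2006 14:53:19"},
]

print(first_break(chain))                 # None: the chain is intact
print(first_break([chain[0], chain[2]]))  # 1: dropping edb11DC4.log breaks the chain
```

Under this model, renaming edb11DC5.log to edb11DC4.log changes only the file name. The generation number and timestamps stored inside the header would still point at a predecessor that is no longer there.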

There is one small question I wanted to ask you: are you restoring the dir.edb along with the priv and pub?

Did you try installing a fresh copy of dir.edb and then replaying the log files?

Raghu

0
 

Author Comment

by:emilysam
ID: 17127026
On production, I did not restore the directory.  This is in an environment where we're moving to Exchange 2003 with AD so we're using the Active Directory connector and the Exchange Directory has been extended for AD attributes so I didn't want to complicate things further by restoring the directory on the production box.  However, on the recovery server, it's a brand new blank directory that was created during the install of Exchange 5.5   I did not restore the directory on that box.  

It is also my understanding that it's impossible to break the sequence of log files.  Perhaps this situation is simply a flaw in the design of the database.  The transaction log has recorded a transaction which is causing problems in the database itself yet you cannot undo the problem.  I think my only hope for getting all data back at this point is to look for data recovery on the failed RAID array and try and get the IS from that array running again.   This situation highlights the fact that even with all the transaction logs available you can still end up with data loss.  Therefore, it's important to take frequent full backups and not necessarily rely on the ability to replay transactions back from logfiles.

At this point, I guess I'm done as I don't see anything else that can be done with Exchange unless there's some way of further troubleshooting these logs and/or the IS itself.  Is there any other utility that can be run on the IS while it's not mounted but after it's restored from backup and before it's had recovery performed?  Any other tools that can read / break down a log file to see perhaps what is causing the error?  Any thoughts on debugging the Dr. Watson file that was created?  I'd really like to know what exactly is causing the crash during log replay even if there's no way to recover from it.

0
 
LVL 9

Expert Comment

by:Exchgen
ID: 17127222
I guess you can call Microsoft PSS and ask them to analyze the dump and provide some insight...

I guess you know that Microsoft does not support Exchange 5.5 anymore, but I doubt they will refuse to analyze a user dump and give you some information on it.

I will try to help you if I get any more input! For now I feel I am done!!

:(

Raghu
0
 

Author Comment

by:emilysam
ID: 17129338
Thanks for all your help, Raghu.  You went above and beyond in helping me understand the issue and make every attempt to resolve it.
0
