Exchange 2007 Delete, Search and Background Cleanup

My Exchange 2007 server has started misbehaving. It worked properly for the last 3 years, but 2 weeks ago some users became unable to search contacts or email in Outlook 2007/2010.


These are the error messages showing up in the event log:


Event Type:      Warning
Event Source:      MSExchangeIS Mailbox Store
Event Category:      Background Cleanup
Event ID:      9828
Date:            3/30/2011
Time:            2:28:21 PM
User:            N/A
Computer:      MAIL
Description:
Background cleanup of folders for database 'First Storage Group\Mailbox Database' was pre-empted because the database engine's version store was growing too large. Before the task was pre-empted, 1 folders were inspected and 0 of those were successfully deleted.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.



Event Type:      Error
Event Source:      MSExchangeIS Mailbox Store
Event Category:      Logons
Event ID:      1022
Date:            3/30/2011
Time:            12:55:49 PM
User:            N/A
Computer:      MAIL
Description:
Logon Failure on database "First Storage Group\Mailbox Database" - Windows account x\dmui; mailbox /o=x/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Recipients/cn=johnny.
Error: -1069
Client Machine: MAIL
Client Process: w3wp.exe
Client ProcessId: 0
Client ApplicationId: Client=WebServices;UserAgent=Mac OS X/10.6.7 (10J869); ExchangeWebServices/1.3 (61); Mail/4.5 (1084)

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
Data:
0000: f5 87 97 61 0f 50 44 40   õ¿¿a.PD@
0008: bc d0 79 86 c7 6a 20 20   ¼Ðy¿Çj  
0010: 5b 44 49 41 47 5f 43 54   [DIAG_CT
0018: 58 5d 00 00 6e 00 00 00   X]..n...
0020: ff 10 1b 00 00 00 00 00   ÿ.......
0028: 00 02 60 00 00 00 3a 67   ..`...:g
0030: f0 1f fe 00 00 00 71 5d   ð.þ...q]
0038: 40 10 ec 03 00 00 d2 55   @.ì...ÒU
0040: 60 20 0f 01 04 80 40 00   ` ...¿@.
0048: 0c 68 d3 fb ff ff 71 5d   .hÓûÿÿq]
0050: 40 10 ec 03 00 00 d2 55   @.ì...ÒU
0058: 60 20 0f 01 04 80 00 80   ` ...¿.¿
0060: 6f 67 ec 03 00 00 f1 5e   ogì...ñ^
0068: 40 10 0f 01 04 80 b4 33   @....¿´3
0070: 40 10 d3 fb ff ff 07 08   @.Óûÿÿ..
0078: 40 10 d3 fb ff ff 07 0a   @.Óûÿÿ..
0080: 40 10 d3 fb ff ff 97 08   @.Óûÿÿ¿.
0088: 40 10 d3 fb ff ff         @.Óûÿÿ  


Event Type:      Warning
Event Source:      MSExchange Search Indexer
Event Category:      General
Event ID:      107
Date:            3/30/2011
Time:            12:55:44 PM
User:            N/A
Computer:      MAIL
Description:
Exchange Search Indexer has temporarily disabled indexing of the Mailbox Database First Storage Group\Mailbox Database (GUID = 95844283-e099-4487-9cce-af855ac84d85) due to an error (Microsoft.Mapi.MapiExceptionJetErrorVersionStoreOutOfMemory: MapiExceptionJetErrorVersionStoreOutOfMemory: Unable to set CI watermark (hr=0x80004005, ec=-1069)
Diagnostic context:
    Lid: 1494    ---- Remote Context Beg ----
    Lid: 13236   StoreEc: 0xFFFFFBD3
    Lid: 3840    StoreEc: 0xFFFFFBD3
    Lid: 27545   StoreEc: 0xFFFFFBD3
    Lid: 14638   StoreEc: 0xFFFFFBD3
    Lid: 1750    ---- Remote Context End ----
    Lid: 12018   StoreEc: 0xFFFFFBD3
    Lid: 9266    StoreEc: 0xFFFFFBD3
   at Microsoft.Mapi.MapiExceptionHelper.ThrowIfError(String message, Int32 hresult, Int32 ec, DiagnosticContext diagCtx)
   at Microsoft.Mapi.ExRpcAdmin.CiSetWaterMark(Guid mdbGuid, Guid instanceGuid, Boolean isHighWatermark, UInt64 watermark)
   at Microsoft.Exchange.Search.NotificationQueue.AddNotificationsForProcessing(MapiEvent[] notifications)
   at Microsoft.Exchange.Search.NotificationWatcher.NotificationWatcherThread()).

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.


Event Type:      Warning
Event Source:      MSExchangeIS
Event Category:      Database Storage Engine
Event ID:      9786
Date:            3/30/2011
Time:            2:45:22 PM
User:            N/A
Computer:      MAIL
Description:
The database engine has consumed 94% of the "version store buckets" resource (15464 used out of a maximum of 16386) for storage group 'First Storage Group'.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
Data:
0000: 56 45 52 42               VERB    
indsupportAsked:
endital1097Commented:
The best course of action is to create a new database and move all mailboxes to it.
This plan will minimize downtime and eliminate these errors the quickest.
indsupportAuthor Commented:
The right approach is to look at the DB size first. Mine in this case is very large (200GB).

Schedule an offline defrag and repair on the DB, then migrate the mailboxes to 2 separate DBs so you can schedule maintenance without taking the entire company down.
nashiookaCommented:
Moving the users will work, but I can add some information as to why.  I had this problem recently, and really struggled to understand what could be causing the version store to run out of memory.  Note: for readability below I'm chopping out the unimportant details of the various log messages and errors.

- It was on a single-role Mailbox server, so back pressure wasn't being triggered.
- There were some very large users on the database, but it was otherwise lightly loaded, with only 20 real users and about 15 service accounts, and plenty of free disk space.
- Nightly maintenance was completing without issue.
- A few users were reporting "cannot open default folders" messages in Outlook, but after a few minutes they were able to get back in.
- During these periods some NDRs were reported to mailboxes in the same DB.
- Also during these periods the application log on the server was flooded with 1022 logon errors from a variety of users and computers, presumably Outlook, OWA, EAS connections, etc.

The sheer number of 1022 errors (and a few unrelated errors) made it difficult to see some of the key indicators:

Event ID: 9786:
The database engine has consumed 80% of the "version store buckets" resource (13116 used out of a maximum of 16386) for storage group 'SG1'.

Event ID: 107 (Warning):
Exchange Search Indexer has temporarily disabled indexing of the Mailbox Database SG1\MDB1 (GUID = fkb5a2d0-53b3-4d41-8128-a0a8003695d9) due to an error (Microsoft.Mapi.MapiExceptionJetErrorVersionStoreOutOfMemory: MapiExceptionJetErrorVersionStoreOutOfMemory: Unable to set CI watermark

Event ID: 1025: With various messages related to the indexer.
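The percentage in the 9786 warnings can be checked against the raw bucket counts in the same message. The following is a minimal Python sketch, assuming the description text has the wording quoted in this thread; the function name `parse_9786` is hypothetical and not part of any Exchange tooling.

```python
import re

# Matches the description text of event 9786 as quoted in this thread.
PATTERN = re.compile(
    r'consumed (\d+)% of the "version store buckets" resource '
    r"\((\d+) used out of a maximum of (\d+)\)"
)


def parse_9786(description):
    """Return (reported_percent, used, maximum, computed_percent), or None."""
    match = PATTERN.search(description)
    if match is None:
        return None
    reported, used, maximum = (int(group) for group in match.groups())
    # Recompute the percentage from the raw counts for comparison.
    computed = 100.0 * used / maximum
    return reported, used, maximum, computed
```

Applied to the two 9786 messages in this thread, 13116 of 16386 buckets works out to just over 80% and 15464 of 16386 to just over 94%, which agrees with the percentages the events report.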

Occasionally, when the support team tried directing users to OWA, they would receive a similar message:
Exception type: Microsoft.Mapi.MapiExceptionJetErrorVersionStoreOutOfMemory

After a little while the 1022 errors would subside, users were able to get in, and the problem seemed to disappear.  I contacted the first user who had reported the issue and remote controlled her PC.  She was able to get into Outlook with no problem, but said it almost always freezes when she tries to search.  We told her not to search for the rest of the day, and that we would rebuild the index that night to see if it helped.

The day after the index rebuild the problem occurred again, but we spotted something interesting.  The first reports of the issue came in a few minutes after we had emailed that same user asking her to try searching!  It made sense, because we hadn't seen any 1022 errors since we had told her not to search the day before.  My suspicion was that this specific user's search was provoking the issue.  I set perfmon to monitor the "version buckets allocated" counter and asked the user to search again.  The counter started climbing rapidly until it hit the 80% threshold, where the counter itself hung.  A few minutes later users started calling support again and the log finally started showing some of the messages.
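To illustrate the kind of check I was doing by eye in perfmon, here is a rough Python sketch. It assumes the counter values have already been collected as (seconds, buckets) pairs; how you export them is up to you. The maximum of 16386 comes from the 9786 message and the 80% threshold from the behaviour described above. The function names and the sample numbers in the usage note are made up for illustration.

```python
def first_crossing(samples, maximum=16386, threshold=0.80):
    """Return the first (seconds, buckets) sample at or above the threshold.

    samples: list of (seconds, buckets_allocated) pairs in time order.
    Returns None if the threshold is never reached.
    """
    limit = threshold * maximum
    for seconds, buckets in samples:
        if buckets >= limit:
            return seconds, buckets
    return None


def growth_per_second(samples):
    """Average bucket growth between the first and last sample."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 == t0:
        return 0.0
    return (v1 - v0) / (t1 - t0)
```

For example, with invented samples `[(0, 500), (30, 4200), (60, 9100), (90, 13116)]`, `first_crossing` returns the sample taken at 90 seconds, since 80% of 16386 is 13108.8 buckets. A steep `growth_per_second` value while one user is searching is the pattern to look for.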

That night I moved that specific user off the database, and the next day, try as I might, I could not reproduce the issue, even on her new server/DB.  The problem was resolved!

A few additional observations:
- In the period after the counter hit 80%, a variety of progressively worse front-end symptoms were reported, until ultimately users couldn't get in at all and received the "unable to open default folders" message.
- Some users claimed they hadn't seen any issues during this period.
- Not all the errors described above were logged each time we were able to reproduce the problem.
- We weren't able to get 100% consistency reproducing the issue.  In fact, at one point we thought searching in a specific folder caused the issue.  Combined with the logging issue, we weren't completely certain we had found the trouble spot, but we had enough that the mailbox move was a worthwhile exercise.
- The mailbox move didn't skip any corrupted items.  I still believe something was corrupted in there, but I can't support the theory with the move logs.

If my experience suggests anything, it's that you may not have to move all the users.  Moves are not always that easy, especially when you have users with very big mailboxes, users in online mode, etc.  As an alternative, you can look for the provocative action starting with patient zero, meaning the first user mentioned in the 1022 errors, or perhaps the first user who contacted the helpdesk.  Maybe work your way backwards until, and if, you're able to isolate the trouble spot.
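With the log flooded by 1022 errors, finding the first affected mailbox by hand is tedious. Below is a minimal Python sketch, assuming the application log has been saved as plain text in the same layout as the events pasted in the question (an "Event Type:" line starting each event, then "Event ID:", "Date:" and "Time:" lines, with the mailbox DN in the description). The function name `first_failed_logon` is hypothetical, and the date format is assumed to be US-style as in the question.

```python
import re
from datetime import datetime


def first_failed_logon(log_text):
    """Return (timestamp, mailbox_dn) of the earliest event 1022, or None."""
    earliest = None
    # Each event in the pasted text starts with an "Event Type:" line.
    for block in log_text.split("Event Type:"):
        if not re.search(r"^Event ID:\s+1022\s*$", block, re.M):
            continue
        date = re.search(r"^Date:\s+(\S+)", block, re.M)
        time = re.search(r"^Time:\s+(.+?)\s*$", block, re.M)
        # The mailbox DN runs to the end of the line; drop the final period.
        mailbox = re.search(r"mailbox (/o=.*?)\.?\s*$", block, re.M)
        if not (date and time and mailbox):
            continue
        stamp = datetime.strptime(
            f"{date.group(1)} {time.group(1)}", "%m/%d/%Y %I:%M:%S %p"
        )
        if earliest is None or stamp < earliest[0]:
            earliest = (stamp, mailbox.group(1))
    return earliest
```

The earliest 1022 entry is only a starting point. In my case it was the helpdesk timeline, not the log alone, that pointed at the right user.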

Hope that helps.
paulhart780Commented:
Reboot the Mailbox server; that should correct the problem with the version store growing too large.
nashiookaCommented:
That was my initial instinct, but my CIO wouldn't allow it on a whim.  Rightfully, he wanted a better understanding of the root cause.  Ultimately, I was able to resolve the problem without a reboot, and I still haven't rebooted the server.  Given the scenario I had, and purely in retrospect, there's no reason to believe a reboot would have resolved it.  BTW, I admit that if I had rebooted and that had fixed it, I'd have been content and never done all that work.  On the flip side, I wouldn't have learned anything either.