indsupport
asked on
Exchange 2007 Delete, Search and Background Cleanup
My Exchange 2007 server is currently acting crazy. It worked properly for the last 3 years, but 2 weeks ago some users became unable to search contacts or email in Outlook 2007/2010.
These are the error messages that are popping up in the event log
Event Type: Warning
Event Source: MSExchangeIS Mailbox Store
Event Category: Background Cleanup
Event ID: 9828
Date: 3/30/2011
Time: 2:28:21 PM
User: N/A
Computer: MAIL
Description:
Background cleanup of folders for database 'First Storage Group\Mailbox Database' was pre-empted because the database engine's version store was growing too large. Before the task was pre-empted, 1 folders were inspected and 0 of those were successfully deleted.
For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
Event Type: Error
Event Source: MSExchangeIS Mailbox Store
Event Category: Logons
Event ID: 1022
Date: 3/30/2011
Time: 12:55:49 PM
User: N/A
Computer: MAIL
Description:
Logon Failure on database "First Storage Group\Mailbox Database" - Windows account x\dmui; mailbox /o=x/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Recipients/cn=johnny.
Error: -1069
Client Machine: MAIL
Client Process: w3wp.exe
Client ProcessId: 0
Client ApplicationId: Client=WebServices;UserAgent=Mac OS X/10.6.7 (10J869); ExchangeWebServices/1.3 (61); Mail/4.5 (1084)
For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
Data:
0000: f5 87 97 61 0f 50 44 40 õ¿¿a.PD@
0008: bc d0 79 86 c7 6a 20 20 ¼Ðy¿Çj
0010: 5b 44 49 41 47 5f 43 54 [DIAG_CT
0018: 58 5d 00 00 6e 00 00 00 X]..n...
0020: ff 10 1b 00 00 00 00 00 ÿ.......
0028: 00 02 60 00 00 00 3a 67 ..`...:g
0030: f0 1f fe 00 00 00 71 5d ð.þ...q]
0038: 40 10 ec 03 00 00 d2 55 @.ì...ÒU
0040: 60 20 0f 01 04 80 40 00 ` ...¿@.
0048: 0c 68 d3 fb ff ff 71 5d .hÓûÿÿq]
0050: 40 10 ec 03 00 00 d2 55 @.ì...ÒU
0058: 60 20 0f 01 04 80 00 80 ` ...¿.¿
0060: 6f 67 ec 03 00 00 f1 5e ogì...ñ^
0068: 40 10 0f 01 04 80 b4 33 @....¿´3
0070: 40 10 d3 fb ff ff 07 08 @.Óûÿÿ..
0078: 40 10 d3 fb ff ff 07 0a @.Óûÿÿ..
0080: 40 10 d3 fb ff ff 97 08 @.Óûÿÿ¿.
0088: 40 10 d3 fb ff ff @.Óûÿÿ
Event Type: Warning
Event Source: MSExchange Search Indexer
Event Category: General
Event ID: 107
Date: 3/30/2011
Time: 12:55:44 PM
User: N/A
Computer: MAIL
Description:
Exchange Search Indexer has temporarily disabled indexing of the Mailbox Database First Storage Group\Mailbox Database (GUID = 95844283-e099-4487-9cce-af855ac84d85) due to an error (Microsoft.Mapi.MapiExceptionJetErrorVersionStoreOutOfMemory: MapiExceptionJetErrorVersionStoreOutOfMemory: Unable to set CI watermark (hr=0x80004005, ec=-1069)
Diagnostic context:
Lid: 1494 ---- Remote Context Beg ----
Lid: 13236 StoreEc: 0xFFFFFBD3
Lid: 3840 StoreEc: 0xFFFFFBD3
Lid: 27545 StoreEc: 0xFFFFFBD3
Lid: 14638 StoreEc: 0xFFFFFBD3
Lid: 1750 ---- Remote Context End ----
Lid: 12018 StoreEc: 0xFFFFFBD3
Lid: 9266 StoreEc: 0xFFFFFBD3
at Microsoft.Mapi.MapiExceptionHelper.ThrowIfError(String message, Int32 hresult, Int32 ec, DiagnosticContext diagCtx)
at Microsoft.Mapi.ExRpcAdmin.CiSetWaterMark(Guid mdbGuid, Guid instanceGuid, Boolean isHighWatermark, UInt64 watermark)
at Microsoft.Exchange.Search.NotificationQueue.AddNotificationsForProcessing(MapiEvent[] notifications)
at Microsoft.Exchange.Search.NotificationWatcher.NotificationWatcherThread()).
For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
Event Type: Warning
Event Source: MSExchangeIS
Event Category: Database Storage Engine
Event ID: 9786
Date: 3/30/2011
Time: 2:45:22 PM
User: N/A
Computer: MAIL
Description:
The database engine has consumed 94% of the "version store buckets" resource (15464 used out of a maximum of 16386) for storage group 'First Storage Group'.
For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
Data:
0000: 56 45 52 42 VERB
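The events above report the same failure in different notations: the 1022 and 107 events show `-1069`, the diagnostic context shows `StoreEc: 0xFFFFFBD3`, and the 1022 hex dump contains the bytes `d3 fb ff ff`. A minimal Python sketch, assuming these are all one 32-bit signed error code (the helper names are made up for illustration), shows how they line up:

```python
def store_ec_to_signed(hex_text: str) -> int:
    """Interpret a 32-bit hex value such as '0xFFFFFBD3' as a signed integer."""
    value = int(hex_text, 16) & 0xFFFFFFFF
    # Values with the top bit set are negative in two's complement.
    return value - 0x1_0000_0000 if value & 0x8000_0000 else value


def bytes_le_to_signed(hex_bytes: str) -> int:
    """Interpret space-separated hex bytes (little-endian) as a signed integer."""
    return int.from_bytes(bytes.fromhex(hex_bytes), "little", signed=True)


# StoreEc value from the Event 107 diagnostic context
print(store_ec_to_signed("0xFFFFFBD3"))
# Byte sequence that recurs in the Event 1022 data dump
print(bytes_le_to_signed("d3 fb ff ff"))
```

Both calls should yield the `-1069` that the 107 event pairs with MapiExceptionJetErrorVersionStoreOutOfMemory, which suggests the logon failures and the indexer warning share one cause.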
Moving the users will work, but I can add some information as to why. I had this problem recently, and really struggled to understand what could be causing the version store to run out of memory. Note: for readability below I'm chopping out the unimportant details of the various log messages and errors.
- It was on a single-role Mailbox server, so back pressure wasn't being triggered.
- There were some very large users on the database, but it was otherwise lightly loaded, with only 20 real users and about 15 service accounts, with plenty of free disk space.
- Nightly maintenance was completing without issue
- A few users were reporting "cannot open default folders" messages in Outlook, but after a few minutes they were able to get back in.
- During these periods some NDRs were reported to mailboxes in the same DB.
- Also during these periods the application log on the server was flooded with 1022 logon errors from a variety of users and computers, presumably Outlook, OWA, and EAS connections, etc.
The sheer number of 1022 errors (and a few unrelated errors) made it difficult to see some of the key indicators:
Event ID: 9786:
The database engine has consumed 80% of the "version store buckets" resource (13116 used out of a maximum of 16386) for storage group 'SG1'.
Event ID: 107 (Warning):
Exchange Search Indexer has temporarily disabled indexing of the Mailbox Database SG1\MDB1 (GUID = fkb5a2d0-53b3-4d41-8128-a0a8003695d9) due to an error (Microsoft.Mapi.MapiExceptionJetErrorVersionStoreOutOfMemory: MapiExceptionJetErrorVersionStoreOutOfMemory: Unable to set CI watermark
Event ID: 1025: With various messages related to the indexer.
Occasionally, when the support team tried directing users to OWA, they would receive a similar message like:
Exception type: Microsoft.Mapi.MapiExceptionJetErrorVersionStoreOutOfMemory
After a little while the 1022 errors would subside, users were able to get in, and the problem seemed to disappear. I contacted the first user who had reported the issue and remote controlled her PC. She was able to get into Outlook with no problem, but said it almost always freezes when she tries to search. We told her not to search for the rest of the day, and that we would rebuild the index that night to see if it helped.
The day after the index rebuild the problem occurred again, but we spotted something interesting. The first reports of the issue came in a few minutes after we had emailed that same user asking her to try searching! It made sense, because we hadn't seen any 1022 errors since we had told her not to search the day before. My suspicion was that this specific user's search was provoking the issue. I set perfmon to monitor the "version buckets allocated" counter and asked the user to search again. The counter started climbing rapidly until it hit the 80% threshold, where the counter itself hung. A few minutes later users started calling support again and the log finally started showing some of the messages.
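For anyone who wants to do the same check on logged data, here is a rough Python sketch. It assumes you have either the text of a 9786 event or a list of "version buckets allocated" readings exported from perfmon. The function names and the sample readings are invented for illustration; 16386 is the maximum quoted in the 9786 events, and 80% is the point where I saw the counter hang.

```python
import re

# Matches the "(13116 used out of a maximum of 16386)" part of a 9786 event.
BUCKETS_RE = re.compile(r"\((\d+) used out of a maximum of (\d+)\)")


def bucket_usage(description: str) -> float:
    """Return the percentage of version store buckets in use for a 9786 event."""
    match = BUCKETS_RE.search(description)
    if match is None:
        raise ValueError("no bucket counts found in event text")
    used, maximum = int(match.group(1)), int(match.group(2))
    return 100.0 * used / maximum


def first_breach(samples, maximum=16386, threshold_pct=80.0):
    """Return the index of the first reading at or above the threshold, or None."""
    for index, used in enumerate(samples):
        if 100.0 * used / maximum >= threshold_pct:
            return index
    return None


event_text = ('The database engine has consumed 80% of the "version store buckets" '
              "resource (13116 used out of a maximum of 16386) for storage group 'SG1'.")
print(f"{bucket_usage(event_text):.1f}% of buckets in use")

# Hypothetical readings taken while the user runs her search.
readings = [1200, 4800, 9500, 13116, 13116]
print("threshold first reached at sample", first_breach(readings))
```

Lining up the first breach with the time the user started searching is what made the correlation visible in my case.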
That night I moved that specific user off the database, and the next day, try as I might, I could not reproduce the issue, even on her new server/DB. The problem was resolved!
A few additional observations:
- In the period after the counter hit 80%, a variety of progressively worse front-end symptoms were reported, until ultimately users couldn't get in at all and got the "unable to open default folders" message.
- Some users claimed they hadn't seen any issues during this period.
- Not all the errors described above were logged each time we were able to reproduce the problem.
- We weren't able to reproduce the issue with 100% consistency. In fact, at one point we thought searching in a specific folder caused the issue. Combined with the inconsistent logging, this meant we weren't completely certain we had found the trouble spot, but we had enough evidence that the mailbox move was a worthwhile exercise.
- The mailbox move didn't skip any corrupted items. I still believe something was corrupted in there, but I can't support the theory with the move logs.
If my experience suggests anything, it's that you may not have to move all the users. Moves are not always easy, especially when you have users with very big mailboxes, users in online mode, etc. As an alternative, you can look for the provocative action, starting with patient zero: the first user mentioned in the 1022 errors, or perhaps the first user who contacted the helpdesk. Work your way backwards until, and if, you're able to isolate the trouble spot.
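If the application log is too noisy to spot patient zero by eye, something like the Python sketch below could help. It assumes the log has been exported as plain text in the same layout as the events quoted in the question, with US-style dates and English AM/PM markers. The sample log is cut down for illustration, and the account `x\asmith` is invented.

```python
import re
from datetime import datetime

ACCOUNT_RE = re.compile(r"Windows account ([^;]+);")
STAMP_FORMAT = "%m/%d/%Y %I:%M:%S %p"  # e.g. 3/30/2011 12:55:49 PM

# Trimmed sample; only x\dmui comes from the real log, x\asmith is made up.
SAMPLE_LOG = r"""
Event Type: Error
Event ID: 1022
Date: 3/30/2011
Time: 1:02:10 PM
Description:
Logon Failure on database "First Storage Group\Mailbox Database" - Windows account x\asmith; mailbox ...
Event Type: Error
Event ID: 1022
Date: 3/30/2011
Time: 12:55:49 PM
Description:
Logon Failure on database "First Storage Group\Mailbox Database" - Windows account x\dmui; mailbox ...
Event Type: Warning
Event ID: 9786
Date: 3/30/2011
Time: 2:45:22 PM
"""


def parse_events(log_text):
    """Split exported log text into (timestamp, event_id, account) tuples."""
    events = []
    for chunk in log_text.split("Event Type:")[1:]:
        fields = {}
        for line in chunk.splitlines():
            key, sep, value = line.partition(":")
            key = key.strip()
            if sep and key in ("Event ID", "Date", "Time"):
                fields.setdefault(key, value.strip())  # keep first occurrence
        if len(fields) < 3:
            continue  # skip incomplete records
        when = datetime.strptime(f"{fields['Date']} {fields['Time']}", STAMP_FORMAT)
        match = ACCOUNT_RE.search(chunk)
        events.append((when, fields["Event ID"], match.group(1) if match else None))
    return events


def first_logon_failure(events):
    """Return the earliest 1022 event that names an account, or None."""
    failures = [e for e in events if e[1] == "1022" and e[2]]
    return min(failures, key=lambda e: e[0]) if failures else None


print(first_logon_failure(parse_events(SAMPLE_LOG)))
```

The earliest 1022 account is only a starting point. In my case the user who triggered the problem was the first one to call, not necessarily the first one in the log.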
Hope that helps.
Reboot the Mailbox server; it should correct the problem with the version store growing too large.
That was my initial instinct, but my CIO wouldn't allow it on a whim. Rightfully, he wanted a better understanding of the root cause. Ultimately, I was able to resolve the problem without a reboot, and I still haven't rebooted the server. Given the scenario I had, and purely in retrospect, there's no reason to believe a reboot would have resolved it. BTW, I admit that if I had rebooted and that had fixed it, I'd have been content and would never have done all that work. On the flip side, I wouldn't have learned anything either.
ASKER
Schedule an offline defrag and repair on the DB, and then migrate the mailboxes to 2 separate DBs so you can schedule maintenance without taking the entire company down.