Rachel Flewelling
asked on
Exch2003 Event 1159 - Centralized server reaches Transaction limit and Dismounts stores, only during Full backup.
Windows 2003 Enterprise
Exchange SP2
Hello, I've searched for this even in EE, and I understand what causes it, however I have a twist to the problem:
The following has occurred twice to us recently:
-A full backup is running on our Exchange 2003 server.
-event ID 1159 is logged and Stores in one Storage Group go offline:
Database error 0xfffffd9a occurred in function JTAB_BASE::EcUpdate while accessing the database "SG3\PVT10".
Both of our incidents occurred when the following was happening:
First time: we were moving a lot of mailboxes, and a Full backup was running.
Second time: It was a busy day for Exchange activity in general, a Full backup was running.
This article, although dated for Exch 2000 explains it: http://support.microsoft.com/kb/905801
Also in that document it points to this link which gives a tad more information: http://support.microsoft.com/kb/819771/
In summary, the stores go offline when the number of uncommitted transaction logs in one Storage Group reaches 1008 files.
Here is the thing: in our environment, we have had backup issues in the past where our transaction logs did not commit for days. For us this can amount to over 80 GB of transaction logs (we have a lot of disk space for transaction logs). The total number of transaction logs is definitely in the thousands for each of our 4 stores. The Mailbox Stores DO NOT go offline when this happens. Everything is fine; we just fix the full backup and get those logs committed.
The two occurrences of event 1159 happened when a FULL backup was in progress AND the number of transaction logs exceeded 1008. I followed the backup progress in the event log from this past weekend before the last event 1159, and it was progressing fine. It just did not complete before the store went offline, because the number of transaction log files created during the backup period breached the limit.
I'm looking for anyone with experience in this issue using Exchange 2003, since the articles I find are for Exch 2000.
To clarify,
Which one of the following statements is true?
1. While an Exchange Storage Group backup is in progress, regardless of how many uncommitted Transaction Logs there are in total for that Storage Group, there is a limit to how many uncommitted Transaction logs can be produced while the backup is in progress. That limit is 1008.
2. There is a limit to how many uncommitted Transaction Logs can exist in total for each Storage Group. That limit is 1008.
Because of the size of our centralized environment, full backups can take 3 days to complete for an entire Exchange server, or roughly 14 hours per Storage Group.
Thanks for any comments, suggestions.
ASKER
Sandeep,
Thanks for your comments,
I do know and do see the purging of committed logs to the database.
Your last sentence, can you explain further? Do you mean some backup software products for Exchange will lock the checkpoint file, not allowing logs to be committed? Because I do see the logs that existed UP TO the beginning of the Storage Group backup are committed and purged.
What I'm trying to clarify for myself and need help with, is that from the time the backup starts on a certain Storage Group, if 1008 new transaction logs are generated, Exchange takes the Mail Stores offline.
What does locking the checkpoint file do (if that is even happening to me), given that Exchange can write new transaction logs just fine during the backup and I see no errors with diagnostic logging turned up high?
What is interesting is I can put the Mail Stores back online. I can also run a full backup and the logs will commit and purge.
The conclusion that I am coming to is a hard one for me to swallow. Is it really the case that during a Full Backup of our centralized Exchange 2003 server, I can only produce 5040 MB of new transaction logs (1008 logs at 5,120 KB per log) during that Full Backup period, or my Mail Stores will go offline?
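To make that arithmetic concrete, here is a minimal sketch of the calculation. It assumes only the numbers already quoted in this thread (a 1008-log limit, 5,120 KB per log, and roughly 14 hours per Storage Group backup); the function names are hypothetical and the result is an estimate, not a statement about how Exchange enforces the limit.

```python
# Figures quoted in the thread; treat them as assumptions, not verified constants.
LOG_LIMIT = 1008      # uncommitted log files before the stores dismount
LOG_SIZE_KB = 5120    # size of one Exchange 2003 transaction log


def log_budget_mb(limit=LOG_LIMIT, size_kb=LOG_SIZE_KB):
    """Total volume of new logs (in MB) that fits under the limit."""
    return limit * size_kb / 1024


def max_sustained_rate(backup_hours, limit=LOG_LIMIT, size_kb=LOG_SIZE_KB):
    """Average (logs per hour, MB per hour) that would just reach the limit
    by the end of a backup lasting backup_hours."""
    logs_per_hour = limit / backup_hours
    return logs_per_hour, logs_per_hour * size_kb / 1024


print(log_budget_mb())           # 5040.0
print(max_sustained_rate(14))    # (72.0, 360.0)
```

In other words, with a 14-hour backup window the Storage Group would have to average fewer than 72 new logs (360 MB) per hour to stay under the limit.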
ASKER
Correction to a sentence:
"What is interesting is I can put the Mail Stores back online. I can also run a full backup and the logs will commit and purge. "
=
"What is interesting is I can put the Mail Stores back online. I can also run a full backup and the logs will commit and purge. I just need to make sure I don't have a lot of transaction log activity over the course of the full backup on that Storage Group."
ASKER
Thank you Sandeep,
My brain is almost done catching up to my imagination :) One more related question,
You mentioned you encountered this with your backup product.
Was this an option in your backup product?
What were you using?
TSM?
Did you have to switch backup products or just reconfigure it in the Backup application?
Do all backup products lock the checkpoint file or is this an optional thing? (If it's for integrity reasons I guess not, but this would be a major downside to centralized installations such as mine so there must be a way to get around this).
We are using Veritas. Although the problem was the backup software locking the checkpoint file, we discovered that the backup process would hang due to low resources, which caused this entire sequence to happen.
Since then, we have corrected many performance-related issues, including an increase in physical memory, and it has been 3 weeks without another occurrence of the issue.
When we contacted Microsoft, their suggestions on this issue were along the same lines as ours.
ASKER
This is amazing.
Our system is running great, but our Stores are so large that backing them up takes too long and/or there is too much activity in our centralized environment, which causes this incident to occur.
So either:
A) there needs to be less activity during the full backup period
B) we need to ensure the backup runs faster. It already backs up over a dedicated 1 Gb NIC to the backup system, but we will investigate.
Thank you for your time and experience Sandeep.
Dear Microsoft, increase this 1008 limit, because the ideal of a centralized environment almost requires it to be increased. ;)
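Options A and B can be weighed against each other with a rough estimate like the one below. This is only a sketch: the store size, backup throughput, and log generation rate passed in are hypothetical inputs that would have to be measured on the real system, and the 1008 figure is the limit as quoted in this thread.

```python
LOG_LIMIT = 1008  # limit as quoted in the thread (assumption)


def logs_during_backup(store_gb, throughput_mb_s, logs_per_hour):
    """Estimate (backup duration in hours, logs generated in that time).

    store_gb        -- amount of data to back up for one Storage Group
    throughput_mb_s -- sustained backup throughput actually achieved
    logs_per_hour   -- average rate at which new transaction logs appear
    """
    hours = store_gb * 1024 / throughput_mb_s / 3600
    return hours, hours * logs_per_hour


def stays_under_limit(store_gb, throughput_mb_s, logs_per_hour, limit=LOG_LIMIT):
    """True if the estimated log count stays below the limit."""
    _, logs = logs_during_backup(store_gb, throughput_mb_s, logs_per_hour)
    return logs < limit


# Hypothetical example: halving activity (option A) or doubling backup
# throughput (option B) both halve the estimated number of logs.
hours, logs = logs_during_backup(store_gb=500, throughput_mb_s=10, logs_per_hour=80)
print(round(hours, 1), round(logs))
```

Because the estimated log count is simply duration multiplied by rate, either reducing activity or shortening the backup window by the same factor has the same effect on the estimate.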
ASKER
Thank you! The KB article was not clear. You were.
Backups (full and incremental) will purge log files that have been committed to your database.
I faced this problem, and after a thorough investigation found out that our backup software locks the checkpoint file before starting the backup and does not unlock it, which prevents the transaction logs from being committed to the database.