Rob Samuel


Exchange Database backup failing with validation - time out and leaving VSS writer in Retryable error state

Hi,

For the past week or so, our Exchange database backup has been failing and leaving the Exchange VSS writer in a “Retryable Error” state. A single backup job covers two Exchange databases, and until now it has always worked fine.

It always fails while the backup job is validating the databases. As a workaround, I can restart the "Microsoft Exchange Replication" service, temporarily disable validation and re-run the backup job successfully, but I need to resolve the underlying issue to get validation working again.
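For anyone wanting to check the writer state before and after a job, here is a rough Python sketch that parses the text output of "vssadmin list writers" and reports any writer whose last error is not "No error". The sample text is abbreviated and illustrative only (not captured from our server), and the function names are just ones I made up for this example.

```python
import re

def parse_writers(text):
    """Parse `vssadmin list writers` text into a list of dicts."""
    writers = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = re.match(r"Writer name: '(.*)'", line)
        if match:
            # A new writer block starts here.
            current = {"name": match.group(1), "state": None, "last_error": None}
            writers.append(current)
        elif current is not None and line.startswith("State:"):
            current["state"] = line.split(":", 1)[1].strip()
        elif current is not None and line.startswith("Last error:"):
            current["last_error"] = line.split(":", 1)[1].strip()
    return writers

def failed_writers(writers):
    """Return writers that report an explicit error."""
    return [w for w in writers if w["last_error"] and w["last_error"] != "No error"]

# Illustrative, abbreviated sample (assumption, not real output from our server).
SAMPLE = """\
Writer name: 'System Writer'
   State: [1] Stable
   Last error: No error

Writer name: 'Microsoft Exchange Writer'
   State: [8] Failed
   Last error: Retryable error
"""

# On the server itself, from an elevated prompt, the real text could be read with:
#   import subprocess
#   text = subprocess.run(["vssadmin", "list", "writers"],
#                         capture_output=True, text=True).stdout
for writer in failed_writers(parse_writers(SAMPLE)):
    print(f"{writer['name']}: {writer['state']} / {writer['last_error']}")
```

Running this before the Exchange job and again after a failure would show exactly when the writer leaves the "No error" state.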

I've included the errors I am getting when the job fails below:

30-Nov 03:11:09 EX10-F-10379 Validation failed for Exchange database “(Database name)”, reason: Timed Out
30-Nov 03:11:09 CTLG-I-05728 catalog created [VV]
30-Nov 03:11:10 BKUP-F-04110 job failed to complete [VV]
30-Nov 03:11:16 VSS -W-08381      writer Microsoft Exchange Writer (Exchange Replication Service): <STABLE> <WAIT_FOR_BACKUP_COMPLETE> <STABLE(0x800423f3 - VSS_E_WRITERERROR_RETRYABLE)>
30-Nov 03:11:16 VSS -E-05591           components from Microsoft Exchange Writer (Exchange Replication Service) - writer failed
30-Nov 03:11:16 EX10-I-08530 One or more Exchange server components failed validation. Please run ESEUtil to determine the source of the failure.
30-Nov 03:11:12 CAT -E-08056 backup aborted [VV]
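
In case it helps with reading the log, below is a small Python sketch that splits each line into timestamp, component, severity letter and message code, then lists the failures in time order (note that the CAT line is logged out of order). The meaning I give to the severity letters (I, W, E, F) is my own assumption rather than something taken from the vendor documentation, and the sort only looks at the time because the excerpt covers a single day.

```python
import re

# Pattern: date, time, 4-character component (space padded), severity letter, 5-digit code, message.
LINE_RE = re.compile(
    r"^(\d{2}-[A-Za-z]{3}) (\d{2}:\d{2}:\d{2}) (.{4})-([A-Z])-(\d{5})\s+(.*)$"
)

# Assumed meaning of the severity letters (not confirmed by vendor docs).
SEVERITY = {"I": "info", "W": "warning", "E": "error", "F": "fatal"}

def parse_line(line):
    """Return a dict for one log line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if not match:
        return None
    date, time, component, sev, code, message = match.groups()
    return {
        "date": date,
        "time": time,
        "component": component.strip(),
        "severity": SEVERITY.get(sev, sev),
        "code": f"{component.strip()}-{sev}-{code}",
        "message": message,
    }

LOG = """\
30-Nov 03:11:09 EX10-F-10379 Validation failed for Exchange database “(Database name)”, reason: Timed Out
30-Nov 03:11:09 CTLG-I-05728 catalog created [VV]
30-Nov 03:11:10 BKUP-F-04110 job failed to complete [VV]
30-Nov 03:11:16 VSS -W-08381      writer Microsoft Exchange Writer (Exchange Replication Service): <STABLE> <WAIT_FOR_BACKUP_COMPLETE> <STABLE(0x800423f3 - VSS_E_WRITERERROR_RETRYABLE)>
30-Nov 03:11:16 VSS -E-05591           components from Microsoft Exchange Writer (Exchange Replication Service) - writer failed
30-Nov 03:11:16 EX10-I-08530 One or more Exchange server components failed validation. Please run ESEUtil to determine the source of the failure.
30-Nov 03:11:12 CAT -E-08056 backup aborted [VV]
"""

entries = [e for e in (parse_line(l) for l in LOG.splitlines()) if e]
# Stable sort by time only; the excerpt is from a single day.
entries.sort(key=lambda e: e["time"])
problems = [e for e in entries if e["severity"] in ("error", "fatal")]

for e in problems:
    print(e["time"], e["code"], e["message"])
```

Sorted this way, the validation timeout (EX10-F-10379) is the first failure, and the VSS writer error is only reported several seconds later.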

In the event viewer on our Exchange server, we get the following event, which seems to show that the shadow copy itself completed successfully even though the backup job failed:

Information Store – (Database name) (1084) Shadow copy instance 2 completed successfully

Over the past few days, these are some of the things that I have tried, to no avail (in no particular order):

1 - Ran “ESEUTIL /k” on both database files and log files – all OK, no issues/errors
2 - Upgraded Exchange 2013 from SP1 to CU14
3 - Increased VSS storage limits to 20% of hard disk
4 - Disabled ‘shadow copies’ on My Computer, disk properties
5 - Ran “vssadmin delete shadows /all”
6 - Ensured that no other backups are running at the same time as the Exchange database backup
7 - Confirmed that the Exchange VSS writer is in a “no error” state after the file system backup which runs in plenty of time before the Exchange backup
8 - Disabled background maintenance on Exchange (and re-enabled it after a failed backup)
9 - Added the following registry keys to the Exchange server and Evault server:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\TcpWindowSize=256000
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\GlobalMaxTcpWindowSize=16777216
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\KeepAliveInterval=1000
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\KeepAliveTime=600000
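
To make the four values above easy to apply or roll back on both servers, here is a minimal Python sketch that renders them as .reg file text. It assumes all four are REG_DWORD values, and it only prints the text rather than touching the registry; the function name is just one I picked for the example.

```python
KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"

# Decimal values exactly as listed above.
VALUES = {
    "TcpWindowSize": 256000,
    "GlobalMaxTcpWindowSize": 16777216,
    "KeepAliveInterval": 1000,
    "KeepAliveTime": 600000,
}

def to_reg(key, values):
    """Render DWORD values as .reg file text (assumes every value is REG_DWORD)."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key}]"]
    for name, number in values.items():
        # .reg files store DWORDs as 8 hexadecimal digits.
        lines.append(f'"{name}"=dword:{number:08x}')
    return "\n".join(lines) + "\n"

reg_text = to_reg(KEY, VALUES)
print(reg_text)
```

Keeping the values in one place like this also makes it obvious what to remove again if they turn out to have no effect on the validation timeout.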

I did originally think it might be related to the default 1am-5am maintenance schedule in Exchange 2013, but I’ve read that this schedule is redundant because background maintenance now runs 24 x 7, and that you can’t change this timeframe.

In any case, the backup also fails during the day (e.g. if I run the job at 10am or 6pm it still fails, with no other backups taking place in our environment).

I’m starting to run out of ideas now so if anyone has any suggestions as to how I can get the Exchange database backup job working correctly that would be great.

I look forward to your replies and thanks in advance!
ASKER CERTIFIED SOLUTION
harry for

This solution is only available to members.
Rob Samuel (ASKER)

Thanks for your comment, Harry. I've just created a new backup job for our Archive database and, touch wood, it has completed and validated OK.

I'm just waiting for it to replicate to our backup provider then I will do the same for our main database.

I will post an update as soon as the second job has finished replicating.