Link to home
Start Free TrialLog in
Avatar of marrowyung
marrowyung

asked on

Log shipping job failed as one trn file is not in good format, how to fix it asap ?

hi,


on a log shipping job on SQL server 2016 standard I see error:


Message
2021-08-30 17:15:16.50   *** Error: An error occurred during the restore operation for secondary database 'ghk_booking_archive'.(Microsoft.SqlServer.Management.LogShipping) ***
2021-08-30 17:15:16.50   *** Error: The media family on device 'T:\LogShipDest\ghk_booking_archive_20210817134500.trn' is incorrectly formed. SQL Server cannot process this media family.
RESTORE LOG is terminating abnormally.(.Net SqlClient Data Provider) ***

it seems this files ,    ghk_booking_archive_20210817134500.trn , is corruped by some reason or what ?


actually from my point of view, we need to do:

1) stop log shipping job on both primary and standby server.

2) full restore the LS target DB, and 

3) just simply kick off the LS job again to fix it, 


is there any other method to be easy than this, just one file has problem... 




Avatar of arnold
arnold
Flag of United States of America image

Potentially, the file is missed and its out of sync, only way is to restore a full backup, and get it back in sync.

Alternatively, finding the transaction logs that preceded thus one and try to restore them, to get it back in sync.


You've already thirteen days behind.  Do you still have the transaction log from the 17th?

Possibky the copying might not completed, leading to file corruption.
Avatar of marrowyung
marrowyung

ASKER

Potentially, the file is missed and its out of sync,

the file is found and I read sth like that:
https://docs.microsoft.com/en-us/troubleshoot/windows-server/backup-and-storage/database-backup-error-3266-3013

seems all about very accidental file corruption.

Alternatively, finding the transaction logs that preceded thus one and try to restore them, to get it back in sync.

tried that and it is the same error message.

Possibky the copying might not completed

file found in the files share

only way is to restore a full backup, and get it back in sync.

so now only reconfiguration can help:

1) stop respective LS job on LS primary and target
2) full backup and restore of the DB to LS target and  leave it in none recovery mode
3) start the respective LS job on LS primary and target
?


Yes, you have to manually get it back to sync.

Seems odd that 13 days later an error is detected.

How async  is your log shipping?
Yes, to quickly restore you would need to Reconfigure the Log Shipping with the steps you have mentioned above..
To avoid these kind of issues out, kindly configure your database mail to alert in case of any Log Shipping related job failures to handle it quickly instead of verifying after 13 days.

To identify the reason for your Backup file corruption, kindly let us know whether you are taking backup directly to Network shares or to your local drives.
Log shipping commonly is automated.

usually there are a combination of three of four jobs. a pair on the source server and a pair on the destination.
The source server generate a transaction log.
notifies the others
the other copies the files, and restarts with no recovery.

if there is some file corruption during transfer, it will error on the restore.

an alert on failure might have helped. Commonly the transaction logs are cleaned up to manage space.
arnold ,

How async  is your log shipping?
it works for a long long time.

usually there are a combination of three of four jobs. a pair on the source server and a pair on the destination.
The source server generate a transaction log.
notifies the others
the other copies the files, and restarts with no recovery.
there are job in the target to copy and restore the log, it seems do not notisify each other.


Raja Jegan ,

To avoid these kind of issues out, kindly configure your database mail to alert in case of any Log Shipping related job failures to handle it quickly instead of verifying after 13 days.

we have this alert failure ONLY to my mailbox. ahaha

To identify the reason for your Backup file corruption, kindly let us know whether you are taking backup directly to Network shares or to your local drives.

network share... but it seldomly happen but now.






today I have reconfigure the LS and it seems it is only working for 1 hour after LS setup again and error is :

Date      9/1/2021 5:45:00 PM
Log      Job History (LSRestore_HISDB01\HISDB_gxxxxxx)

Step ID      1
Server      BIDB01
Job Name      LSRestore_HISDB01\HISDB_xxxxx
Step Name      Log shipping restore log job step.
Duration      00:00:27
Sql Severity   0
Sql Message ID   0
Operator Emailed   
Operator Net sent   
Operator Paged   
Retries Attempted   0

Message
2021-09-01 17:45:26.99   *** Error: Could not apply log backup file 'T:\LogShipDest\xxxxarchive_20210901084501.trn' to secondary database 'ghk_booking_archive'.(Microsoft.SqlServer.Management.LogShipping) ***
2021-09-01 17:45:26.99   *** Error: Exclusive access could not be obtained because the database is in use.
RESTORE LOG is terminating abnormally.(.Net SqlClient Data Provider) ***
2021-09-01 17:45:27.03   *** Error: The log backup file 'T:\LogShipDest\xxxxx_20210901084501.trn' was verified but could not be applied to secondary database 'xxxxx'.(Microsoft.SqlServer.Management.LogShipping) ***
2021-09-01 17:45:27.03   Deleting old log backup files. Primary Database: 'ghk_booking_archive'
2021-09-01 17:45:27.05   The restore operation completed with errors. Secondary ID: 'e0cd276d-803a-453a-8ecc-c99e92e5fd16'
2021-09-01 17:45:27.05   ----- END OF TRANSACTION LOG RESTORE     -----

Exit Status: 1 (Error)


Open in new window


and when I try to restore the log , myself I see this

===================================

Restore failed for Server '<LS target>'.  (Microsoft.SqlServer.SmoExtended)

------------------------------
For help, click: https://go.microsoft.com/fwlink?ProdName=Microsoft+SQL+Server&ProdVer=16.100.37971.0&EvtSrc=Microsoft.SqlServer.Management.Smo.ExceptionTemplates.FailedOperationExceptionText&EvtID=Restore+Server&LinkId=20476

------------------------------
Program Location:

   at Microsoft.SqlServer.Management.Smo.Restore.SqlRestore(Server srv)
   at Microsoft.SqlServer.Management.SqlManagerUI.SqlRestoreTLogOptions.RunRestore()

===================================

System.Data.SqlClient.SqlError: Exclusive access could not be obtained because the database is in use. (Microsoft.SqlServer.Smo)

------------------------------
For help, click: https://go.microsoft.com/fwlink?ProdName=Microsoft+SQL+Server&ProdVer=16.100.37971.0&LinkId=20476

------------------------------
Program Location:

   at Microsoft.SqlServer.Management.Smo.ExecutionManager.ExecuteNonQueryWithMessage(StringCollection queries, ServerMessageEventHandler dbccMessageHandler, Boolean errorsAsMessages, Boolean retry)
   at Microsoft.SqlServer.Management.Smo.BackupRestoreBase.ExecuteSql(Server server, StringCollection queries)
   at Microsoft.SqlServer.Management.Smo.Restore.SqlRestore(Server srv)

Open in new window


so it failes again.

what is the idea on this? why locked and not for log restore?

what status is the DB in to which you are restoring?
it is in recovery, correct?

how and what are you trying to restore?

Are you copying the transaction log to the destination server and then restoring trying to restore it?

check the size of the transaction log on the source server and compare it to the size on the destination server.  are they the same?
what status is the DB in to which you are restoring?
it is in recovery, correct?

standby ! if it is recovery LS is not going to work as no more log can be restore.
we have to change it to standby or none recoverable , LS will change none recoverable to standby when LS start

how and what are you trying to restore?

the DB need to log shipping.

Are you copying the transaction log to the destination server and then restoring trying to restore it?

the SQL LS job will do it for me and what I did twice today is to restore the log specified in the error log, and I see another warning after getting LS to work for 1 hours, then it is not working AGAIN and I post the second error.

so now I setup again.

check the size of the transaction log on the source server and compare it to the size on the destination server.  are they the same?

good point ! but it is the same size.

ASKER CERTIFIED SOLUTION
Avatar of arnold
arnold
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Are you sure?

standby means it is locked, read-only no changes can be made.

yes, sure.

can't see why. and I have to recreate the log shipping again to make it works for the whole day.

now I found out what is left is the LS alert job on PRIMARY fail with error even all LS backup, copy and restore jobs works back on! any reason for it ?

are you using it to create a snapshot to query for some data ?
yes!


arnold ,

https://www.mssqltips.com/sqlservertip/3574/change-sql-server-log-shipped-database-from-restoring-to-standby-readonly/

but this only happen during the beginning of the process when LS start to run, right?

Let's see here sth funny

User generated image
we see one alert here but it is a wrong item think, as the log shipping from PRIMARY is seeing itself also the secondary DB.

The first 2 line is real and running good and exactly what I want.

I think when I script out the LS configuration and by some reason I don't know why, applied to the PRIMARY to create the LS target on PRIMARY but actually the secondary DB NEVER on primary.

anyway to remove that LS configuration entry from LS system table so that Alert job do not check that WRONG item ?

from the link however , give me a hints, this query is to check log shipping on SECONDARY:

SELECT
   secondary_server,   secondary_database,   primary_server,   primary_database,   last_copied_file,   last_copied_date,   last_restored_file,   last_restored_date
FROM msdb.dbo.log_shipping_monitor_secondary

Open in new window


SELECT
   secondary_database,
   restore_mode,
   disconnect_users,
   last_restored_file
FROM msdb.dbo.log_shipping_secondary_databases

Open in new window


and I run it on PRIMARY it exactly as the entry i don't want ! do should I just drop the only one record in 
msdb.dbo.log_shipping_monitor_secondary 

Open in new window

and

msdb.dbo.log_shipping_secondary_databases

Open in new window



on primary ?




What puts the secondary db in standby ?
What puts the secondary db in standby ?

the log shipping process.

but right now the final problem is, why the primary is consider to be a secondary for 1 x log shipping process!

I don't set it. how can I remove that LS configuration? just delete the entry on both LS system table above ? 
You terminate the log shipping arrangement under the DB properties.

I do not believe the log shipping job place the DB in standby.

You likely have another job that is possibly part of a reporting process. that as an example creates a read-only snapshot of the remote DB and performs some data mining. it possible termed and left the DB in the standby mode.

going the delete route, you may have to go through the cleanup to remove each reference that you will run into when you try to reestablish a log shipping setup. it would likely complain that an entry already exists ....
You terminate the log shipping arrangement under the DB properties.
yes 


I do not believe the log shipping job place the DB in standby.

Can't see why as I said. and it left a row in the PRIMARY DB secondary log shipping table

. it possible termed and left the DB in the standby mode.

nonon , no Db in standby mode any more, my last statement is, the LS Alert in PRIMARY is not normal, it complain a LS process doesn't exists at all, so it keep complaining DB out sync.

the secondary LS DB is not in PRIMARY at all, but primary DB think there is one. so keep reporting out sync.
so my last question is, on the system LS secondary DB table on primary I found an entry, can I simple just remote it and make the primary DB LS Alert job heathy ?

you may have to go through the cleanup to remove each reference that you will run into when you try to reestablish a log shipping setup
you mean when I try to delete an entry on each of the table I found above, I might prompt with error message like foreign key, and I have to follow the message to find related record in all related table ?
With log shipping when manually deleting jobs while the configuration still reflects a relationship..

that is a possibility.
Your error indicates that the LS process attempted restore but could not get exclusive access. In Standby, users can connect to the database and if the database connections are fast enough, when the LS process kills the connections, one can sneak in before the LS process starts the restore.  If you cannot get exclusive access to the database (LS Process) then you will get the error that it cannot proceed with the restore because it cannot get exclusive access to take the database out of Standby mode.

You need to find a time when not as many connections are hitting the database or create a custom process to restore so that you can control it better.
Ben,
If you cannot get exclusive access to the database (LS Process) then you will get the error that it cannot proceed with the restore because it cannot get exclusive access to take the database out of Standby mode.

I think this error comes up when, by some reason, it can't change from standby mode to none recovery mode to restore t log.

right now this problem is solved but left the one of the mistaken (created I guess by script during the time I am debugging LS) , the LS job in the target is created in primary, how can I remove it so that the LS_alert job in primary is not in error anymore ?

You need to find a time when not as many connections are hitting the database or create a custom process to restore so that you can control it better.

after I RECREATE the log shipping, this error is gone!


BTW, how can I remove the alert for this residue LS job ?
you have to search for the job in the and remove the reference.

Double check whether it is on the local server or the error is coming from the remote side.

Is this an email notification?

Check the message headers to determine source.
Is this an email notification?
I read from job history

you have to search for the job in the and remove the reference.

not just in the 2 x LS system table I shown above?

AFTER tell you all the LS job back to life and the same second problem comes back on, the DB is locked and can't restore log:

2021-09-06 12:30:21.18   *** Error: Could not apply log backup file 'T:\LogShipDest\<Database>_20210906040001.trn' to secondary database '<Database>'.(Microsoft.SqlServer.Management.LogShipping) ***
2021-09-06 12:30:21.18   *** Error: Exclusive access could not be obtained because the database is in use.
RESTORE LOG is terminating abnormally.(.Net SqlClient Data Provider) ***

any way to check why, from time to time, the DB locked from LS log restore process?

I just tried to run this:

EXEC sp_change_log_shipping_secondary_database
  @secondary_database = '<LS DB name>',
  @restore_mode = 0,   -- 0 is restore mode, --1 is standby/read only mode
  @disconnect_users = 1

and LS RESTORE job can be run! but if I change @restore_mode = 1, it can't change to standby/read-only mode AGAIN.

so finally still have to re setup the log shipping !

the above just is a temporary solution only and for me  to troubleshoot it when it happen again, please let me know how to successfully change it to read only mode.
The error is coming from the secondary to the prinary, status update.
sorry ,say that again? what you want me to show yo u?

any idea on how to remove the god damn LS secondary record in PRIMARY  sever?
If you have a way to query history of a job, that means the job/reference is there.


https://docs.microsoft.com/en-us/sql/database-engine/log-shipping/remove-log-shipping-sql-server?view=sql-server-ver15

Use the stored procedures to clear log shipping references for these two db.
seems this :

On the log shipping secondary server, execute sp_delete_log_shipping_secondary_database to delete the secondary database.

can do the job and any parameter it use ?

so just exec sp_delete_log_shipping_secondary_database  ?
on our primary when I run this:

SELECT
   secondary_database,
   restore_mode,
   disconnect_users,
   last_restored_file
FROM msdb.dbo.log_shipping_secondary_databases


only one row returned and that DB is right ! the one said it is the LS secondary DB but it is not on primary.
In order to gave a history event, you need a job that corresponds to it.

The log shipping is an event based process.
1)generate transaction log
2) notify the secondary
On the secondary,
3) copy the log from the share
4) attemp restore with no revovery
5) report back status.

This is why it is a pain to not sever the log shipping relationship .

If you can access the properties of the DB (primary) navigate to the log shipping, and sever the setup.
"This is why it is a pain to not sever the log shipping relationship . "

I think I accidently run the Ls configuration script and it create the PRIMARY server as LS secondary, so it left an entry there.
If you can access the properties of the DB (primary) navigate to the log shipping, and sever the setup.

I did, the item is not there.

so that one is residue LS configuration which make primary has a record that it also a LS secondary server, so I have to remove it.
arnold ,

any update for me ?

On the log shipping secondary server, execute sp_delete_log_shipping_secondary_database to delete the secondary database.

can do the job and any parameter it use ?

so just exec sp_delete_log_shipping_secondary_database  ? on our primary right ?
That is a possibility to delte. the identification of the source of the notice is the pain.
confirm the jobs that you have on the secondary

If you try to setup the log shipping again, it usually will tell you where the conflict  is.
The presence of the job on the server from which you can pull history, try and see what the steps of the job are
to try and work your way back.
I.e. what the job trigeres.

if you have a history placeholder for a job, the job is still existing.
once you clear up the above, disable the job and see if the notification, errors continue.
That is a possibility to delte. the identification of the source of the notice is the pain.
if I check out the source it will have more than one as the server ALREADY the LS source of few databases,

not very accurate.

If you try to setup the log shipping again, it usually will tell you where the conflict  is.

I tried that before, no conflict of error.

if you have a history placeholder for a job, the job is still existing.
anyway to check?  it is only left in the LS alert job, no in the job list.

I run this on the primary:

User generated image
and the entry is gone and LS job and LS alert job still work good.