Link to home
Start Free TrialLog in
Avatar of waqqas31
waqqas31

asked on

SQL Server backups over the network fail with operating system error 32

Hello community,

I have Googled this topic to death and I cannot find the solution to my problem.  Here is my setup:

Database Server 1:
SQL Server 2014 Developer
WIndows Server 2012 R2 Std

Backup Server 1:
SQL Server 2014 Developer
Windows Server 2012 R2 Std

Backup Server 2:
SQL Server 2014 Developer
Windows Server 2016 Std Eval

On Database Server 1, I have 3 databases: DB1, DB2, DB3.

I have created the following script to back each one up (one-at-a-time) to one of the backup servers:

Mapping Drives:
exec xp_cmdshell 'net use P: /delete'
GO
exec xp_cmdshell 'net use Q: /delete'
GO
exec xp_cmdshell 'net use P: \\192.168.0.200\Backup_F'
GO
exec xp_cmdshell 'net use Q: \\192.168.0.200\Backup_G'
GO
exec xp_cmdshell 'dir P:'
GO
exec xp_cmdshell 'dir Q:'
GO

Open in new window


Actual Backup:
DECLARE 
@today nvarchar(20),
@DB_to_backup nvarchar(128) = 'DB1',
@PATH_FILE1 nvarchar(128) = 'P:\SS_BKP1',
@PATH_FILE2 nvarchar(128) = 'Q:\SS_BKP2',
@backup_type nvarchar(128) = 'diff' /* Specify 'full' or 'diff' */

SELECT @today = SUBSTRING(REPLACE(REPLACE(REPLACE(CONVERT (datetime2,GETDATE(),120),'-',''),':',''),' ','-'),0,16)

DECLARE 
@DISK1 nvarchar(256) = @PATH_FILE1 + '\' + @DB_to_backup + '\' + @DB_to_backup + '_' + @today + '_' + @backup_type + '_MTS4Mb_BufferCount22_file1_of_2.bak',
@DISK2 nvarchar(256) = @PATH_FILE2 + '\' + @DB_to_backup + '\' + @DB_to_backup + '_' + @today + '_' + @backup_type + '_MTS4Mb_BufferCount22_file2_of_2.bak'

PRINT @DISK1
PRINT @DISK2

DECLARE 
@BACKUPSETNAME NVARCHAR(256) = N'' + @DB_to_backup + '-' + @backup_type + '-' + @today + '-MTS4Mb_BufferCount22'

IF @backup_type = 'diff'
	BACKUP DATABASE @DB_to_backup TO  
	DISK = @DISK1,
	DISK = @DISK2
	WITH DIFFERENTIAL, COMPRESSION, NOFORMAT, NOINIT,  NAME = @BACKUPSETNAME, 
	SKIP, NOREWIND, NOUNLOAD,  STATS = 1,
	MAXTRANSFERSIZE=4194304,
	BUFFERCOUNT=22;

if @backup_type = 'full'
	BACKUP DATABASE @DB_to_backup TO  
	DISK = @DISK1,
	DISK = @DISK2
	WITH COMPRESSION, NOFORMAT, NOINIT,  NAME = @BACKUPSETNAME, 
	SKIP, NOREWIND, NOUNLOAD,  STATS = 1,
	MAXTRANSFERSIZE=4194304,
	BUFFERCOUNT=22;

Open in new window


When I run this code, whether from a query window or from within a Maintenance Plan, the backup for DB1 fails midway (at any given percent) yet the backups for DB2 and DB3 always succeed.

This is what I will see in the Event Viewer for DB1 when it fails:

BackupIoRequest::ReportIoError: write failure on backup device 'P:\SS_BKP1\DB1\DB1_20180629-124841_diff_MTS4Mb_BufferCount22_file1_of_2.bak'. Operating system error 32(The process cannot access the file because it is being used by another process.).

Open in new window


System 

  - Provider 

   [ Name]  MSSQLSERVER 
 
  - EventID 18210 

   [ Qualifiers]  49152 
 
   Level 2 
 
   Task 6 
 
   Keywords 0x80000000000000 

Open in new window


When I Google for solutions to this error, everyone suggested to look for whichever process is accessing the same file when the backup aborts.  While I have the required programs (procexp.exe and handle.exe from Sysinternals), the problem is that as soon as the backup fails, the partial files automatically get deleted, so i cannot check anything regarding them, let alone locks and handles from other processes.

However, I started the backup again and queried for handles (on Backup Server 1) while it was still running.  The results were hardly heartening or surprising.  I ran:

handle.exe -a > C:\current_handles.txt

And searched the resulting dump file for the folder where the said backups were supposed to save, and I found these open handles under these processes:

System pid: 4 \<unable to open process>
  B78: File  (---)   F:\Backup\SS_BKP1\DB1\DB1_20180629-145347_diff_MTS4Mb_BufferCount22_file1_of_2.bak
 1294: File  (---)   F:\Backup\SS_BKP1\DB1\DB1_20180629-145347_diff_MTS4Mb_BufferCount22_file1_of_2.bak
------------------------------------------------------------------------------
explorer.exe pid: 9120 NETWORK\Administrator
  604: File  (RWD)   F:\Backup\SS_BKP1
 14E8: File  (RWD)   F:\Backup\SS_BKP1

Open in new window


I think it's safe to say I can't KILL these processes to free up those handles.

Other things I've tried to prevent any potential locks or conflicts was to disable Volume Shadow Copies on the Backup Servers as well as the Backup Server's local SQL Server instances.

The backups still fail.

I also modified my script to save on the local hard drive instead of the backup server and that did NOT fail, however, that is not practical on an ongoing basis.

I manage 15 database servers with 65 databases.  Only 2 of the servers are giving me this headache, and of those two, 1 is just partially failing.

Any and all help would be greatly appreciated!
Avatar of ste5an
ste5an
Flag of Germany image

First of all: get rid of the mappings. Use UNC paths instead. And you need to run the backup under an Windows account which has network permissions. The dafault installation of SQL Server uses Local System, which has none.
Avatar of waqqas31
waqqas31

ASKER

Hi @Ste5an,

I can change the mappings once this issue is resolved.

As far as accounts go, all SQL services (and everything else for that matter) are being run under the main domain administrator account, namely, NETWORK\Administrator, in my case.
hmm, what kind of network drive is it? For some reason, it seems that SQL Server cannot access the file exclusivley. Did you check, that there is no AV active on BAK files?
The only AV-type software on the Backup Server is Windows Defender, but there are literally 7-8 other servers that are doing the exact same thing with that same Backup Server and having no issues whatsoever.  I will, however, see if Windows Defender is suspecting anything with that particular folder or its contents (I hope it's intuitive enough to discover.)
I've had issues with SQL backups to mapped drives and the only resolution was to use UNC path instead of mapping.  I'd suggest you try that before going too much farther.  You seem to have addressed permissions and accessibility so the unc vs mapped drive is the only big issue I see that is an issue.
Thanks, John.  I will give that a try.
Hi,

I have seen thus behaviour where network connectivity wasnt ideal.
Maybe it sounds silly but start by renewing the cables, if possible use another switchport.

Cheers.
Sadly, even using UNC paths did not make a difference.

One thing that I did notice, however, was that disabling the SQL Server VSS Writer service let the backup run for longer, even though it still ended up failing.  The backup would run for less than 10 minutes with the service enabled versus 30 with it disabled.

Also, it should be pointed out that small backups are completing (~15GB total) whereas the bigger ones (600-1000GB) fail (and quite early, before the 100GB mark.)
>> was that disabling the SQL Server VSS Writer service let the backup run for longer, even though it still ended up failing.  The backup would run for less than 10 minutes with the service enabled versus 30 with it disabled.

Disabling VSS writer service shouldn't have any impact to this..

>> Also, it should be pointed out that small backups are completing (~15GB total) whereas the bigger ones (600-1000GB) fail (and quite early, before the 100GB mark.)

Kindly check whether you are able to take backup of that particular bigger database to local drive successfully or not.
If so, then try running the backup command from query window which can display additional messages about the failure.
Hi Raja,

1. Yes, I'm able to backup locally.
2. When I run the same script from a query window, the error I receive is the exact same as what I posted in the original question, namely, mid-backup I can no longer access the backup file because it is being used by another process.
SOLUTION
Avatar of Raja Jegan R
Raja Jegan R
Flag of India image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Raja,

Thank you for the tip.  I will try and see if I can identify any network blips by looking in the switch logs, unless you can recommend investigating elsewhere.
Yes, you can check your Switch logs along with ping command in parallel to identify any packet losses.
Wanted to keep this alive while I continue to troubleshoot.
sure, just keep us updated on your findings as well..
Thank you for your suggestions and patience.  Keeping this alive while I run more tests.
Hi waqqas31,

Its been a quite long time and a feedback would be appreciated..
kindly request you to close the question accordingly.
Raja,

I appreciate your patience.  I will provide an update later today on what I have discovered.
I am back to report that I have found the problem and the solution.

After barking up every tree (except the one with the brightest leaves), I did a landscape-wide comparison of NIC firmware and driver versions, and updating the target backup server's drivers to the latest version has done the trick.  One would think that something as simple as file-transfers is something a company that manufactures NICs would always get right, but it just goes to show that one should never ass-u-me.

For those interested, I am mentioning the NIC card details and the Release Notes from the driver update.

I will close and award points.  Thank you everyone for your input.

NIC make and model:  Mellanox ConnectX-3 Pro 40G VPI Infiniband Network Adapter
Mellanox SKU: MCX354A-FCBT
Dell SKU: 06RKNM
HPE SKU: 649281-B21

Problematic driver release: 5.35.12970.0 (and possibly 5.35.12978.0)
Updated driver release installed: 5.50.14643.0
Release Notes for 5.50.14643.0:  WinOF_VPI_Release_Notes_5.50-1-.pdf  http://www.mellanox.com/related-docs/prod_software/WinOF_VPI_Release_Notes_5.50.pdf
ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Thank you for your help, experts!