VDR Failing to backup

hongedit asked
Last Modified: 2012-05-11
Hi

I am having an issue with VDR. On several of my VMs, the backup fails with:

Failed to create snapshot for Saints DC, error -3941 ( create snapshot failed).

Points to note:

1. VDR has been rebooted several times
2. Disks have been formatted with 1MB blocks, but backups fail on VM/Disks with less than 256GB
3. Trying to snapshot a problem VM manually without memory but with quiesce ticked also fails
4. Backups work on "simple" VMs - standalone servers with no additional disks etc.
5. I don't know if this is a coincidence, but all the VMs it is failing on have multiple disks, even if I select only the OS disk for backup.
6. Backups fail on both locally mounted disk and network share

Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant
CERTIFIED EXPERT
Fellow
Expert of the Year 2017

Commented:
Okay, the issue is the Snapshot failure. If the snapshot cannot proceed correctly, the vDR Backup job will fail.

Author

Commented:
Ok. So what should I be looking at?
Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant

Commented:
What you tried, snapshotting the machine with Memory unticked but Quiesce ticked, is the same snapshot function that vDR triggers. If the snapshot cannot quiesce the virtual machine, the vDR job will fail.

The issue may reside with VMware Tools in the virtual machine.
Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant

Commented:
Check the virtual machine event log. Do you see any errors around the time of the backup failure?

Are all the VMs failing? What OS are they?

Author

Commented:
So reinstall VMware Tools?

I am wary of potential issues after the last fiasco!

Author

Commented:
Not all VMs are failing.

All Windows 2008 x64 R2 except for one, which is a Windows 2003 x86.

I have successfully backed up the Windows 2003 and also one Windows 2008 R2.
Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant

Commented:
Can you perform any other tasks on the machine without getting an error such as this?

Another Task is already in Progress

Try restarting the Network Management Agents on the ESX server first.

If that has no success, then try restarting the VMware vCenter Server services.
Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant

Commented:
Just hold fire with VMware Tools for now.

Try Restarting the Network Management Agents on the ESX server first.

Author

Commented:
When you say other tasks, what kind of tasks do you mean?

Can I restart the Network Management Agents on live systems (even though no one is using them right now) with no ill effects?
Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant

Commented:
For example, adding a CD-ROM ISO to the virtual machine.

Yes, you can restart the Network Management Agents on the host. The host will disconnect from vCenter for a few moments, but the VMs will keep running on the ESX host.

Author

Commented:
Hmm, see this in log - this was a manual snapshot I tried which failed:

May 18 19:37:05.229: vmx| SnapshotVMXTakeSnapshotComplete done with snapshot 'Test': 0
May 18 19:37:05.229: vmx| Msg_Reset:
May 18 19:37:05.229: vmx| [msg.checkpoint.save.fail2.std3] An error occurred while saving the snapshot:
May 18 19:37:05.229: vmx| The destination file system does not support large files.----------------------------------------
May 18 19:37:05.229: vmx| Vix: [6062 vmxCommands.c:2532]: VMAutomationCreateSnapshotCallback: Got CreateSnapshot callback, snapshotErr = The destination file system does not support large files (5:C), UID = 0
May 18 19:37:05.877: vcpu-0| HBACommon: First write on scsi0:0.fileName='/vmfs/volumes/4dc86dbf-b7058470-1d9b-001b217f910d/Saints DC/Saints DC.vmdk'
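
For anyone scanning a longer vmware.log for the same failure, here is a minimal Python sketch that pulls out the snapshot error text. It assumes the `snapshotErr = <message> (<code>), UID = ...` layout seen in the excerpt above; the function name `find_snapshot_errors` is made up for illustration, and other ESX builds may format the line differently.

```python
import re

# Matches the "snapshotErr = <message> (<code>), UID" fragment seen in the log excerpt.
SNAPSHOT_ERR = re.compile(r"snapshotErr = (?P<message>.+?) \((?P<code>[^)]+)\), UID")


def find_snapshot_errors(log_text):
    """Return (message, code) pairs for every snapshotErr entry in the log text."""
    return [(m.group("message"), m.group("code")) for m in SNAPSHOT_ERR.finditer(log_text)]


sample = (
    "May 18 19:37:05.229: vmx| Vix: [6062 vmxCommands.c:2532]: "
    "VMAutomationCreateSnapshotCallback: Got CreateSnapshot callback, "
    "snapshotErr = The destination file system does not support large files (5:C), UID = 0"
)
for message, code in find_snapshot_errors(sample):
    print(f"{code}: {message}")
```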

Author

Commented:
This disk is only 149GB with 107GB free though!
Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant

Commented:
What's the block size of this datastore?

1M?

Author

Commented:
Correct
Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant

Commented:
are the VMDKs over 256GB?

Author

Commented:
Thanks.

There is plenty of space on the datastore for snapshots - as I said, 149GB of which 107GB is free.

Block size is 1MB which allows up to 256GB - I am within these limits.
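
As a quick sanity check of the limit quoted above, here is a minimal Python sketch. Only the 1MB to 256GB pairing comes from this thread; the other rows are the commonly documented VMFS-3 limits and are included as an assumption, as is the function name `fits_on_datastore`. It ignores any per-file overhead, so treat the result as a rough guide only.

```python
# VMFS-3 block size (MB) -> approximate maximum single-file size (GB).
# The 1 MB -> 256 GB pairing is from the thread; the other rows are the
# commonly documented VMFS-3 limits and are an assumption here.
VMFS3_MAX_FILE_GB = {1: 256, 2: 512, 4: 1024, 8: 2048}


def fits_on_datastore(file_size_gb, block_size_mb):
    """Return True if a file of this size is under the block-size limit."""
    return file_size_gb < VMFS3_MAX_FILE_GB[block_size_mb]


# Hypothetical file sizes, for illustration only.
print(fits_on_datastore(149, 1))  # 149 GB file, 1 MB blocks
print(fits_on_datastore(300, 1))  # 300 GB file, 1 MB blocks
print(fits_on_datastore(300, 2))  # 300 GB file, 2 MB blocks
```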

Should I still explore the option of relocating the snapshot files?

Author

Commented:
I don't even see how one could run into this scenario - if the datastore is formatted with a 1MB block size, VMware will not even let you create a VMDK bigger than 256GB anyway.
Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant

Commented:
First I would quickly restart the Network Management Agents on the console of the affected host. This is very quick, and a quick win if it works.

Then change the location of the snapshot file to another datastore.

Author

Commented:
Oh, can I also add this vital information.

I have successfully backed up this VM before using VDR, and very little has changed in terms of size growth etc.

So I don't think it is this!

I did notice though that one of the VMs that fails has a VMNAME-Snapshot#.vmsn file in the datastore, with a very small size (a few KB). The rest that fail do not have these, though.

Author

Commented:
How do I restart these Management Agents?

Also note that on one host, there is a mix of some that work and some that don't - surely if it was the host, none would work?
Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant

Commented:
What can happen is that snapshots hang (in the VM and on the host server), which gives the Another Task is already in Progress error and is also indicative of the -3941 error.

The only way to clear a hung snapshot is to restart the Network Management Agents.

Author

Commented:
Ok. How do I do this?
Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant

Commented:
You'll need access to the ESXi host console.

Press F2 and log in if required.

Then select Restart Network Management Agents.

Author

Commented:
Ah...no access to physical host right now.

Can I do SSH?

Author

Commented:
Well I did /sbin/services.sh restart via SSH and it still doesn't work :(
Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant

Commented:
services.sh restart
Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant

Commented:
still not able to Snapshot?

Author

Commented:
correct
Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant

Commented:
I would investigate moving the snapshot location.

Author

Commented:
Hi

Sorry for not coming back to you last night, my internet died and then one of the drives failed on the SAN!

Everything is going wrong!
Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant

Commented:
Yeah, no issues, happens here as well sometimes.
Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant

Commented:
oh dear, SAN drives failing doesn't sound too good. I hope all is well again.

Author

Commented:
Well the RAID volume with the bad drive is now very degraded, and Exchange is running a bit rough but surviving.

Will push on with Snapshot stuff this evening or tomorrow :)
Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant

Commented:
oh, no worries, get your SAN issues resolved.

and make sure you have valid backups.

Author

Commented:
Well, we have current file-level backups online, but in terms of VM-level backups, that's where VDR comes in.

During the VDR backup process, when it creates a snapshot - does the snapshot get deleted afterwards from the default snapshot location?

Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant

Commented:
Yes, it's supposed to delete it.

Author

Commented:
Thanks.

In that case, when the SAS array is healthy again I can try creating a separate dedicated snapshot datastore for all VM snapshots. I guess it will only need to be as big as the biggest VMDK + x% for growth?
Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant

Commented:
It depends on the rate of change of the writes and block changes in the VM during the backup window.

So if your backups run at 9:00am on Exchange, probably the busiest time of the day, your snapshot will be larger than if you run them at 3:00am.
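
To put rough numbers on that, here is a back-of-the-envelope Python sketch. The change rates and window length are hypothetical, and the function name `estimate_snapshot_gb` is made up for illustration. It gives an upper bound rather than a prediction, since repeated writes to the same block would not grow the snapshot further.

```python
def estimate_snapshot_gb(change_rate_gb_per_hour, backup_hours):
    """Rough upper bound: data written per hour times how long the snapshot stays open."""
    return change_rate_gb_per_hour * backup_hours


# Hypothetical numbers: a busy 9:00am window versus a quiet 3:00am window.
busy = estimate_snapshot_gb(change_rate_gb_per_hour=4.0, backup_hours=2)
quiet = estimate_snapshot_gb(change_rate_gb_per_hour=0.5, backup_hours=2)
print(f"busy window: ~{busy:.1f} GB, quiet window: ~{quiet:.1f} GB")
```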

I apologize if I've posted this before for you to read. (I'm now losing track of who I've written to about snapshots.)

I also know that you are not creating snapshots yourself, but you should be aware of what they do and why they are required by backup applications. Undeleted snapshots can be very dangerous if you are not aware that a VM is running on a snapshot disk. You can now set up alarms using vCenter and Nagios to monitor snapshots! I've seen many large organisations' VMs fail because their admins were not keeping an eye on snapshots!

A snapshot is a way to preserve a point in time when the VM was running OK before making changes. A snapshot is NOT a way to get a static copy of a VM before making changes. When you take a snapshot of a VM, a delta file gets created and the original VMDK file becomes read-only. There is an active link between the original VMDK file and the new delta file. Anything that gets written to the VM actually gets written to the delta file.

The correct way to use a snapshot is when you want to make some change to a VM, like adding a new app or a patch; something that might damage the guest OS. After you apply the patch or make the change and it's stable, you should go into Snapshot Manager and delete the snapshot, which commits the changes to the original VM, deletes the snapshot, and makes the VMDK file read-write again.

The official stance is that you really shouldn't have more than one snapshot at a time and that you should not leave them out there for long periods of time. Adding more snapshots and leaving them there a long time degrades the performance of the VM. If the patch goes badly, or for some reason you need to get back to the original unmodified VM, that's possible as well.
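
The behaviour described above can be pictured with a toy model. This is only an illustrative Python sketch of the idea (a read-only base plus a delta that absorbs writes), not how ESX actually stores disks; the class and method names are invented for the example.

```python
class SnapshotDisk:
    """Toy model: a read-only base disk plus a delta that absorbs all writes."""

    def __init__(self, base_blocks):
        self.base = dict(base_blocks)  # original VMDK, treated as read-only
        self.delta = {}                # delta file: blocks changed since the snapshot

    def write(self, block, data):
        self.delta[block] = data       # writes never touch the base

    def read(self, block):
        # Prefer the delta; fall back to the base for unchanged blocks.
        return self.delta.get(block, self.base.get(block))

    def delete_snapshot(self):
        """Commit the delta into the base, as deleting the snapshot does."""
        self.base.update(self.delta)
        self.delta.clear()

    def revert(self):
        """Throw the delta away and go back to the snapshot's point in time."""
        self.delta.clear()


disk = SnapshotDisk({0: "boot", 1: "app v1"})
disk.write(1, "app v2")
print(disk.read(1), "|", disk.base[1])       # app v2 | app v1
disk.delete_snapshot()
print(disk.base[1], "|", len(disk.delta))    # app v2 | 0
```

The longer the delta is left in place, the more blocks it accumulates, which is why snapshots left around for a long time keep growing.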

I highly recommend reading these 2 articles on snaps:

Understanding Snapshots - http://kb.vmware.com/kb/1015180
Snapshot Best Practices - http://kb.vmware.com/kb/1025279

Author

Commented:
Thanks - that was very good to read.

Author

Commented:
I'm finding mixed info on redirecting the default location of snapshot files.

So I need to power down the VM and edit the config file with the path.

Then:

Do I need to "unregister" and "reregister" the VM or can I boot it back up?

I must admit unreg/rereg sounds a bit risky considering the problems I have been having lately!

Also, does the new Snapshot Datastore need to be connected to the ESXi Server or VM?

Do I need to redirect the configuration file also?

Getting confused :s
Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant

Commented:
The reason you need to unregister from vCenter is that vCenter locks and caches the VMX files. If you edit the VMX while the VM is registered, the changes will not be saved, because the cached copy will be written back.

If you don't want to unregister and re-register the machine, you can shut down the machine and select Edit Settings for the VM, then Options > Advanced > General > Configuration Parameters, and add a new row.

variable to add is workingDir
value is "<new_path_location>"

[Screenshot: Configuration Parameters for the VM]

The new datastore must be available to the ESXi host server. If it's available to the server, it's available to the VM.

No need to redirect the config VMX file, just make the modifications in the VMX for the new snapshot location.

Author

Commented:
Thank you!

Author

Commented:
Well I did all that, and VDR still fails with the same error message :(

Author

Commented:
How can I confirm that the VM is using the new datastore for snapshots?

I kept hitting refresh while VDR was trying to run and no files appeared in the new snapshot store before it failed.
Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant

Commented:
Create a manual snapshot, and wait 1 minute and check the snapshot datastore, the snapshot should be present there.

Author

Commented:
Manual snapshot, even with no memory and no quiesce ticked, fails:

File <unspecified file name> is larger than the maximum size supported by Datastores<unspecified datastore>

The VM is as follows:

Datastores:
OS (1MB Block) - 199.75GB, 184.43GB Free
SQL DB (1MB Block) - 255.75GB, 253.72GB Free
Sage (1MB Block) - 4.75GB, 2.84GB Free

Snapshot Datastore: (2MB Block) 300GB

???
Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant

Commented:
Is the snapshot being created on the new snapshot datastore?

After the snapshot is created, the Summary for the VM should show two datastores.

If you added the new variable through the Configuration Parameters panel, be aware that this panel is experimental and the workingDir variable may not be saved.

In that case you'll have to unregister the machine, modify the VMX, and register the machine again.

You will not see the variable in the configuration panel, only if you inspect the VMX and create a snapshot.
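
If you copy the .vmx file off the datastore, a few lines of Python can confirm what was actually saved. This is a minimal sketch that assumes the usual `key = "value"` layout of a VMX file; the function name `read_vmx_setting` and the sample path are hypothetical and only there for illustration.

```python
def read_vmx_setting(vmx_text, key):
    """Return the value stored for `key` in .vmx text, or None if it is not set."""
    for line in vmx_text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            # Values are normally wrapped in double quotes; strip them off.
            return value.strip().strip('"')
    return None


# Hypothetical VMX contents; the workingDir path is made up for illustration.
sample_vmx = '''displayName = "Saints DC"
workingDir = "/vmfs/volumes/snapshot-datastore/Saints DC"
'''
print(read_vmx_setting(sample_vmx, "workingDir"))
```

If this prints something other than the path you entered, the setting was not saved as intended and the VMX needs editing directly.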

Author

Commented:
The snapshot is not created, hence no files even being created.

Can you please explain how to unregister and re-register? The VM Article only explains how to register.

Also, do I have to reconfigure the VM (add disks etc?)

Author

Commented:
Wow, that seems to have done it!

Upon looking at the VMX file, the workingDir value was "." which explains why it didn't work. Looks like the GUI is very experimental.

VDR is now running!

The snapshot datastore now has 6 VMDKs, all the same size though. Why is this?
Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant

Commented:
Glad we've got vDR working again for you.

I prefer to allocate a snapshot LUN with plenty of space for snapshots. If you have thin provisioned your virtual machine LUNs, you then don't have to worry about running out of disk space.

The GUI is experimental. We always edit the VMX directly, because that's the way we've always done it, but I know there are plenty of "Children of the GUI" around who like Windows and GUIs and don't like command prompts, command lines, or vi! Call me an old geek!

Changed Block Tracking is enabled, which is how it tracks changes to the blocks on the disks for faster incrementals. Once the backup is finished, just make sure they disappear. Don't be too worried if they don't disappear immediately, but keep an eye on them between backups.

Author

Commented:
Cool.

Does VDR scheduling just pick a random time in the backup window to start?

Because I just edited the schedule of the VM backup jobs, and they all started at once. (I have a separate backup job for each VM currently. They all have slightly different schedules but most start at 7pm.)

Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant

Commented:
Yes, vDR scheduling is a bit weird: it triggers when the data source is out of date.

So if they were all "out of date" sources, they would all trigger. Be careful with this, because at present vDR does not look at the performance of the datastore, and it can flatline a datastore with ease!

Vizioncore/Veeam have options to limit the number of backups per LUN to x, to stop this.

How we have configured client sites is to group services into unique backup jobs, stagger them at different times, and use the option to kill a backup job if it overruns into another job.

As it's doing incrementals, after your first backup, jobs should complete quickly each day, because the delta between each backup should be small.

So we specify different hourly windows for services, and most services are on different LUNs as well.

e.g. different Exchange stores on different LUNs, different Exchange servers on different LUNs, SQL etc., and DCs on separate LUNs, so if we have a LUN failure on the SAN, we've spread the services and don't have a massive service outage.
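
As a rough illustration of the staggering idea, here is a small Python sketch that hands each job group its own hourly start time. It only works out a timetable on paper and does not talk to vDR; the job names, the 7pm first start, and the function name `stagger_jobs` are assumptions for the example.

```python
from datetime import datetime, timedelta


def stagger_jobs(job_names, first_start, window_hours=1):
    """Give each backup job its own start time, one window after the previous job."""
    start = datetime.strptime(first_start, "%H:%M")
    schedule = {}
    for index, name in enumerate(job_names):
        slot = start + timedelta(hours=index * window_hours)
        schedule[name] = slot.strftime("%H:%M")
    return schedule


# Hypothetical job groups, one per service, with the first starting at 7pm.
for job, start_time in stagger_jobs(["Domain Controllers", "Exchange", "SQL"], "19:00").items():
    print(f"{start_time}  {job}")
```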

Author

Commented:
Thanks, AGAIN