hongedit

VDR failing to back up

Hi

I am having an issue with VDR. On several of my VMs, the backup fails with:

Failed to create snapshot for Saints DC, error -3941 ( create snapshot failed).

Points to note:

1. VDR has been rebooted several times.
2. The datastores are formatted with a 1MB block size, but backups fail even on VMs/disks smaller than 256GB.
3. Taking a manual snapshot of a problem VM without memory but with quiesce ticked also fails.
4. Backups work on "simple" VMs: standalone servers with no additional disks etc.
5. I don't know if this is a coincidence, but all the VMs it is failing on have multiple disks, even if I select only the OS disk for backup.
6. Backups fail whether the destination is a locally mounted disk or a network share.

Last comment: hongedit, 8/22/2022 (Mon)

Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)

Okay, the issue is the Snapshot failure. If the snapshot cannot proceed correctly, the vDR Backup job will fail.
hongedit

ASKER
Ok. So what should I be looking at?
Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)

As you have confirmed, a snapshot with Memory unticked but Quiesce ticked is exactly the kind of snapshot vDR triggers. If the snapshot cannot quiesce the virtual machine, the vDR job will fail.

The issue may reside with VMware Tools in the virtual machine.
Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)

Check the virtual machine event log: are there any errors around the time of the failed backup?

Are all the VMs failing? What OS are they running?
hongedit

ASKER
So should I reinstall VMware Tools?

I am wary of potential issues after the last fiasco!
hongedit

ASKER
Not all VMs are failing.

All are Windows Server 2008 R2 x64 except for one, which is Windows Server 2003 x86.

I have successfully backed up the Windows 2003 and also one Windows 2008 R2.
Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)

Can you perform any other tasks on the machine, or do you get this error?

Another Task is already in Progress

Try restarting the management agents on the ESX host first.

If that doesn't work, then try restarting the VMware vCenter Server services.
Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)

Just hold fire on VMware Tools for now.

Try restarting the management agents on the ESX host first.
hongedit

ASKER
When you say other tasks, what kind of tasks do you mean?

Can I restart the management agents on live systems (even though no one is using them right now) without ill effects?
Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)

For example, adding a CD-ROM ISO to the virtual machine.

Yes, you can restart the management agents on the host. The host will disconnect from vCenter for a few moments, but the VMs will keep running on the ESX host.
hongedit

ASKER
Hmm, I see this in the log (from a manual snapshot I tried, which failed):

May 18 19:37:05.229: vmx| SnapshotVMXTakeSnapshotComplete done with snapshot 'Test': 0
May 18 19:37:05.229: vmx| Msg_Reset:
May 18 19:37:05.229: vmx| [msg.checkpoint.save.fail2.std3] An error occurred while saving the snapshot:
May 18 19:37:05.229: vmx| The destination file system does not support large files.----------------------------------------
May 18 19:37:05.229: vmx| Vix: [6062 vmxCommands.c:2532]: VMAutomationCreateSnapshotCallback: Got CreateSnapshot callback, snapshotErr = The destination file system does not support large files (5:C), UID = 0
May 18 19:37:05.877: vcpu-0| HBACommon: First write on scsi0:0.fileName='/vmfs/volumes/4dc86dbf-b7058470-1d9b-001b217f910d/Saints DC/Saints DC.vmdk'
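If you want to pull snapshot failures out of a long vmware.log rather than scrolling through it, a small Python sketch like the one below can help. It assumes you have copied the VM's vmware.log somewhere Python can read it (pass the file's text to the function); the two search phrases are taken from the excerpt above and are not an exhaustive list of VMware error strings.

```python
import re

# Both phrases are copied from the vmware.log excerpt above; other snapshot
# failures log different text, so extend this tuple as needed.
SNAPSHOT_ERROR_PHRASES = (
    "An error occurred while saving the snapshot",
    "does not support large files",
)

# vmware.log lines start with a timestamp such as "May 18 19:37:05.229:".
TIMESTAMP = re.compile(r"^(\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}\.\d{3}):")


def find_snapshot_errors(log_text):
    """Return (timestamp, line) pairs for lines containing a known error phrase."""
    hits = []
    for line in log_text.splitlines():
        if any(phrase in line for phrase in SNAPSHOT_ERROR_PHRASES):
            match = TIMESTAMP.match(line)
            hits.append((match.group(1) if match else None, line.strip()))
    return hits


if __name__ == "__main__":
    # For a real log: find_snapshot_errors(open("vmware.log").read())
    sample = (
        "May 18 19:37:05.229: vmx| SnapshotVMXTakeSnapshotComplete done with snapshot 'Test': 0\n"
        "May 18 19:37:05.229: vmx| [msg.checkpoint.save.fail2.std3] An error occurred while saving the snapshot:\n"
        "May 18 19:37:05.229: vmx| The destination file system does not support large files.\n"
    )
    for timestamp, line in find_snapshot_errors(sample):
        print(timestamp, line)
```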
hongedit

ASKER
This disk is only 149GB with 107GB free though!
Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)

What's the block size of this datastore?

1MB?
hongedit

ASKER
Correct
Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)

Are the VMDKs over 256GB?
hongedit

ASKER
Thanks.

1. There is plenty of space on the datastore for snapshots: as I said, 149GB, of which 107GB is free.

The block size is 1MB, which allows files up to 256GB, so I am within these limits.

Should I still explore the option of relocating the snapshot files?
hongedit

ASKER
I don't even see how one could run into this scenario: if the datastore is formatted with a 1MB block size, VMware won't let you create a VMDK bigger than 256GB anyway.
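The arithmetic behind those limits is worth spelling out. On VMFS-3 the maximum file size works out to block size × 262,144: 256GB at 1MB, 512GB at 2MB, 1TB at 4MB and 2TB at 8MB (strictly, 512 bytes less than each). The catch is that a snapshot delta can grow towards the size of its base disk plus some metadata, and it is written to the VM's working directory, so a disk that is "within limits" can still leave no headroom for its delta. The sketch below assumes a 1% overhead purely for illustration; that figure is not a VMware number.

```python
GIB = 1024 ** 3


def vmfs3_max_file_bytes(block_size_mb):
    """Largest file VMFS-3 allows: block size x 262,144, minus 512 bytes."""
    return block_size_mb * 1024 * 1024 * 262_144 - 512


def delta_fits(base_disk_gib, working_dir_block_mb, overhead_fraction=0.01):
    """Would a fully grown delta for this disk fit under the file-size limit
    of the datastore holding the VM's working directory?

    Assumption: a delta can grow to the base disk's size plus some metadata
    overhead; the 1% default is illustrative, not a VMware figure.
    """
    worst_case_delta = base_disk_gib * GIB * (1 + overhead_fraction)
    return worst_case_delta <= vmfs3_max_file_bytes(working_dir_block_mb)


if __name__ == "__main__":
    for block_mb in (1, 2, 4, 8):
        print(f"{block_mb}MB block -> max file {vmfs3_max_file_bytes(block_mb) / GIB:.2f} GiB")
    # A hypothetical 255GB disk is "within limits" on a 1MB datastore, but its
    # worst-case delta is not; a 2MB-block working directory leaves room.
    print(delta_fits(255, 1), delta_fits(255, 2))
```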
Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)

First, I would quickly restart the management agents from the console of the affected host. It's very quick, and a quick win if it works.

Then change the location of the snapshot files to another datastore.
hongedit

ASKER
Oh, I should also add this vital information:

I have successfully backed up this VM with VDR before, and very little has changed in terms of size growth etc.

So I don't think it's this!

I did notice, though, that one of the failing VMs has VMNAME-Snapshot#.vmsn files in the datastore, each very small (a few KB). The other failing VMs don't have these.
hongedit

ASKER
How do I restart these management agents?

Also note that on one host there is a mix of VMs that work and VMs that don't. Surely if it were the host, none would work?
Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)

What can happen is that snapshots hang (in the VM and on the host server), which gives the "Another Task is already in Progress" error; that is also indicative of the -3941 error.

The only way to clear a hung snapshot is to restart the management agents.
hongedit

ASKER
Ok. How do I do this?
Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)

You'll need access to the ESXi host console.

Press F2 and log in if required.

Then select Restart Management Agents.
hongedit

ASKER
Ah... no access to the physical host right now.

Can I do it over SSH?
hongedit

ASKER
Well, I ran /sbin/services.sh restart via SSH and it still doesn't work :(
Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)

services.sh restart
Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)

Still not able to snapshot?
hongedit

ASKER
correct
Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)

I would investigate moving the snapshot location.
hongedit

ASKER
Hi

Sorry for not getting back to you last night; my internet died and then one of the drives failed on the SAN!

Everything is going wrong!
Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)

Yeah, no issues, happens here as well sometimes.
Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)

oh dear, SAN drives failing doesn't sound too good. I hope all is well again.
hongedit

ASKER
Well the RAID volume with the bad drive is now very degraded, and Exchange is running a bit rough but surviving.

Will push on with Snapshot stuff this evening or tomorrow :)
Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)

Oh, no worries; get your SAN issues resolved.

And make sure you have valid backups.
hongedit

ASKER
Well, we have current file-level backups online, but for VM-level backups, that's where VDR comes in.

During the VDR backup process, when VDR creates a snapshot, does the snapshot get deleted from the default snapshot location afterwards?

Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)

Yes, it's supposed to delete it.
hongedit

ASKER
Thanks.

In that case, when the SAS array is healthy again, I can try creating a separate, dedicated datastore for all VM snapshots. I guess it only needs to be as big as the biggest VMDK plus x% for growth?
Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)

It depends on the rate of change (writes and block changes) in the VM during the backup window.

So if your Exchange backups run at 9.00am, probably the busiest time of the day, the snapshot will be larger than if you run them at 3.00am.
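To put rough numbers on that: a delta only grows when a block is written for the first time after the snapshot was taken, so its size is roughly the unique data written during the backup window, capped at the size of the base disk. The Python sketch below is a back-of-the-envelope estimator under those assumptions; the disk sizes and change rates in the example are hypothetical, and the 25% headroom is an arbitrary safety margin rather than a VMware recommendation.

```python
def estimate_delta_gib(base_disk_gib, unique_write_gib_per_hour, window_hours):
    """Rough size of one disk's delta after a backup window.

    Only the first write to each block after the snapshot grows the delta,
    so use a unique (not total) change rate; the delta can't exceed the base
    disk (metadata overhead is ignored in this sketch).
    """
    return min(base_disk_gib, unique_write_gib_per_hour * window_hours)


def snapshot_datastore_gib(disks, headroom=0.25):
    """disks: (base_gib, unique_gib_per_hour, window_hours) for every disk
    whose snapshot may be open at the same time. The 25% headroom is an
    arbitrary safety margin."""
    return sum(estimate_delta_gib(*disk) for disk in disks) * (1 + headroom)


if __name__ == "__main__":
    # Hypothetical 200 GiB Exchange disk with a 2-hour backup window.
    print("9.00am run:", estimate_delta_gib(200, 8, 2), "GiB")
    print("3.00am run:", estimate_delta_gib(200, 0.5, 2), "GiB")
```

Sizing the snapshot datastore to the sum of deltas that can be open at once, rather than to the biggest VMDK, is the point of the second function; if jobs are staggered, fewer deltas overlap and the total shrinks.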

Apologies if I've already posted this for you to read (I'm losing track of whom I've written to about snapshots).

I also know that you are not creating snapshots yourself, but you should be aware of what they do and why backup applications require them. Undeleted snapshots can be very dangerous if you are not aware that a VM is running on a snapshot disk. You can now set up alarms using vCenter and Nagios to monitor snapshots. I've seen VMs fail in many large organisations because their admins were not keeping an eye on snapshots!

A snapshot is a way to preserve a point in time when the VM was running OK, before making changes. A snapshot is NOT a way to get a static copy of a VM before making changes. When you take a snapshot of a VM, a delta file is created and the original VMDK file becomes read-only. There is an active link between the original VMDK file and the new delta file, and anything written to the VM actually goes to the delta file. The correct way to use a snapshot is when you want to make a change to a VM, such as adding a new app or a patch; something that might damage the guest OS. After you apply the patch or make the change and it's stable, you should go into Snapshot Manager and delete the snapshot, which commits the changes to the original VMDK, deletes the snapshot, and makes the VMDK file read-write again. The official stance is that you shouldn't have more than one snapshot at a time, and that you shouldn't leave them in place for long periods. Adding more snapshots and leaving them in place for a long time degrades the VM's performance. If the patch goes badly, or for some reason you need to get back to the original unmodified VM, that's possible as well.
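The delta mechanism described above can be modelled in a few lines. This is a toy Python model for illustration only: a dictionary of block numbers stands in for the disk, and it ignores how VMware actually lays out delta files on disk, but the read, write, delete (commit) and revert behaviour follows the description above.

```python
class SnapshottedDisk:
    """Toy model of one VMDK with at most one snapshot. A dict of
    block number -> data stands in for the disk; on-disk layout is ignored."""

    def __init__(self, blocks):
        self.base = dict(blocks)  # the original VMDK
        self.delta = None         # None: no snapshot, writes hit the base

    def take_snapshot(self):
        # From now on the base is treated as read-only.
        self.delta = {}

    def write(self, block, data):
        if self.delta is None:
            self.base[block] = data
        else:
            self.delta[block] = data  # redirected; base untouched

    def read(self, block):
        # Newest copy wins: check the delta first, then fall back to the base.
        if self.delta is not None and block in self.delta:
            return self.delta[block]
        return self.base.get(block)

    def delete_snapshot(self):
        """Snapshot Manager 'Delete': commit the delta into the base,
        discard the delta and make the base writable again."""
        if self.delta is not None:
            self.base.update(self.delta)
            self.delta = None

    def revert_to_snapshot(self):
        """Go back to the point in time: drop the changes, keep the snapshot."""
        if self.delta is not None:
            self.delta = {}


if __name__ == "__main__":
    disk = SnapshottedDisk({0: "Windows", 1: "app v1"})
    disk.take_snapshot()
    disk.write(1, "app v2 (patched)")
    print(disk.read(1), "| base still has:", disk.base[1])
    disk.delete_snapshot()
    print(disk.read(1), "| delta gone:", disk.delta is None)
```

The model also shows why a forgotten snapshot is dangerous: every write after take_snapshot() lands in the delta, which keeps growing until the snapshot is deleted.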

I highly recommend reading these 2 articles on snaps:

Understanding Snapshots - http://kb.vmware.com/kb/1015180
Snapshot Best Practices - http://kb.vmware.com/kb/1025279

hongedit

ASKER
Thanks - that was very good to read.

hongedit

ASKER
I'm finding mixed info on redirecting the default location of snapshot files.

So I need to power down the VM and edit the config file with the new path.

Then:

Do I need to "unregister" and "re-register" the VM, or can I just boot it back up?

I must admit unregistering/re-registering sounds a bit risky considering the problems I have been having lately!

Also, does the new snapshot datastore need to be connected to the ESXi server or to the VM?

Do I need to redirect the configuration file as well?

Getting confused :s
Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)

You need to unregister the VM from vCenter because vCenter locks and caches the VMX file. If you edit the VMX directly, your changes will not be saved, because the cached copy will be written back over them.

If you don't want to unregister and re-register the machine, you can shut it down, select Edit Settings for the VM, then Options > Advanced > General > Configuration Parameters, and add a new row with the variable:

variable to add: workingDir
value: "<new_path_location>"

[Screenshot: Configuration Parameters for the VM]
The new datastore must be available to the ESXi host. If it's available to the host, it's available to the VM.



There's no need to redirect the VMX config file; just make the modification in the VMX for the new snapshot location.
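If you do end up editing the VMX by hand (with the VM powered off and, as explained above, unregistered so vCenter doesn't write its cached copy back), the change is a single key = "value" line. The Python sketch below sets or replaces that line in the file's text; you would read the .vmx into a string, pass it through the function and write it back, keeping a copy of the original first. The datastore path in the example is hypothetical, and matching keys case-insensitively is an assumption of this sketch.

```python
import re


def set_vmx_option(vmx_text, key, value):
    """Return vmx_text with `key = "value"` set, replacing an existing entry
    for that key (matched case-insensitively, an assumption) or appending one."""
    new_line = f'{key} = "{value}"'
    pattern = re.compile(rf"^[ \t]*{re.escape(key)}[ \t]*=.*$", re.IGNORECASE | re.MULTILINE)
    if pattern.search(vmx_text):
        # A function replacement stops re from treating backslashes in value as escapes.
        return pattern.sub(lambda _match: new_line, vmx_text, count=1)
    if vmx_text and not vmx_text.endswith("\n"):
        vmx_text += "\n"
    return vmx_text + new_line + "\n"


if __name__ == "__main__":
    vmx = 'displayName = "Saints DC"\nworkingDir = "."\n'
    # Hypothetical datastore and folder names.
    print(set_vmx_option(vmx, "workingDir", "/vmfs/volumes/SnapshotStore/Saints DC"))
```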
hongedit

ASKER
Thank you!
hongedit

ASKER
Well, I did all that, and VDR still fails with the same error message :(
hongedit

ASKER
How can I confirm that the VM is using the new datastore for snapshots?

I kept hitting refresh while VDR was trying to run, and no files appeared in the new snapshot store before it failed.
Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)

Create a manual snapshot, wait a minute, and check the snapshot datastore; the snapshot should be there.
hongedit

ASKER
A manual snapshot fails even with neither memory nor quiesce ticked:

File <unspecified file name> is larger than the maximum size supported by Datastores<unspecified datastore>

The VM is as follows:

Datastores:
OS (1MB block): 199.75GB, 184.43GB free
SQL DB (1MB block): 255.75GB, 253.72GB free
Sage (1MB block): 4.75GB, 2.84GB free

Snapshot datastore (2MB block): 300GB

???
Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)

Is the snapshot being created on the new snapshot datastore?

After the snapshot is created, the VM's Summary should show two datastores.

If you added the new variable through Configuration Parameters, be aware that this panel is experimental and the workingDir variable may not be saved.

In that case, you'll have to unregister the machine, modify the VMX, and register the machine again.

You will not see the variable in the configuration panel; you can only confirm it by inspecting the VMX and creating a snapshot.
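Inspecting the VMX can be scripted too. A missing workingDir entry, or a value of ".", means snapshots still go to the VM's own folder. A minimal Python sketch, assuming the simple key = "value" layout of a .vmx file (the path in the example is hypothetical):

```python
import re


def get_vmx_option(vmx_text, key):
    """Return the unquoted value of `key` in .vmx text, or None if it is absent."""
    pattern = re.compile(
        rf'^[ \t]*{re.escape(key)}[ \t]*=[ \t]*"?([^"\n]*)"?[ \t]*$',
        re.IGNORECASE | re.MULTILINE,
    )
    match = pattern.search(vmx_text)
    return match.group(1) if match else None


def snapshots_redirected(vmx_text):
    """True if workingDir points somewhere other than the VM's own folder.
    Both a missing entry and "." mean the default location."""
    return get_vmx_option(vmx_text, "workingDir") not in (None, "", ".")


if __name__ == "__main__":
    print(snapshots_redirected('workingDir = "."\n'))
    print(snapshots_redirected('workingDir = "/vmfs/volumes/SnapshotStore/Saints DC"\n'))
```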
hongedit

ASKER
The snapshot is not created, hence no files even being created.

Can you please explain how to unregister and re-register? The VMware article only explains how to register.

Also, do I have to reconfigure the VM (add disks, etc.)?
ASKER CERTIFIED SOLUTION
Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)

hongedit

ASKER
Wow, that seems to have done it!

Looking at the VMX file, the workingDir value was ".", which explains why it didn't work. Looks like the GUI really is very experimental.

VDR is now running!

The snapshot datastore now has 6 VMDKs, all the same size, though. Why is this?

[Screenshot: snapshot datastore]
Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)

Glad we've got vDR working again for you.

I prefer to allocate a dedicated snapshot LUN with plenty of space. Then, if you have thin-provisioned your virtual machine LUNs, you don't have to worry about running out of disk space.

The GUI is experimental. (We always edit the VMX directly, because that's the way we've always done it, but I know there are plenty of "Children of the GUI" around who like Windows and GUIs and don't like command prompts, command lines or vi!) Call me an old geek!

Changed Block Tracking is enabled; that is how vDR keeps track of changed blocks on the disks for faster incrementals. Once the backup has finished, just make sure the files disappear. Don't worry too much if they don't disappear immediately, but keep an eye on them between backups.
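One way to keep an eye on leftovers between backups is to check the VM folders for snapshot-looking files. The naming patterns below (VMNAME-000001.vmdk and VMNAME-000001-delta.vmdk for delta disks, VMNAME-Snapshot1.vmsn for snapshot state) are the usual VMware conventions, but treat them as assumptions and check them against your own datastores; -ctk.vmdk files are Changed Block Tracking maps and are deliberately not matched. The core function works on a plain list of file names, for example copied from the Datastore Browser.

```python
import re
from pathlib import Path

# Assumed VMware naming conventions (check against your own datastores):
#   VMNAME-000001.vmdk / VMNAME-000001-delta.vmdk  snapshot delta disk
#   VMNAME-Snapshot1.vmsn                          snapshot state file
# "-ctk.vmdk" files are Changed Block Tracking maps and are not matched.
DELTA_DISK = re.compile(r"-\d{6}(-delta)?\.vmdk$", re.IGNORECASE)
SNAPSHOT_STATE = re.compile(r"-Snapshot\d+\.vmsn$", re.IGNORECASE)


def leftover_snapshot_files(names):
    """Return the snapshot-looking file names from an iterable of names, sorted."""
    return sorted(n for n in names if DELTA_DISK.search(n) or SNAPSHOT_STATE.search(n))


def scan_folder(path):
    """Apply the same check to every file under a folder reachable from Python."""
    return leftover_snapshot_files(p.name for p in Path(path).rglob("*") if p.is_file())


if __name__ == "__main__":
    listing = [
        "Saints DC.vmx", "Saints DC.vmdk", "Saints DC-flat.vmdk", "Saints DC-ctk.vmdk",
        "Saints DC-000001.vmdk", "Saints DC-000001-delta.vmdk", "Saints DC-Snapshot3.vmsn",
    ]
    for name in leftover_snapshot_files(listing):
        print("still present after backup:", name)
```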
hongedit

ASKER
Cool.

Does VDR scheduling just pick a random time in the backup window to start?

Because I just edited the schedules of the VM backup jobs, and they all started at once. (I currently have a separate backup job for each VM. They all have slightly different schedules, but most start at 7pm.)

Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)

Yes, vDR scheduling is a bit odd: a job triggers when its data source is out of date.

So if all the sources were "out of date", they would all trigger at once. Be careful with this, because at present vDR doesn't take datastore performance into account at all, and it can flatline a datastore with ease!

Vizioncore/Veeam have options to limit the number of concurrent backups per LUN to x, to prevent this.

On client sites, we group services into separate backup jobs, stagger them at different times, and use the option to stop a backup job if it overruns into another job's window.

Because vDR does incrementals after the first backup, the jobs should complete quickly each day, since the delta between backups should be small.

So we specify different hourly windows for services, and most services are on different LUNs as well.

For example: different Exchange stores on different LUNs, different Exchange servers on different LUNs, SQL etc., and DCs on separate LUNs. That way, if a LUN fails on the SAN, the services are spread out and we don't have a massive service outage.
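The staggering idea can be sketched as a simple planning aid: jobs that share a LUN run back-to-back with a gap, while jobs on different LUNs may start together. This is not a vDR feature or API; the job names, LUNs, durations and the 15-minute gap are hypothetical, and you would still enter the resulting start times in each job's schedule by hand.

```python
from datetime import datetime, timedelta


def stagger_jobs(jobs, window_start, gap=timedelta(minutes=15)):
    """Plan start times so that jobs sharing a LUN never overlap.

    jobs: list of (job_name, lun, estimated_duration) in priority order.
    Jobs on different LUNs may start at the same time; jobs on the same LUN
    run one after another with `gap` in between.
    """
    lun_free_at = {}  # LUN -> time its previous job should have finished
    plan = []
    for name, lun, duration in jobs:
        start = lun_free_at.get(lun, window_start)
        plan.append((name, lun, start))
        lun_free_at[lun] = start + duration + gap
    return plan


if __name__ == "__main__":
    # Hypothetical jobs, LUNs and durations, with a 7pm window start.
    seven_pm = datetime(2024, 1, 1, 19, 0)
    jobs = [
        ("Exchange01", "LUN1", timedelta(hours=1)),
        ("SQL01", "LUN2", timedelta(minutes=30)),
        ("DC01", "LUN1", timedelta(minutes=20)),
    ]
    for name, lun, start in stagger_jobs(jobs, seven_pm):
        print(f"{start:%H:%M}  {name} on {lun}")
```

Serialising per LUN is similar in spirit to a per-LUN concurrency limit of one; raising that limit would mean tracking several "free at" times per LUN instead of one.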
hongedit

ASKER
Thanks, AGAIN