DP230 asked:

VMware disks need consolidation after Veeam failed to remove a snapshot

Dear Experts, after Veeam failed to remove a snapshot of a VM, vSphere said that its disks need to be consolidated. I let the consolidation run through the night but it does not seem to have worked. The VM is still running on a snapshot, because I can see "...MX-000001.vmdk" in Settings.

I guess the reason is low storage space. I tried to delete some junk data on the SAN, but I'm not sure if that is enough for the new snapshot that will be taken tonight.

Should I still follow these steps?
1. Power off the VM
2. Take a snapshot
3. Delete all snapshots
4. Power on the VM

Is there anything else to consider? When the snapshots are removed at step 3, we will have to wait for the disks to be consolidated. Am I right?

Many thanks!

Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)

Also check that Veeam does not have the parent disk open ?

You will not be able to Power On, until the snapshot has been removed.
Avatar of DP230

ASKER

Hi Andy, I'm not sure why, but the VM has still been running on a snapshot since yesterday.

Is there any problem with that? Should we let Veeam do its job again tonight, or remove the snapshot manually?
If Veeam failed to remove the snapshot once, it is likely to fail again, making the situation worse. The snapshot will also be growing by the minute, VM performance will be affected, and there is a chance of snapshot corruption, so you should not ignore it. Deal with it!
Avatar of DP230

ASKER

Okay, so what should I do in this case? I just checked the Veeam logs: the job completed with a green check and said that the VM snapshot has stuck and that it will attempt to consolidate periodically. The VM has been running in production for 12 hours.

Do you think this script will work: https://pelegit.co.il/delete-stuck-veeam-snapshot-on-virtual-machine/
You've got two options: power off the VM and remove the snapshot correctly, or remove it whilst powered on.

Check that the parent disk is not attached to the Veeam VM.

Don't use the Consolidate option. Instead:

take a new snapshot
wait 120 seconds
then select Delete All

Wait and be patient.
Avatar of DP230

ASKER

Hi, I checked the Veeam VM and could not see any disks attached except its own.
I can see that, on other questions here, you also suggest removing the disks from the (problematic) VM before taking the snapshot. Can you kindly clarify? Should I do the same?
 
If you have no Disks connected to Veeam VM then you can proceed
Avatar of DP230

ASKER

It has been running for half an hour and progress is at 2%. How fast does it run? Can it run faster? I really want to finish it before Sunday night.
How long is a piece of string? 30 minutes is nothing. My advice: go away and come back in 4-5 hours.

It could take seconds, minutes, hours or days, but it will complete. The important thing is that you do not mess with the process: do not stop or cancel it, restart the host, or stop the services. Just wait and be patient. The speed of the merge is based on two things:

1. size of snapshot.

2. speed of datastore.

Please read my EE Article

HOW TO: VMware Snapshots :- Be Patient
Avatar of DP230

ASKER

Hi, it seems to be done, but only for 1 of the 3 disks; the others are still running on a snapshot. Should I repeat?

I repeated it. Let's wait some time.
That's rather odd, as all disks should be done at the same time.

Have you checked to confirm the VM is actually running on them? See my article above to check how.

The screenshot above just shows a single disk of 500GB?

The snapshot number has not increased, e.g. to -000002.vmdk
Avatar of DP230

ASKER

Yes, but that picture is of the Veeam VM, not the problematic one. I checked in Edit Settings and only the C drive is running on its parent disk. It also has a consolidate warning on the screen.
Avatar of DP230

ASKER

User generated image
The old ones are still there, but I can see that the snapshot for ...MX.vmdk was deleted.

These are the ones I just made:
User generated image
I also noticed that there is no Veeam snapshot in Snapshot Manager.
Okay, that's the Veeam VM.

If you select DELETE ALL, do they disappear?
Avatar of DP230

ASKER

Just before posting the update, I checked and did not see any snapshots in Snapshot Manager, but now there should be one because I created it. The task history showed that the remove process last night completed OK after 4 hours. I'm not sure why I still have 2 out of 3 disks running on snapshots. Now the new remove process is at 33%.

User generated image

I guess there is not much I can do while the delete is in progress, so I will come back in a few hours. What should I do after it finishes?
1. Check Snapshot Manager for any snapshots listed.

2. Check folder for any snapshots (-00000x.vmdk)

3. Check the disk properties to check if the VM is writing to a snapshot file -00000x.vmdk
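
As a rough aid for check 2, the sketch below scans a list of file names (for example, pasted from a listing of the VM folder over SSH) and picks out anything that looks like a snapshot disk. The `-00000x` pattern comes from the file names in this thread; the helper name `find_snapshot_files` and the sample names are made up for illustration, and the `-delta`, `-sesparse` and `-ctk` suffixes are an assumption about what else may sit in the folder.

```python
import re

# Matches names such as "MX-000001.vmdk", "MX-000001-delta.vmdk",
# "MX_1-000002-sesparse.vmdk" or "MX-000001-ctk.vmdk" (assumed naming).
SNAPSHOT_RE = re.compile(
    r"^(?P<base>.+)-(?P<num>\d{6})(?:-(?:delta|sesparse|ctk))?\.vmdk$"
)

def find_snapshot_files(names):
    """Group snapshot-looking file names by the base disk they belong to."""
    found = {}
    for name in names:
        match = SNAPSHOT_RE.match(name)
        if match:
            found.setdefault(match.group("base"), []).append(name)
    return found

# Hypothetical folder listing, not taken from the real VM.
listing = [
    "MX.vmdk", "MX-flat.vmdk", "MX-000001.vmdk", "MX-000001-delta.vmdk",
    "MX_1.vmdk", "MX_1-000002.vmdk", "MX.vmx",
]

for base, files in find_snapshot_files(listing).items():
    print(f"{base}: {', '.join(files)}")
```

An empty result only says that no file names match the pattern; checks 1 and 3 are still needed to confirm what the VM is actually writing to.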
Avatar of DP230

ASKER

Hi,
1. No, I see no snapshots in Snapshot Manager
2. Yes, it still has the delta files
3. Yes, it still has 2 disks running on snapshots

What should we do?
take a new snapshot
wait 120 seconds
then select Delete All

wait and be patient
Avatar of DP230

ASKER

Is there any other option?

I did it twice but nothing changed.
Can you send me screenshots of the disk properties?

and a listing of all folder contents from SSH/console.
Avatar of DP230

ASKER

Hi, here is the info (I covered the domain name).

Is this anything to do with Snapshot Hunter? I found some articles that suggest adding the registry key DisableAutoSnapshotConsolidation on the Veeam server.

Should we try Consolidate Disks? In vCenter or ESXi?

User generated image
User generated image
User generated image
User generated image
Avatar of DP230

ASKER

To recap, it happened on 9 April, when my backup completed OK but said the VM snapshot was stuck with "operation timeout". This is the log file (history) in Veeam:
 
User generated image
This is a task console in vCenter
User generated image
That is the only time Consolidate Disks was run automatically, and it failed. After that we created a snapshot and deleted ALL twice, but only 1 out of 3 disks escaped from its snapshot; the other 2 remain on snapshots.

Currently I have to start the VM and run it from snapshots, but I'm not sure if that will be OK?

Avatar of DP230

ASKER

Is there any way we can check the creation date of the snapshot files? I can see that the child disk is even larger than the base.

So if I understand correctly, 000001.vmdk is the base now, not the .vmdk (its snapshot might have been deleted?)

Order is .vmdk => 000001.vmdk => 000002.vmdk => 000003.vmdk

What if we delete the .vmdk and consolidate the disks, so that they commit to 000001.vmdk? Is it possible?

User generated image
User generated image
Yes, it does confirm the VM is running on a snapshot.

Please just check that Veeam does not have any of these disks attached.

Is there space on this datastore?
Avatar of DP230

ASKER

Hi, no, the Veeam VM did not have any of those disks attached.

We have about 1.9 TB in this SAN storage 2

I created a folder in that VM's store and moved the MX_2.vmdk file to it, but it appeared as a "flat" file of almost the same size as MX_2_000001.vmdk, so I moved it back.
Okay, there is another process we can do to try and get rid of the snapshot, and that is to CLONE the entire VM.

The resulting VM (CLONE) will be without snapshots and then use the CLONE.

(so Power off the VM, CLONE it)

It seems that something has locked the parent VM disks, which is preventing the snapshots from merging.
Avatar of DP230

ASKER

Yup, but the clone will cost us about 5 TB whereas the datastore has only 1.9 TB.

We intend to map an iSCSI disk (from a Synology NAS device) to the ESXi host as a new datastore and clone the VM to it.

What do you think?

Any datastore with sufficient space will do.

I'm wondering if it cannot merge because of a space issue. Did you try with the VM powered off?

It should not matter if using ESXi or vCenter Server to complete the delete all of snapshots.
Avatar of DP230

ASKER

Yes, the VM was turned off every time I tried. OK, but if we clone the VM, which disks will it use as the base?

.vmdk or 00001.vmdk?

When the CLONE operation starts it will merge all the changes

0003.vmdk > 0002.vmdk > 0001.vmdk > parent.vmdk

So the final CLONE VM will just have a single parent.vmdk file with zero snapshots.

If the CLONE fails to complete or give an error there is something else which is wrong.
Avatar of DP230

ASKER

Hi, should we leave the VM on while cloning it?

In that case, if we start the clone on Monday and it completes on Friday, will the clone contain all the data up to completion, or just Monday's data?
To ensure the data is intact with no issues power off the VM.

and then CLONE.

You will know in a matter of minutes if there are issues with the VM, you'll get an error.
Avatar of DP230

ASKER

Yes, I just tried cloning another, smaller VM to the iSCSI LUN datastore; it took 40 mins to complete 160 GB of data (and the VM was still online).

The data written during those 40 mins is not included in the clone. In the case of 6 TB of data, if I estimate correctly, it will take about 25 hours to complete ;(
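
The 25-hour figure is a straight linear scaling of the test clone. A minimal sketch of that arithmetic, assuming the big clone runs at the same rate as the small one and taking 6 TB as 6000 GB (the function name is only for illustration):

```python
def estimate_clone_hours(sample_gb, sample_minutes, target_gb):
    """Scale a measured copy time linearly to a larger amount of data."""
    rate_gb_per_min = sample_gb / sample_minutes
    return target_gb / rate_gb_per_min / 60

# Figures from the test clone: 160 GB took 40 minutes, i.e. 4 GB per minute.
hours = estimate_clone_hours(160, 40, 6000)
print(f"{hours:.1f} hours")  # prints "25.0 hours"
```

The real rate may differ with a powered-off VM, a long snapshot chain or a busy datastore, so this is only a planning figure.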

Otherwise you will need to reach out to VMware Support for assistance, because they will have remote, hands-on access to the issue which is preventing the snapshot from merging.
Hi,

Just a shot in the dark: you could check the vmdks' IDs.
Sometimes it can all go wrong with backup or replication software.
You could check that for each virtual disk.
For example, for Disk 1, named disk1.vmdk, if you have 2 snapshots you will have these files in the VM's folder (in ssh/console):
disk1.vmdk
disk1-000001.vmdk
disk1-000002.vmdk

These are only descriptors.
If you read these text files (cat file_name), you'll see the ID of the vmdk related to that file, and its parent ID.
In the example you should have:
disk1's ID = disk1-000001's parent ID
disk1-000001's ID = disk1-000002 parent ID

I hope you see the logic. Check if the chain seems good.
Sometimes when this is all messed up you have ID=ffffffff, which means no ID. For the parent ID of the first file that is normal, because it does not have a parent.
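
The chain check can be sketched in a few lines of Python. This assumes the two IDs appear in each descriptor as `CID=` and `parentCID=` lines and that you pass in the descriptor texts ordered from the base disk upwards; the function names and the sample IDs are invented for the example.

```python
import re

NO_PARENT = "ffffffff"  # parentCID expected on a base disk with no parent

def read_ids(descriptor_text):
    """Return (CID, parentCID) from the text of a .vmdk descriptor."""
    cid = re.search(r"^CID=(\w+)", descriptor_text, re.MULTILINE)
    parent = re.search(r"^parentCID=(\w+)", descriptor_text, re.MULTILINE)
    return cid.group(1).lower(), parent.group(1).lower()

def check_chain(descriptors):
    """descriptors: list of (file_name, text), base disk first.

    Returns a list of problems; an empty list means every link matches."""
    problems = []
    expected_parent = NO_PARENT
    for name, text in descriptors:
        cid, parent = read_ids(text)
        if parent != expected_parent:
            problems.append(
                f"{name}: parentCID is {parent}, expected {expected_parent}"
            )
        expected_parent = cid  # the next file should point at this one
    return problems

# Made-up descriptor contents; the last link is deliberately broken.
chain = [
    ("disk1.vmdk",        "CID=aaaa1111\nparentCID=ffffffff\n"),
    ("disk1-000001.vmdk", "CID=bbbb2222\nparentCID=aaaa1111\n"),
    ("disk1-000002.vmdk", "CID=cccc3333\nparentCID=dddd4444\n"),
]
for problem in check_chain(chain):
    print(problem)
```

This only reports mismatches. Repairing a descriptor by hand is risky, so take a copy of the files first or leave it to VMware Support.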

If all your files are still there in the VM folder you could probably get it back.
But yes, that could be a support call to VMware, as Andrew has stated.

Also what is your Veeam version?
Avatar of DP230

ASKER

My Veeam version is 9.5.

Last Sunday, I turned off the VM and started the cloning, but about 12 hours later we had to stop the process because the users were screaming. Is there any way to start the VM during the cloning process?
No, if the VM is powered off, and you start the CLONE, you cannot power back on.

I'm sure you'll understand if you try to CLONE whilst users are using the server and changing data, you may end up with an inconsistent state, and the risk of corrupt or lost data.
Hi,
Is there any way to start the VM during the cloning process?
No. You have to cancel/stop the clone first. Then you will have the option to start the VM.
Veeam 9.5 should be fine. You could update to v10, but v9.5 is not that old, and not specifically known for big issues of this type.
Avatar of DP230

ASKER

Hi, should we follow this article? https://www.experts-exchange.com/articles/29387/Veeam-Proxy-issue-Removing-Veeam-ghost-snapshots.html

1. Move all ctk files to a folder
2. Turn off the VM
3. Create a new VM snapshot, then Delete All
4. Turn on the VM
5. Delete the folder of ctk files

Our VM is running on its 33rd snapshot :(((
Does the Veeam Backup VM, have the disk of this VM attached ?

I would still CLONE the VM.
Avatar of DP230

ASKER

Hi, I checked, but there are no disks attached to the Veeam VM.
So you'll need to CLONE and create a new VM.
Avatar of DP230

ASKER

Hi, we just added 20 TB more to the datastore. I just want to try Consolidate Disks at least one more time before cloning. How much space does consolidation require? That VM's size is about 7 TB.
The issue is that you possibly have too many snapshots.

Do you get errors, when you try to delete the snapshot ?
Avatar of DP230

ASKER

Yes, in the Veeam console result, it said that "... has stuck VM snapshot, will attempt to consolidate periodically"

It also has "operation time out " in vCenter

Have you manually tried to delete the snapshot, I'm not talking about Veeam ?

1. Take a manual snapshot.

2. Wait 120 seconds.

3. select DELETE ALL and Wait and be patient, it will take a long time.

If this does not work, you have got no option but to try CLONE, and re-reading all the above

1. Power off the VM.

2. CLONE.

3. WAIT.

Yes it's going to take a long time, and the VM and service is going to be down, so schedule out of hours.
Avatar of DP230

ASKER

OK, how can I check the consolidate and cloning in the ESXi CLI? It has been running since Friday night.
You can either just use the CLONE from the GUI which is much easier and less work, or you can use CLONE from ESXI CLI (more work).

I would recommend, just powering off the VM, and Right CLICK and CLONE.

Do you have vCenter Server ?
Avatar of DP230

ASKER

Yes I have vCenter but really want to try the consolidate before cloning. Its status is 1% now :((
Perhaps I will let it run for few more days.
If you have Selected DELETE ALL you cannot cancel now....
Avatar of DP230

ASKER

I actually clicked on Consolidate, and stopped it when the SAN storage warned about an increased size of 9 TB. I checked, but it still has 33 snapshots.
you risk corruption to the virtual machine be warned!

you cannot avoid using storage to fix this snapshot mess!

and with every hour the snapshot will get larger causing more issues with performance of the Guest VM

what is the total size of snapshots now and how many snapshots now?

what is this VM? Is it Exchange ?

Have a read of my EE Article

HOW TO: VMware Snapshots :- Be Patient
Avatar of DP230

ASKER

@Andrew, yes, that is an Exchange 2016 server with 5 EDBs, and we saw lots of delta files in the datastore. Here is a list of the delta and vmdk files in that VM (the red ones are the parent disks):

User generated image

User generated image

Yes the total size of those files is about 9 TB as expected. I also saw some "sesparse" files in datastore (SSH access) but could not see them in vCenter/ESXi GUI. In the snapshot management, I saw only 1 from Veeam:

User generated image

I read your articles many times, but could not calculate the exact time it needed to clone (or consolidate?). That VM serves about 2500 users so it's difficult for us to schedule the downtime :((

Also, that VM has been running for a few days without backup (because of the consolidate disks process). Is it possible to run a full backup tonight?
ASKER CERTIFIED SOLUTION
Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)
This solution is only available to members.
Avatar of DP230

ASKER

About the first 2 options, which one is your recommendation based on your experience? I'd like the one which has less down time please.

About the 3rd option, if I restore a whole VM, will the disks be consolidated? Because the last backup run last week and at that time, the VM already had those delta files.

About the first 2 options, which one is your recommendation based on your experience? I'd like the one which has less down time please.

About the 3rd option, if I restore a whole VM, will the disks be consolidated? Because the last backup run last week and at that time, the VM already had those delta files.

None of these options have less downtime!

You should have been regularly checking backups to avoid 33 snapshots on a VM! That's at least 1 month of not checking!

I would try 1 and check if it works, or, if you are concerned, I would move on to 2.

Both are going to require

1. Emergency Downtime.

2. Power Off server.

3. Prepare a new datastore of correct size.

4. CLONE VM.

5. Wait and Be Patient.
Avatar of DP230

ASKER

"Process of CLONING will create a new cloned VM with no snapshots, but it will require the same size as the current VM, without snapshots "

Should I increase the capacity of VM_1 disk on current VM before cloning? Currently it is 6TB disk, the occupied is only more than 4TB. I'm afraid after the cloning, the disk will be full of 9 TB. Is that correct?

User generated image
Should I increase the capacity of VM_1 disk on current VM before cloning? Currently it is 6TB disk, the occupied is only more than 4TB. I'm afraid after the cloning, the disk will be full of 9 TB. Is that correct?

No, GuestOS disk size has got nothing to do with snapshots.

Just ensure that the ESXi datastore you use for the CLONE has enough space to support the VM, check the disk without snapshots.
Avatar of DP230

ASKER

OK, based on my experiments with a smaller VM, it will take about 12-16 hours to complete the cloning. Wish me luck tonight 🙈
Should be fine, shutdown so no one can access so clone is identical. Check your mailbox and record last email received.

If you need to use the same MAC Address, then record the existing MAC Address, and manually change the MAC address later.

CLONE, then after 12-16 hours, change MAC Address of CLONE, and startup VM, but at this time, I would disable networking, and check over VM, before it enters production again.
Avatar of DP230

ASKER

I'm not sure about the MAC address. In which case do I need to change it to be the same as the old one?

Should I do the clone on vCenter or ESXi host?
only if you have a requirement to use the old MAC Address because a CLONE will change it in the VM created

if unsure make it the same
oh dear! too many snapshots!

You may now need to use the shell and execute some commands.

Happy to do this ?
Avatar of DP230

ASKER

Yes, I can run commands. Actually I had to turn that VM off using esxcli kill.

When I checked the VM tasks in ESXi, it said that the consolidate disk task was running, but I did not see it in the GUI.

I rebooted the host and ran the clone again. In case it happens again, what should I do? Is it better if we change to plan 1, "Delete All Snapshots"?
The reason CLONE would have been greyed out, is if there was a task in progress.

You need to make a decision

1. DELETE ALL SNAPSHOTS

or

2. CLONE.

I would recommend CLONE now.
Avatar of DP230

ASKER

The clone is running and I think it is slower than expected.

If I have a full backup version of an old VM, then after cloning - can Veeam do the incremental in the new VM?
A new VM, needs a new full backup, it's a different object.
Avatar of DP230

ASKER

Hi, the clone completed successfully after more than 23 hours. I'm going to back up this new one with Veeam, but to prevent this problem from happening in the future, what options would you recommend setting in vCenter, ESXi or Veeam?

In the new VM's datastore; I saw the vmdk files of MX_3, MX_5 and MX_7; -ctk and -flat files, corresponding with these files. Are they ok? Not sure why does it name my disks like these, because the original ones are MX, MX_1 and MX_2? And where is _4 and _6? 
Set An alarm to warn you of snapshots and daily checks after backups!

In the new VM's datastore; I saw the vmdk files of MX_3, MX_5 and MX_7; -ctk and -flat files, corresponding with these files. Are they ok? Not sure why does it name my disks like these, because the original ones are MX, MX_1 and MX_2? And where is _4 and _6?

that is a little odd, check the disks are all present in the VM settings.
Avatar of DP230

ASKER

Yes, I checked and all 3 disks are present, but they are named 3, 5 and 7 respectively. I guess that should not cause any problem?

That's very odd. If it causes concern and you would like it to be cosmetically correct, you could try migrating to another datastore; Storage Migration may rename them correctly.