jlablans

Issue with VMware ESXi snapshots

Hello,
I have an issue with VMware ESXi snapshots; please see below for the setup information and the history of the issue:

Very small customer site with 3 VMs running on a Dell PE2970.
The datastore uses local SAS drives.
All of the customer's data is held within the file server VM.
The original VM was created via P2V in Feb 09, with a couple of snapshots taken since then.
The customer moved loads of data around on the server and then created another snapshot, resulting in a further snapshot file of 225 GB!
A situation has now arisen where we didn't have enough free space to even start the servers, and we received the following message:
msg.hbacommon.outofspace: there is no more space for the redo log
We ran the Delete All snapshots command through the VI Client to consolidate the snapshots; this took a long time.
Once this had finished, no snapshots were visible in the Snapshot Manager, but we did not have any more free space.
Browsing the datastore showed the snapshot delta files were still present.
As the original file showed a modified date of today, I edited the .vmx file to point to the original .vmdk instead of the latest snapshot.
Booted and it was still the original version from Feb 09, so I immediately shut down and reverted back to the original .vmx file pointing to the latest snapshot.
I now receive the following error:
Cannot open the disk ... Reason: The parent virtual disk has been modified since the child was created.
I have found various references to resolving this issue by editing the parentCID, but I am not sure how I can do this on ESXi with no service console.
Unfortunately my Linux skills are somewhat lacking.
The VM and associated snapshots are also very large!

As you have probably guessed, this is a production server and we need to get it back up ASAP!
Any help much appreciated.

Many thanks in advance.
Jai



 
ryder0707
jlablans

ASKER

Thanks for the speedy response ryder0707.
I had not seen the redshift10 article but I have read similar scenarios.
The site is now shut until Monday morning (weather permitting), so I cannot get access to the physical machine to enable the "unsupported" service console.
Can I run all the commands that I will need to from there?
I read somewhere that this is a very limited service console.

I have the further problem that the .vmdk files are so large that I really need to edit them directly on the datastore.
Do you know of any programs that will allow me to do this?

Thanks again
Jai

Can I run all the commands that I will need to from there?
Yes, all the basic commands are there, and you can use vi as the editor.

Don't worry about the .vmdk file size. The .vmdk file itself is actually very small; it is a text-based disk descriptor that records which -flat.vmdk file (the actual data) to use.
Just make a backup copy of each .vmdk file you want to edit before editing it in the console,
for example: cp /vmfs/volumes/your_datastore/your_vm/yourvm.vmdk /vmfs/volumes/your_datastore/your_vm/yourvm.bck
Great, thanks for the further info.
I have been using the datastore browser to look at the files so I was not actually seeing everything correctly.
I have installed Veeam FastSCP, which shows me all the files and also allows me to edit the descriptors directly.
I have backed up all of these small .vmdk files and taken a note of the CIDs and parentCIDs.
I found a break between snapshot 01 and the original file.
I tried correcting this but received the same error when I tried to start the machine.
Will go over this again now.
Cheers
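
For anyone following the same CID check by hand, here is a minimal Python sketch of the comparison, run on a workstation against copies of the descriptor files. It assumes each descriptor contains plain `CID=` and `parentCID=` lines, and the file names and CID values in `SAMPLE_CHAIN` are invented for illustration; they are not taken from this thread. It only reports mismatches and does not modify anything.

```python
import re

# Hypothetical descriptor excerpts, ordered from base disk to newest snapshot.
# File names and CID values are invented for this example.
SAMPLE_CHAIN = [
    ("fileserver.vmdk", "CID=a1b2c3d4\nparentCID=ffffffff\n"),
    ("fileserver-000001.vmdk",
     'CID=11112222\nparentCID=deadbeef\nparentFileNameHint="fileserver.vmdk"\n'),
    ("fileserver-000002.vmdk",
     'CID=33334444\nparentCID=11112222\nparentFileNameHint="fileserver-000001.vmdk"\n'),
]


def parse_descriptor(text):
    """Extract the CID and parentCID values from descriptor text."""
    fields = {}
    for key in ("CID", "parentCID"):
        # Anchoring at line start keeps "CID" from matching inside "parentCID".
        match = re.search(r"^\s*" + key + r"\s*=\s*(\w+)\s*$", text, re.MULTILINE)
        if match:
            fields[key] = match.group(1).lower()
    return fields


def find_chain_breaks(descriptors):
    """Return (parent, child, parent_cid, child_parent_cid) for each broken link."""
    parsed = [(name, parse_descriptor(text)) for name, text in descriptors]
    breaks = []
    for (parent_name, parent), (child_name, child) in zip(parsed, parsed[1:]):
        if child.get("parentCID") != parent.get("CID"):
            breaks.append(
                (parent_name, child_name, parent.get("CID"), child.get("parentCID"))
            )
    return breaks


for parent, child, cid, parent_cid in find_chain_breaks(SAMPLE_CHAIN):
    print(f"{child}: parentCID={parent_cid} but {parent} has CID={cid}")
```

In the sample data the only mismatch is between the base disk and the first snapshot, which is the same position as the break described above.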
jhyiesla
I don't use ESXi, so I am making an assumption that snapshots function the same as in ESX.  This doesn't really address your immediate problem, but perhaps something you can share with your client.

There appears to be a big misunderstanding about what snapshots are for.  They are NOT a static picture in time of a VM that you should put aside for the future, just in case.  The proper use of a snapshot is to put in a waypoint, if you will, at a point in time where the system is stable before making some change to it. Once that change is made and tested, the snapshot should be removed using the Snapshot Manager. Why?

Because when you take a snapshot you in effect cause the original VMDK file to become read-only and create a delta VMDK file that any future changes are written to. The original VMDK file and the snapshot file(s) are both essential for the VM to work properly. If you create multiple snapshots or leave them around for a long time, they just grow and grow in size and can degrade the performance of the VM.  Then, when you go to delete them, you may need extra free space available to roll them back properly.

There are two ways to delete a snapshot with the manager, and it all depends on where the "You are here" marker is positioned.  If you really want to throw the changes away because the change didn't work or caused issues, you can truly delete the snapshot; this wipes out the delta file and makes the original VMDK file read/write again.  However, if you want to keep the changes, you position the "You are here" marker differently, and when you delete the snapshot what you are really doing is rolling all of the changes that have happened to the VM since the snapshot was taken back into the original VMDK file.  This takes extra space while the roll-up is happening, which is another reason not to make multiple snapshots or let them linger too long.
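
To make the base-plus-delta idea concrete, here is a toy Python model. It is only a sketch of the concept described above, not VMware's on-disk format or API; the class and method names are invented for illustration, and disks are reduced to dictionaries of block number to data.

```python
class ToyDisk:
    """Toy model: a base disk plus a stack of snapshot deltas (block -> data)."""

    def __init__(self, blocks):
        self.base = dict(blocks)  # stands in for the original (flat) disk
        self.deltas = []          # stands in for the delta files, oldest first

    def take_snapshot(self):
        # The layer below is frozen; new writes go to a fresh, empty delta.
        self.deltas.append({})

    def write(self, block, data):
        target = self.deltas[-1] if self.deltas else self.base
        target[block] = data

    def read(self, block):
        # The newest delta wins; fall through to older layers, then the base.
        for layer in reversed(self.deltas):
            if block in layer:
                return layer[block]
        return self.base.get(block)

    def commit_all(self):
        # Keep the changes: fold every delta into the base, oldest first.
        for layer in self.deltas:
            self.base.update(layer)
        self.deltas = []

    def discard_newest(self):
        # Throw the changes away: drop the newest delta without merging it.
        if self.deltas:
            self.deltas.pop()


disk = ToyDisk({0: "feb09 data"})
disk.take_snapshot()
disk.write(0, "current data")

print(disk.read(0))   # what the VM sees through the snapshot chain
print(disk.base[0])   # what you get if you boot the base disk directly
disk.commit_all()
print(disk.base[0])   # after committing, the base holds the current data
```

In this model, reading the base directly while deltas exist returns the old contents, which matches the Feb 09 state the asker saw when the .vmx was pointed at the original .vmdk.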
jhyiesla, thanks for the further info and clarification on snapshots.
To be honest, I didn't fully understand them until this issue occurred.

I think I have now successfully repaired the cid chain and appear to be back to where I was at the start of this issue.
The snapshots do not appear in the Snapshot Manager, so I will have to try to merge them with rcli or the unsupported service console.
What amount of space do I need to merge the snapshots?
Double the size of the delta + flat?
Can I merge the delta files one by one with rcli or service console?

Many thanks



I always use the Snapshot Manager, but I assume there are command-line ways to do it as well.  Basically, I believe that when you are rolling back snaps you need temporary space equal to the size of the VMDK file plus the snap you are rolling back.  If you are low on space, you may not want to roll them all back at once, as this increases the space needed for the rollback. Rolling them back one at a time should hold down the temporary space needed a little.
What amount of space do I need to merge the snapshots?
See my reply on the following question

Double the size of the delta + flat?
First it will merge all the snapshots, then commit to the base vmdk.
Example: snap1 = 1 GB, snap2 = 2 GB, snap3 = 3 GB. If you delete all, snap3 merges into snap2, then snap2 merges into snap1, and finally snap1 merges into the base vmdk, so you need around 6 GB of free space.

Can I merge the delta files one by one with rcli or service console?
No clue if we can do this. Let me know if you find out; it would be good to know whether this is possible.
Maybe the VMware tech guys have an undocumented method, but I don't know.
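
The 1 GB / 2 GB / 3 GB example above can be written as a small Python helper. This is only the rule of thumb from the reply (budget roughly the sum of all the delta sizes for a "delete all"), not a formula taken from VMware documentation, and the function names are made up for this sketch.

```python
def estimate_delete_all_space_gb(delta_sizes_gb):
    """Rule of thumb from the reply: free space needed is about the sum of the deltas."""
    return sum(delta_sizes_gb)


def has_enough_space(free_gb, delta_sizes_gb):
    """True if the datastore's free space covers the rule-of-thumb estimate."""
    return free_gb >= estimate_delete_all_space_gb(delta_sizes_gb)


deltas = [1, 2, 3]  # snap1, snap2, snap3 in GB, as in the example
print(estimate_delete_all_space_gb(deltas))  # 6
print(has_enough_space(5, deltas))           # False
```

Plugging in the real delta sizes from the datastore browser gives a quick sanity check before starting a long consolidation.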
Something else to think about.  Not sure what you've really done and so it's hard to know the viability of the snaps in the first place. I assume that they are still live.  One good way to tell is to look on the datastores and see if the date and time stamp is being updated. Something else to consider, which the other expert just brought up is that it may be time to involve VMware tech guys... especially if these VMs are critical to your operation.
Thanks again for your help so far!
Update:
As this is a production server I had to get it back online ASAP, so I deleted another server on the host that we can live without for a while.
This enabled the main server to boot and get the users back online.
I still don't have enough space to commit the snapshots back to the flat file, so I thought the following might work:
Add new ESXi host to the network and a nas server.
Run VMware Converter to copy the original VM from the old host to the new one (using a datastore on the NAS box).
Boot the VM on the new host, wait a few days, then delete the files on the old host.
Move the VM back to the original host and boot it.

Do you guys think this will work?
I had planned on doing the above this weekend, so I ran the converter, but I receive this error:
unable to obtain hardware information for the selected machine

I understand this is quite a generic error so I have attached the agent log.
All help is greatly appreciated!
Thanks
Jai






vmware-converter-agent-1.log
Add new ESXi host to the network and a nas server.
You might as well add the datastore from the NAS box directly to the existing ESXi host; you can have many datastores on the same ESXi host.
Try creating a new temporary VM that points to the existing vmdk, then run VMware Converter again to convert the newly created VM.

I reread your problem and just noticed something important. The thing that concerns me is your statement:
"Booted and it was still the original version from Feb 09, so I immediately shut down and reverted back to the original .vmx file pointing to the latest snapshot."
There are cases where even VMware support is unable to recover data from snapshots if the base vmdk has been modified, so if this is the case, you will most likely need help from http://www.ontrackdatarecovery.com/vmware-recovery/
ASKER CERTIFIED SOLUTION
ryder0707
The original chain of vmdks is OK, as we have booted the server and it is running live now.
I will take a look through the video later today.
thanks again.
Jai
I am going to close this now as we are looking to rebuild the network from scratch to solve this and other issues.

Many thanks to both ryder0707 and jhyiesla for all the info and help on this; it has been most educational and will certainly stop this issue occurring again for us.

Cheers