zen shaw
asked on
Recreating vmdk snapshots broken chain
Hi All,
I had a server crash where the RAID controller was unable to detect the hard drive pair. To recover, I had to go into the BIOS and change the settings so that the disks were detected again.
After the reboot the disks were detected, but the vmdks were disconnected from the virtual machines in the inventory.
I attached the vmdk back to the Windows 2008 server VM and restarted it, and it worked fine. However, I had another VM running Windows 2000 with two snapshots. When I try to start that VM with only the flat file it works fine, but when I try to start it with snapshot 0002, it throws an error.
I assume that the snapshot chain has broken between the snapshots and the base vmdk file.
How do I recover / rebuild the snapshot chain (preferably without needing to download the vmdks and manipulate them using Workstation or vConverter)?
a. How do I check / change the CID for the affected VM / VMDKs?
Another strange problem occurred when I removed the affected VM from the inventory and, within the datastore, tried to rename its folder from ABC to ABC_ORIGINAL. Instead of renaming the directory, VMware automatically started moving files from ABC to ABC_ORIGINAL, and the client displayed a MOVE FILE task with Cancel greyed out.
Even this task failed after 3%, so I don't know what state my VM is in now.
Could anyone help with it please?
Thanks
No screen attached ?
I am not sure if the deltas are corrupted, but when I tried to download 0002 it failed to download to the local machine.
Did you download via the datastore browser? If so, the datastore is corrupted, and so is the file that sits on it.
If that is the case, I would *BACKUP NOW* ALL your VMs.
Check the hardware, the RAID controller, and the disks, then erase the current datastore and RAID array and re-create them.
ASKER
Hi,
I am attaching the descriptors ... you can have a look and tell me the problem:
I assumed the chain would be
C drive Docobo_BE.vmdk ------> Docobo_BE_0000001.vmdk ------> DOCOBO_BE_0000002.vmdk
E drive Docobo_BE_1.vmdk ------> Docobo_BE_1_0000001.vmdk ------> DOCOBO_BE_1_0000002.vmdk
But based on the descriptors, the base descriptor file for the C: drive is missing. How can I generate this file?
For D: if we trace the CID and parent CID, it seems the order of the chain is
E drive Docobo_BE_1.vmdk ------> Docobo_BE_1_0000002.vmdk ------> DOCOBO_BE_1_0000001.vmdk
Could the order of the snapshots have changed, maybe through deleting or consolidating?
How could I rebuild the chain? I would appreciate it if you could look at the CID and parent CID and suggest a solution.
Thanks
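Tracing the CID and parent CID by hand, as described above, can be sketched in a few lines of Python. This is only a rough illustration, assuming each descriptor is a plain-text file with one `CID=` line and one `parentCID=` line, and that a base disk carries `parentCID=ffffffff`. The function names, file names, and CID values below are made up for the example and are not taken from the attached descriptors.

```python
import re

BASE_PARENT = "ffffffff"  # parentCID value carried by a disk that has no parent


def read_ids(descriptor_text: str) -> tuple[str, str]:
    """Return (CID, parentCID) found in the text of a VMDK descriptor."""
    cid = re.search(r"^CID=([0-9a-fA-F]{8})\s*$", descriptor_text, re.M)
    parent = re.search(r"^parentCID=([0-9a-fA-F]{8})\s*$", descriptor_text, re.M)
    if not cid or not parent:
        raise ValueError("descriptor lacks a CID or parentCID line")
    return cid.group(1).lower(), parent.group(1).lower()


def order_chain(descriptors: dict[str, str]) -> list[str]:
    """Order descriptor names from the base disk to the newest snapshot.

    Assumes a single linear chain (no two disks share the same parent).
    Disks that cannot be linked back to the base are left out of the result.
    """
    ids = {name: read_ids(text) for name, text in descriptors.items()}
    by_parent = {parent: name for name, (_, parent) in ids.items()}
    chain, wanted = [], BASE_PARENT
    while wanted in by_parent:
        name = by_parent.pop(wanted)
        chain.append(name)
        wanted = ids[name][0]  # the next link must name this disk's CID as its parent
    return chain


# Hypothetical descriptors; the CIDs are invented for illustration only.
example = {
    "disk-000001.vmdk": "CID=aaaa1111\nparentCID=bbbb2222\n",
    "disk.vmdk": "CID=cccc3333\nparentCID=ffffffff\n",
    "disk-000002.vmdk": "CID=bbbb2222\nparentCID=cccc3333\n",
}
print(order_chain(example))
```

If a descriptor is missing from the result, its parentCID does not match any CID in the set, which is the kind of break being discussed here.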
ASKER
Files
vmdk-descriptors.zip
Can I just query something? You refer to C:, D: and E:, but there are ONLY two virtual disks, with two snapshots?
Can I also have the exact error message from when you tried to start the VM or add the snapshot 0002 disk?
And was the error the same for both disks?
Okay, I can see that you are missing a descriptor file for the first disk, DOCOBO_BE.vmdk.
This is a common issue: the descriptor disappears.
So you will also need these articles:
Recreating a missing virtual disk (VMDK) descriptor file for delta disks (1026353)
Recreating a missing virtual machine disk descriptor file (1002511)
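Both articles come down to rebuilding a small text descriptor that points at the existing data file. As a rough illustration only (this is not the KB procedure itself), the Python sketch below computes the sector count that the extent line of a base-disk descriptor has to carry. It assumes 512-byte sectors and an extent line of the form `RW <sectors> VMFS "<name>-flat.vmdk"`; the function name, the flat file name, and the 20 GiB size are hypothetical.

```python
SECTOR_SIZE = 512  # VMDK extent sizes are expressed in 512-byte sectors


def extent_line(flat_file_name: str, flat_size_bytes: int) -> str:
    """Build the extent line for a base-disk descriptor.

    flat_size_bytes is the exact size of the existing -flat.vmdk file,
    for example as reported by `ls -l` on the datastore.
    """
    if flat_size_bytes <= 0 or flat_size_bytes % SECTOR_SIZE != 0:
        raise ValueError("flat file size must be a positive multiple of 512")
    sectors = flat_size_bytes // SECTOR_SIZE
    return f'RW {sectors} VMFS "{flat_file_name}"'


# Hypothetical example: a 20 GiB flat file for the C: drive.
print(extent_line("DOCOBO_BE-flat.vmdk", 20 * 1024**3))
```

The sector count has to match the real flat file exactly, so take the size from the datastore rather than from memory.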
ASKER
Hi Andrew,
The error I received was NO OPERATING SYSTEM.
Sorry for the confusion: there are only two drives, C: and E:. There is no D:.
I am also attaching the original vmx file, in which the current vmdk for the C: drive is:
scsi0:0.present = "true"
scsi0:0.fileName = "DOCOBO_BE-000001.vmdk"
scsi0:0.deviceType = "scsi-hardDisk"
Based on the .vmx, I am assuming that the 000002 vmdk was never used. That leaves me with a missing base vmdk descriptor for the C: drive.
DOCOBO-BE-Orig.vmx.txt
Okay, so the virtual machine actually powered on with no VM error, and the BIOS reported NO OPERATING SYSTEM?
And this was when you added the 0002 snapshot, while connecting the first VMDK only, without the snapshot, BOOTED an OS?
If that is the original file, the 2nd snapshot was *NOT IN USE* at the time of the crash!
Just add the 0001 snapshot to the VM and power on!
ASKER
Yes, I am recreating the C: drive base descriptor file, as that file is missing. Please refer to the screenshot I attached earlier.
I am going to create that base descriptor, point the 00001 descriptor to the newly created base file, and start the server.
I'll skip 0002 on both the C: and E: drives.
Am I doing it right?
Yes, that's correct.
So all that's wrong is the missing descriptor.
If, after adding the 0001 snapshot, the OS does not boot or is corrupted, then the 0001 snapshot is corrupted and you will have to discard all snapshots and just use the parent.
ASKER
I corrected the chain as BASE ---> 00002 ----> 00001 and I am sure this is the right sequence. However, when I start the VM, I get NO OPERATING SYSTEM detected at the BIOS.
When I go into the BIOS, I do not see any disk available under Primary / Secondary slave.
Note: The OS on C: is Windows 2000.
Any idea?
Discard the second snapshot, 0002. It's not required; it has not been in use.
The chain is 00002 ----- 00001 ------ BASE,
but from your original VMX, 00002 was not in use, so do not use it.
ASKER
1. I recreated the chain as 00001 -----> 00002 ---> Base
2. Rebooted the machine and changed the boot sequence to hard disk. No luck; it cannot detect the OS.
3. Changed the controller from BusLogic to LSI Logic. It gave a warning saying the drives were created for BusLogic, so I cancelled the conversion by clicking No instead of Yes and continued.
4. Created a new VM and attached the two VMDKs, pointing to 0001 for both drives.
5. The system booted correctly.
6. Checked the event log and it showed 2012 entries.
I definitely changed the vmx entry to 0001.
Has my chain worked, or is it defaulting to the base file and ignoring the changes stored in the snapshots?
When you power on the VM, it checks that the snapshot chain is correct; otherwise it gives an error message and HALTS!
It would seem the 00002 snapshot is corrupted, and adding it causes the issue, which is normal for corrupt snapshot disks.
If they are all working, get RID of the snapshot state!
If the current disk is running on snapshot 0001 and the parent disk (base), this is all you can do now. Then merge the snapshot with DELETE ALL to ensure the changes are committed to the parent disk (base) and the snapshot is deleted and gone!
ASKER
So are you saying that 0002 was never used, and the chain was 0001 ---> Base?
Is that a conclusion based on the original vmx file?
In that case, are you suggesting that I use 0001 ---> Base for both C: and E: and reboot?
How do I get the RID?
Do you have the original, unmodified VMX file?
Yes or No ?
does the VMX file include any reference to the 2nd snapshot file ? (-00002.vmdk)
Yes or No ?
Did you know that this VM was running on a snapshot ?
Yes or No ?
ASKER
Do you have the original, unmodified VMX file?
I was passed a VMX file and told it was the original, which I forwarded to you. I assume the engineer before me did not change anything in it and it is indeed the original.
does the VMX file include any reference to the 2nd snapshot file ? (-00002.vmdk)
Assuming the above is true, no, there is no reference.
Did you know that this VM was running on a snapshot ?
I don't know - I am told that there were no snapshots visible in the Snapshot manager earlier.
Okay, assuming the VMX file DID NOT include any reference to the 2nd snapshot file (-00002.vmdk):
DO NOT USE IT!
Just use the parent and 00001.vmdk, does this make any difference?
Does the VM BOOT ?
ASKER
I tried just 0001 ---> Base but it did not work.
It complains that the parent disk has changed and gives a can't open error.
I think I have to restore the backup of the C: drive and start tinkering again.
I restored the service from a standby system and, praise the Lord, it is working on that front at least.
I will stop now, as the Director is not in the mood for further troubleshooting, but once the machine is off the network I'll continue the recovery from where we left off today.
Thank you so much for all your input. I really appreciate your help.
It complains that the parent disk has changed and gives a can't open error.
This is normal if the CIDs mismatch. Match the CIDs, and this will align the snapshot to the parent.
Whether this forced match will give you a working VM is another issue.
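To make "match the CIDs" concrete, here is a minimal Python sketch, assuming the descriptors are plain text with a `CID=` line in the parent and a single `parentCID=` line in the child. It only rewrites text in memory; the function name and the CID values are hypothetical, and it does nothing to prove the delta data itself is intact.

```python
import re


def align_parent_cid(child_text: str, parent_text: str) -> str:
    """Return child descriptor text whose parentCID equals the parent's CID."""
    match = re.search(r"^CID=([0-9a-fA-F]{8})\s*$", parent_text, re.M)
    if match is None:
        raise ValueError("parent descriptor has no CID line")
    new_text, count = re.subn(
        r"^parentCID=[0-9a-fA-F]{8}[ \t]*$",
        f"parentCID={match.group(1)}",
        child_text,
        flags=re.M,
    )
    if count != 1:
        raise ValueError("child descriptor must have exactly one parentCID line")
    return new_text


# Hypothetical descriptor fragments; the CIDs are invented for illustration.
parent = "CID=abcdef01\nparentCID=ffffffff\n"
child = 'CID=11111111\nparentCID=22222222\nparentFileNameHint="base.vmdk"\n'
print(align_parent_cid(child, parent))
```

Work on copies of the descriptor files and keep the originals, since a forced match only gets the chain to open.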
As stated in the first post, when VMs are running on a snapshot and the server fails or crashes, this can cause corruption in the VM.
This is why snapshots are dangerous. DO NOT rely on Snapshot Manager to tell you if you have a VM running on a snapshot; it should be a daily task for your VMware admin to check each VM.
See my EE article on how to check:
HOW TO: VMware Snapshots :- Be Patient
Snapshots are evil, and cause issues all the time!
ASKER
It complains that the parent disk has changed and gives a can't open error.
I assume this is occurring because I created another virtual machine and attached the .VMDK files to the new machine. While doing so, only the base file must have been attached, which changed the file hash/properties or whatever VMware uses to know whether the base file is intact.
1. With the base vmdk attached, the machine struggles at boot with NO OPERATING SYSTEM found.
This seems to be an issue with the SCSI controller (BusLogic); recreating a new VM with a BusLogic SCSI controller and attaching the base files resolved the issue, and the machine was able to boot.
2. Thus, if booting the machine is a problem, I have to recreate the snapshot chain as 00001 ---> Base and then edit the VMX of the new machine to point the VMDK at 00001, in which case I hope the controller problem is avoided and the up-to-date data is recovered.
I need to try this once the machine is off the network.
@ Andrew thank you for your timely help.
Snapshots are evil, and cause issues all the time!
Here comes my related question to this then.
1. Popular backup solutions use snapshots to make backups of virtual machines. Do these often cause the same issues? And, as you suggested, the administrator has to check each day whether any orphaned snapshots are left lying around after a backup. Can we automate this check?
2. What is the best strategy to back up applications and data?
Traditional: Backing up code (tar / zip / sync) and data backups (.sql / .bak)
or
Block-level backup of VMs (VMware Data Protection / Veeam / BackupExec)
3. How do you compare VMware Data Protection, Veeam, and BackupExec?
Is there a reason to choose Veeam over DP and BackupExec, especially when DP is included in the VMware license?
4. Best practices and deployment design / strategy to implement backups of VMs (any doc / video, please).
Thanks
Zen
I assume this is occurring because I created another virtual machine and attached the .VMDK files to the new machine. While doing so, only the base file must have been attached, which changed the file hash/properties or whatever VMware uses to know whether the base file is intact.
Normal behaviour.
1. With the base vmdk attached, the machine struggles at boot with NO OPERATING SYSTEM found.
This seems to be an issue with the SCSI controller (BusLogic); recreating a new VM with a BusLogic SCSI controller and attaching the base files resolved the issue, and the machine was able to boot.
Okay, so the parent file looks to be okay.
2. Thus, if booting the machine is a problem, I have to recreate the snapshot chain as 00001 ---> Base and then edit the VMX of the new machine to point the VMDK at 00001, in which case I hope the controller problem is avoided and the up-to-date data is recovered.
Yes, or the snapshots are corrupted.
1. Popular backup solutions use snapshots to make backups of virtual machines. Do these often cause the same issues? And, as you suggested, the administrator has to check each day whether any orphaned snapshots are left lying around after a backup. Can we automate this check?
Yes, they cause this issue. You can either check manually each day after backups, set vCenter alarms, or run automated scripts.
2. What is the best strategy to backup applications and data?
Traditional: Backing up code (tar / zip / sync) and data backups (.sql / .bak)
or
Block-level backup of VMs (VMware Data Protection / Veeam / BackupExec)
Block Level, and SQL Backups as an option.
3. How do you compare (VMware Data Protection / Veeam / BackupExec)
Is there a reason to choose Veeam over DP & BackupExec especially when DP Is included in the VMware license.
No contest here, Veeam is the world leader.
4. Best practices and deployment design / strategy to implement backups of VMs (any doc / video please)
Select Veeam.
If you require more information on the above, this really needs a new question about VMware backups for me or other experts to answer.
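On the "automated scripts" option for spotting VMs that are running on a snapshot: one very rough way to sketch it in Python is to read a .vmx file and flag any disk whose file name looks like a snapshot delta. This assumes the `scsi0:0.fileName = "..."` layout from the .vmx posted earlier in this thread and the usual `-000001.vmdk` style of snapshot file name; the function and pattern names are hypothetical, and the second disk entry in the example is invented.

```python
import re

# Matches lines such as: scsi0:0.fileName = "DOCOBO_BE-000001.vmdk"
DISK_LINE = re.compile(r'^\s*(\w+:\d+)\.fileName\s*=\s*"([^"]+\.vmdk)"\s*$', re.M | re.I)
# Snapshot disks are assumed to end in a dash and six digits before .vmdk
SNAPSHOT_NAME = re.compile(r"-\d{6}\.vmdk$", re.I)


def disks_on_snapshots(vmx_text: str) -> dict[str, str]:
    """Map each device (e.g. 'scsi0:0') to its disk file when that file is a snapshot disk."""
    return {
        device: file_name
        for device, file_name in DISK_LINE.findall(vmx_text)
        if SNAPSHOT_NAME.search(file_name)
    }


vmx = (
    'scsi0:0.present = "true"\n'
    'scsi0:0.fileName = "DOCOBO_BE-000001.vmdk"\n'
    'scsi0:0.deviceType = "scsi-hardDisk"\n'
    'scsi0:1.fileName = "DOCOBO_BE_1.vmdk"\n'  # invented second disk on its base file
)
print(disks_on_snapshots(vmx))
```

An empty result means no disk in that .vmx points at a snapshot file; anything else is a VM to look at before the next backup run.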
ASKER
I'll update this question once I get any further with the recovery of snapshots ...
Thanks Andrew ...
ASKER
I am not sure if the deltas are corrupted, but when I tried to download 0002 it failed to download to the local machine.
Meanwhile, I am attaching a JPEG of the datastore. Let me know if you find any obvious anomaly.
Thanks