Solved

Recreating vmdk snapshots broken chain

Posted on 2015-01-18
Last Modified: 2015-03-04
Hi All,
I had a server crash in which the RAID controller could no longer detect the hard drive pair. To recover, I had to go into the BIOS and change the settings so that the disks were detected again.

After the reboot the disks were detected, but the vmdks were disconnected from the virtual machines in the inventory.

I attached the vmdk back to the Windows 2008 server, restarted the VM, and it worked fine. However, I had another VM running Windows 2000 with two snapshots. When I try to start that VM with only the flat file it works fine, but when I try to start it with snapshot 0002, it throws an error.

I assume that the snapshot chain has broken between the snapshots and the base vmdk file.

How do I recover / rebuild the snapshot chain (preferably without having to download the vmdks and manipulate them using Workstation or vConverter)?

a. How do I check / change the CID for the affected VM / VMDKs?

Another strange problem occurred when I tried to remove the affected VM from the inventory and, within the datastore, tried to rename its folder from ABC to ABC_ORIGINAL. Instead of renaming the directory, VMware automatically started moving files from ABC to ABC_ORIGINAL, and the client displayed a MOVE FILE task with Cancel greyed out.

Even this task failed after 3%, so I don't know what state my VM is in now.

Could anyone help with it please?

Thanks
Question by:zen shaw
 
LVL 117

Accepted Solution

by:
Andrew Hancock (VMware vExpert / EE MVE) earned 500 total points
ID: 40556267
I assume that the snapshot chain has broken between the snapshots and the base vmdk file.

Possibly the chain has broken, or more likely the snapshot is corrupted and the virtual disk is damaged. Snapshot delta disks are very fragile, and get corrupted easily.

a. How do I check / change the CID for the affected VM / VMDKs

This is quite simple, and can be done with vi on the console/or remotely via ssh, or you can do it using WinSCP, from a Windows machine.

This is the article you need to follow to check the CIDs. If the CIDs are different, you will often get an error about a mismatch when the snapshots are opened.

Resolving the CID mismatch error: The parent virtual disk has been modified since the child was created (1007969)

You probably have two snapshots, 0001 and 0002, so you may have to drop and discard a snapshot file, which could leave you with a corrupt parent disk.
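
For reference, the check that article describes comes down to comparing two fields in the small text descriptor files: each child's parentCID must equal its parent's CID, and a base disk carries parentCID=ffffffff. A minimal Python sketch of that comparison follows. The function names and the CID values in the usage note are made up for illustration and are not taken from this VM.

```python
import re

# Matches the three descriptor fields that define the snapshot chain.
FIELD = re.compile(r'\s*(CID|parentCID|parentFileNameHint)\s*=\s*"?([^"]*)"?\s*$')


def read_cids(descriptor_text):
    """Return (CID, parentCID, parentFileNameHint) found in descriptor text.

    Any field that is absent comes back as None.
    """
    fields = {}
    for line in descriptor_text.splitlines():
        match = FIELD.match(line)
        if match:
            fields[match.group(1)] = match.group(2).strip()
    return (fields.get("CID"),
            fields.get("parentCID"),
            fields.get("parentFileNameHint"))


def cid_matches(parent_text, child_text):
    """True when the child's parentCID equals the parent's CID."""
    parent_cid, _, _ = read_cids(parent_text)
    _, child_parent_cid, _ = read_cids(child_text)
    if parent_cid is None or child_parent_cid is None:
        return False
    return parent_cid.lower() == child_parent_cid.lower()
```

Usage, with invented values: if the base descriptor contains CID=fb183c20 and the 0001 descriptor contains parentCID=fb183c20, then cid_matches(base_text, child_text) returns True. If it returns False, that pair is where the mismatch error comes from.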
 

Author Comment

by:zen shaw
ID: 40556273
I'll try to follow the link you sent.

I am not sure if the deltas are corrupted, but when I tried to download 0002 ... it failed to download to the local machine.

Meanwhile, I am attaching the data store jpeg ... let me know if you find any obvious anomaly.

Thanks
 
LVL 117
ID: 40556295
No screen attached ?

I am not sure if the deltas are corrupted, but when I tried to download 0002 ... it failed to download to the local machine.

Did you download via the datastore browser? If so, the datastore is corrupted, and so is the file that sits on the datastore.

If that is the case, I would *BACKUP NOW* ALL your VMs.

Check the hardware and RAID, and disks, and erase the current datastore, RAID array, and re-create.
 

Author Comment

by:zen shaw
ID: 40556358
Hi,
I am attaching the descriptors ... you can have a look and tell me the problem:

I assumed the chain would be

C drive Docobo_BE.vmdk------>Docobo_BE_0000001.vmdk------->DOCOBO_BE_0000002.vmdk
E drive Docobo_BE_1.vmdk------>Docobo_BE_1_0000001.vmdk------->DOCOBO_BE_1_0000002.vmdk

But based on the descriptors, for C drive, there is the base descriptor file missing - How can I generate this file?

For D: If we trace the CID and Parent CID, it seems the order of the chain is
E drive Docobo_BE_1.vmdk------>Docobo_BE_1_0000002.vmdk------->DOCOBO_BE_1_0000001.vmdk

Is it possible that the order of the snapshots was changed, maybe by deleting or consolidating?

How could I rebuild the chain? I would appreciate it if you could look at the CID and Parent CID values and suggest a solution.
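
One way to trace the order from the descriptors is to start at the disk whose parentCID is ffffffff and repeatedly look for the disk whose parentCID equals the current disk's CID. A rough Python sketch, assuming the CID and parentCID values have already been read out of each descriptor; the file names and CID values in the usage note are placeholders, not the real ones from the attachments.

```python
NO_PARENT = "ffffffff"  # parentCID value carried by a base disk


def order_chain(disks):
    """Order disks from base to newest child.

    disks maps descriptor file name -> (CID, parentCID).
    Returns (chain, orphans): chain is the ordered list of names,
    orphans are names that could not be linked into the chain.
    """
    bases = [name for name, (_cid, parent) in disks.items()
             if parent.lower() == NO_PARENT]
    if len(bases) != 1:
        raise ValueError("expected exactly one base disk, found %d" % len(bases))

    # Index each disk by the CID it expects its parent to have.
    child_of = {parent.lower(): name for name, (_cid, parent) in disks.items()}

    chain = [bases[0]]
    while True:
        current_cid = disks[chain[-1]][0].lower()
        child = child_of.get(current_cid)
        if child is None or child in chain:  # end of chain, or a loop
            break
        chain.append(child)

    orphans = sorted(set(disks) - set(chain))
    return chain, orphans
```

Usage, with placeholder values: for {"disk.vmdk": ("aaaa0001", "ffffffff"), "disk-000001.vmdk": ("aaaa0003", "aaaa0002"), "disk-000002.vmdk": ("aaaa0002", "aaaa0001")} the chain comes back as disk.vmdk, then disk-000002.vmdk, then disk-000001.vmdk. Any name listed under orphans has a parentCID that matches no other disk, which points at a broken link.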

Thanks
 

Author Comment

by:zen shaw
ID: 40556366
 
LVL 117
ID: 40556383
Can I just query something? You refer to C:, D: and E:,

but there are ONLY two virtual disks, with two snapshots?

Can I also have the exact error message from when you tried to start the VM or add the snapshot 0002 disk?

And was the error the same for both disks?

Okay, I can see that you are missing the descriptor file for the first disk, DOCOBO_BE.vmdk.

This is also a common issue: the descriptor disappears.

So you will need these articles as well:

Recreating a missing virtual disk (VMDK) descriptor file for delta disks (1026353)

Recreating a missing virtual machine disk descriptor file (1002511)
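
A detail that matters when recreating a base descriptor is the extent line, which states the size of the -flat file in 512-byte sectors and names that file. The sketch below only computes that one line from a byte size. It assumes the usual VMFS form RW <sectors> VMFS "<name>-flat.vmdk", and the flat file name and the 10 GB size in the usage note are assumptions for illustration. The rest of the descriptor (geometry, adapter type, CID) should come from the procedure in the articles above.

```python
SECTOR_BYTES = 512  # VMDK extent sizes are expressed in 512-byte sectors


def extent_line(flat_file_name, flat_size_bytes):
    """Build the extent line that points a descriptor at its -flat file.

    flat_size_bytes is the exact size of the -flat.vmdk file in bytes,
    for example as reported by ls -l on the host.
    """
    if flat_size_bytes <= 0:
        raise ValueError("flat file size must be positive")
    if flat_size_bytes % SECTOR_BYTES != 0:
        raise ValueError("flat file size is not a whole number of sectors")
    sectors = flat_size_bytes // SECTOR_BYTES
    return 'RW %d VMFS "%s"' % (sectors, flat_file_name)
```

Usage, with assumed values: extent_line("DOCOBO_BE-flat.vmdk", 10 * 1024**3) gives RW 20971520 VMFS "DOCOBO_BE-flat.vmdk". If the sector count in a recreated descriptor does not agree with the real flat file size, the disk is unlikely to open correctly.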
 

Author Comment

by:zen shaw
ID: 40556401
Hi Andrew,
The error I received was NO OPERATING SYSTEM

Sorry for the confusion: there are only two drives, C: and E: --- no D:

I am attaching the original vmx file also in which for the C: drive .... the current vmdk is
scsi0:0.present = "true"
scsi0:0.fileName = "DOCOBO_BE-000001.vmdk"
scsi0:0.deviceType = "scsi-hardDisk"

Based on the .vmx, I am assuming that the 00002 vmdk was never used. That leaves me with a missing base vmdk descriptor for the C: drive.
DOCOBO-BE-Orig.vmx.txt
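
The same check can be scripted. This Python sketch pulls the disk file names out of .vmx text and flags the ones that look like snapshot descriptors (a -00000N suffix). The helper names are hypothetical, and the sketch assumes the .vmx uses the plain key = "value" form shown above.

```python
import re

# e.g. scsi0:0.fileName = "DOCOBO_BE-000001.vmdk"
DISK_LINE = re.compile(
    r'^\s*((?:scsi|ide|sata)\d+:\d+)\.fileName\s*=\s*"([^"]*)"',
    re.IGNORECASE,
)
# Snapshot descriptors end in a dash plus six digits, e.g. -000001.vmdk
SNAPSHOT_NAME = re.compile(r'-\d{6}\.vmdk$', re.IGNORECASE)


def attached_disks(vmx_text):
    """Map each device (e.g. 'scsi0:0') to the .vmdk file it points at."""
    disks = {}
    for line in vmx_text.splitlines():
        match = DISK_LINE.match(line)
        if match and match.group(2).lower().endswith(".vmdk"):
            disks[match.group(1)] = match.group(2)
    return disks


def is_snapshot_disk(file_name):
    """True when the file name looks like a snapshot descriptor."""
    return SNAPSHOT_NAME.search(file_name) is not None
```

Usage: feeding the three scsi0:0 lines above to attached_disks returns {"scsi0:0": "DOCOBO_BE-000001.vmdk"}, and is_snapshot_disk on that name is True, so the VM was pointed at snapshot 0001 and nothing references 0002.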
 
LVL 117
ID: 40556404
Okay, so the virtual machine actually powered on with no VM error, and the BIOS reported NO OPERATING SYSTEM?

And this was when you added the 0002 snapshot, whereas connecting only the first VMDK, without the snapshot, BOOTED an OS?

If that is the original file, the 2nd snapshot was *NOT IN USE* at the time of crash!

Just add the 0001 snapshot to the VM and power on!
 

Author Comment

by:zen shaw
ID: 40556425
Yes, I am recreating the C: drive base descriptor file, as that file is missing. Please refer to the screenshot I attached earlier.

I am going to create that base descriptor and point the 00001 descriptor to base file (newly created) and start the server.

I'll skip 0002 on both C and E drives.....

Am I doing it right?
 
LVL 117
ID: 40556458
Yes, that's correct.

So all that's wrong, is the missing descriptor.

If after adding the 0001 snapshot, the OS does not boot or is corrupted, the 0001 snapshot is corrupted and you will have to discard all snapshots, and just use the parent.
 

Author Comment

by:zen shaw
ID: 40556459
I corrected the chain to BASE ---> 00002 ----> 00001, and I am sure this is the right sequence. However, when I start the VM, I get NO OPERATING SYSTEM detected at the BIOS....

When I go to BIOS .... I do not see any disk available under Primary / Secondary slave.

Note: The OS on C is windows 2000

Any idea?

 
LVL 117
ID: 40556464
discard the second snapshot.....0002, it's not required, it's not been in use....

the chain is 00002 ----- 00001 ------ BASE

but from your original VMX, 00002 was not in use so do not use.
 

Author Comment

by:zen shaw
ID: 40556519
1. I recreated the chain as 00001 -----> 00002 ---> Base
2. Rebooted the machine and changed the boot sequence to hard disk - no luck - cannot detect OS
3. Changed the controller from BusLogic to LSI Logic - it gave a warning saying the drives were created for BusLogic, so I cancelled the conversion (clicked No instead of Yes) and continued
4. Created a new VM and attached the two VMDKs, pointing to 0001 for both drives.
5. System booted correctly
6. Checked the eventlog and it showed 2012 entries.

I definitely changed the vmx entry to 0001 ...

Has my chain worked, or is it defaulting to the base file and ignoring the changes held in the snapshots?
 
LVL 117
ID: 40556523
When you power on the VM, it checks that the snapshot chain is correct; if it is not, it gives an error message and HALTS!

It would seem the 00002 snapshot is corrupted, and adding it causes the issue, which is normal for corrupt snapshot disks.
If the disks are all working, get RID of the snapshot state!

If the current disk is running on snapshot 0001 and the parent disk (base), this is all you can do now. Then use DELETE ALL to merge the snapshot, which ensures the changes are committed to the parent disk (base) and the snapshot is deleted and gone!
 

Author Comment

by:zen shaw
ID: 40556561
So are you saying that the 0002 was never used? And the chain was 0001 ---> Base?

Is that a conclusion based on the original vmx file?

In that case, are you suggesting that I use 0001 ---> Base for both C: and E: and reboot?

how do I get the RID?
 
LVL 117
ID: 40556595
Do you have the original, unmodified VMX file?

Yes or No ?

does the VMX file include any reference to the 2nd snapshot file ? (-00002.vmdk)

Yes or No ?

Did you know that this VM was running on a snapshot ?
 

Author Comment

by:zen shaw
ID: 40556602
Do you have the original, unmodified VMX file?

I was passed a VMX file and told it was the original, which I forwarded to you. I assume the engineer before me did not change anything in it and that it is indeed the original.

does the VMX file include any reference to the 2nd snapshot file ? (-00002.vmdk)

Assuming the above is true- No there is no reference.

Did you know that this VM was running on a snapshot ?

I don't know - I am told that there were no snapshots visible in the Snapshot manager earlier.
 
LVL 117
ID: 40556634
Okay, assuming the VMX file DID NOT include any reference to the 2nd snapshot file (-00002.vmdk):

DO NOT USE IT!

Just use the parent and 00001.vmdk, does this make any difference?

Does the VM BOOT ?
 

Author Comment

by:zen shaw
ID: 40556646
I tried just 0001 ---> Base but it did not work.

It complains that the parent disk has changed and gives a "can't open" error.

I think I have to restore the backup of C: drive and start tinkering again.

I restored the service from a standby system and praise the Lord, it is working on that front at-least.

I will stop now as the Director is not in a mood to do further troubleshooting but once off the network, I'll continue to recover it from where we left today.

Thank you so much for all your input. I really appreciate your help.
 
LVL 117
ID: 40556667
It complains that the parent disk has changed and gives a "can't open" error.

This is normal if the CIDs mismatch. Match the CIDs, and this will align the snapshot to the parent.

Whether this forced match will give you a working VM is another issue.
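
Mechanically, matching the CIDs means editing the child's descriptor so that its parentCID line carries the parent's current CID. A hedged Python sketch of that edit on descriptor text follows; the function names are hypothetical and the CID values in the usage note are invented. It only aligns the IDs, so it is best run on copies of the descriptors, since a forced match does not repair any data.

```python
import re


def get_field(descriptor_text, key):
    """Return the value of a 'key=value' line in descriptor text, or None."""
    match = re.search(r'^%s=(\S+)\s*$' % re.escape(key),
                      descriptor_text, re.MULTILINE)
    return match.group(1) if match else None


def align_parent_cid(parent_text, child_text):
    """Return child descriptor text with parentCID set to the parent's CID.

    Only the parentCID line changes; every other line is left as it was.
    """
    parent_cid = get_field(parent_text, "CID")
    if parent_cid is None:
        raise ValueError("parent descriptor has no CID line")
    new_text, replaced = re.subn(
        r'^parentCID=\S+',
        "parentCID=" + parent_cid,
        child_text,
        count=1,
        flags=re.MULTILINE,
    )
    if replaced != 1:
        raise ValueError("child descriptor has no parentCID line")
    return new_text
```

Usage, with invented values: if the parent has CID=fb183c20 and the child has parentCID=deadbeef, align_parent_cid(parent_text, child_text) returns the child text with parentCID=fb183c20 and its own CID untouched.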

As stated in the first post, when the server fails or crashes while VMs are running on a snapshot, it can cause corruption in the VM.

This is why snapshots are dangerous. DO NOT rely on Snapshot Manager to tell you whether a VM is running on a snapshot; checking each VM should be a daily task for your VMware admin....

see my EE Article how to check

HOW TO: VMware Snapshots :- Be Patient

Snapshots are evil, and cause issues all the time!
 

Author Comment

by:zen shaw
ID: 40557179
It complains that the parent disk has changed and gives a "can't open" error.

I assume this is occurring because I created another virtual machine and attached the .VMDK files to the new machine. While doing so, only the base file must have been attached, and that changed the file hash/properties or whatever VMware uses to know whether the base file is intact.

1. With the base vmdk attached, the machine struggles at boot with NO OPERATING SYSTEM found.
This seems to be an issue with the scsi controller - buslogic and thus recreating a new VM with buslogic scsi controller and attaching the base files resolved the issue and machine was able to boot.

2. Thus if the machine booting is a problem, I have to then recreate the snapshot chain as 00001 ---> Base and then edit the VMX of the new machine with VMDK as 00001, in which case I hope the controller problem and the up to date data could be achieved.

I need to try this once the machine is off network.

@Andrew, thank you for your timely help.

Snapshots are evil, and cause issues all the time!
 Here comes my related question to this then.

1. Popular backup solutions use snapshots to make backups of virtual machines - Do these cause the same issues often? and as you suggested the Administrator has to check if any orphan snapshots are lying after a backup each day. Can we automate this check?

2. What is the best strategy to backup applications and data?
Traditional: Backing up code (tar / zip / sync) and data backups (.sql / .bak)
or
Block-level backup of VMs (VMware Data Protection / Veeam / BackupExec)

3. How do you compare (VMware Data Protection / Veeam / BackupExec)
Is there a reason to choose Veeam over DP & BackupExec especially when DP Is included in the VMware license.

4. Best practices and deployment design / strategy to implement backups of VMs (any doc / video please)

Thanks
Zen
 
LVL 117
ID: 40557210
I assume this is occurring because I created another virtual machine and attached the .VMDK files to the new machine. While doing so, only the base file must have been attached, and that changed the file hash/properties or whatever VMware uses to know whether the base file is intact.

normal behaviour.

1. With the base vmdk attached, the machine struggles at boot with NO OPERATING SYSTEM found.
This seems to be an issue with the scsi controller - buslogic and thus recreating a new VM with buslogic scsi controller and attaching the base files resolved the issue and machine was able to boot.

Okay, so the parent file looks to be okay.

2. Thus if the machine booting is a problem, I have to then recreate the snapshot chain as 00001 ---> Base and then edit the VMX of the new machine with VMDK as 00001, in which case I hope the controller problem and the up to date data could be achieved.

Yes, or the snapshots are corrupted.

1. Popular backup solutions use snapshots to make backups of virtual machines - Do these cause the same issues often? and as you suggested the Administrator has to check if any orphan snapshots are lying after a backup each day. Can we automate this check?

Yes, they cause this issue. You can either check manually every day after backups, set vCenter alarms, or run automated scripts.
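
As one example of the automated-script option, the sketch below walks a folder tree and lists files named like snapshot descriptors or delta extents (-00000N.vmdk, -00000N-delta.vmdk, -00000N-sesparse.vmdk). It is a minimal Python sketch, not a vCenter alarm or a vendor tool, and the path /vmfs/volumes/datastore1 in the usage note is an assumed datastore location.

```python
import os
import re

# Snapshot descriptor or its data extent, e.g. VM-000001.vmdk,
# VM-000001-delta.vmdk, VM-000001-sesparse.vmdk
SNAPSHOT_FILE = re.compile(r'-\d{6}(-delta|-sesparse)?\.vmdk$', re.IGNORECASE)


def snapshot_files(file_names):
    """Return the names that look like snapshot files, sorted."""
    return sorted(name for name in file_names if SNAPSHOT_FILE.search(name))


def scan_datastore(root):
    """Map each folder under root to the snapshot files found in it.

    Folders without snapshot files are left out of the result.
    """
    findings = {}
    for folder, _subdirs, files in os.walk(root):
        hits = snapshot_files(files)
        if hits:
            findings[folder] = hits
    return findings
```

Usage: calling scan_datastore("/vmfs/volumes/datastore1") after the backup window and alerting when the result is not empty would flag VMs still running on, or left with, snapshot files. The check goes by file names only, so it complements rather than replaces a look at each VM's disk configuration.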

2. What is the best strategy to backup applications and data?
Traditional: Backing up code (tar / zip / sync) and data backups (.sql / .bak)
or
Block-level backup of VMs (VMware Data Protection / Veeam / BackupExec)

Block Level, and SQL Backups as an option.

3. How do you compare (VMware Data Protection / Veeam / BackupExec)
Is there a reason to choose Veeam over DP & BackupExec especially when DP Is included in the VMware license.

No contest here, Veeam is the world leader.

4. Best practices and deployment design / strategy to implement backups of VMs (any doc / video please)

Select Veeam.

If you require more information, on the above, this really needs a new question for myself or other experts to answer, about VMware Backups.
 

Author Comment

by:zen shaw
ID: 40557271
I'll update this question once I get any further with the recovery of snapshots ...  

Thanks Andrew ...