corrupt redo log. VMware ESXi

Last Modified: 2013-12-30
Host: VMware ESXi, 5.0.0, 469512
Client shows error while booting up:
The redo log of {Servername}-000002.vmdk is corrupted. Power off the virtual machine, if the problem still persists, discard the redo log.

Actions:
Just about every article I have found regarding this says to consolidate the snapshots, so I deleted the snapshots (vSphere Client, Snapshot Manager), which then told me that it wanted to consolidate the drives. When I tell it to consolidate, I get:
Consolidate virtual machine disk files
A general system error occurred: Input/output error

Following http://dunfraggin.blogspot.com.au/2012/09/virtual-machine-disk-consolidation.html I suspect there may be locked files. The server reports that it is powered off, but it did come up with an error during the Power Off task.

The article above leads me to http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&docType=kc&docTypeID=DT_KB_1_1&externalId=10051
If I look at the vmware.log file, I am not seeing anything unexpected at the end of the file.
However, if I try to use vmkfstools I get the following:

login as: itsupport
Using keyboard-interactive authentication.
Password:
The time and date of this login have been sent to the system logs.

VMware offers supported, powerful system administration tools.  Please
see www.vmware.com/go/sysadmintools for details.

The ESXi Shell can be disabled by an administrative user. See the
vSphere Security documentation for more information.
~ $ su
Password:
~ # vmkfstools
sh: vmkfstools: not found
~ #

So not really sure where to go from here ;)

Additional notes:
If this helps at all (Vmid 2 is the one in question)

~ # vim-cmd vmsvc/getallvms
Vmid      Name                     File                        Guest OS          Version   Annotation
1      SBS2011      [datastore1] SBS2011/SBS2011.vmx     windows7Server64Guest   vmx-08
2      2008Leap     [datastore2] 2008Leap/2008Leap.vmx   windows7Server64Guest   vmx-08
3      2008Leap-2   [NAS01] 2008Leap-2/2008Leap-2.vmx    windows7Server64Guest   vmx-08
~ # vim-cmd vmsvc/power.getstate 2
Retrieved runtime info
Powered off
~ #
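For anyone scripting this lookup, here is a minimal Python sketch that pulls the Vmid for a VM name out of the `vim-cmd vmsvc/getallvms` output shown above. The sample text is copied from this post; the function name `parse_getallvms` and the regular expression are illustrative assumptions, and the pattern assumes VM names and paths contain no spaces, which holds for this listing but not in general.

```python
import re

# Sample output copied from the `vim-cmd vmsvc/getallvms` listing in this post.
GETALLVMS_OUTPUT = """\
Vmid      Name                     File                        Guest OS          Version   Annotation
1      SBS2011      [datastore1] SBS2011/SBS2011.vmx     windows7Server64Guest   vmx-08
2      2008Leap     [datastore2] 2008Leap/2008Leap.vmx   windows7Server64Guest   vmx-08
3      2008Leap-2   [NAS01] 2008Leap-2/2008Leap-2.vmx    windows7Server64Guest   vmx-08
"""

# One row: numeric Vmid, VM name, "[datastore] path/to/file.vmx".
# Assumption: names and paths contain no spaces (true for this listing only).
ROW = re.compile(r"^(\d+)\s+(\S+)\s+\[([^\]]+)\]\s+(\S+\.vmx)\b")


def parse_getallvms(text):
    """Return {name: {"vmid": int, "datastore": str, "vmx": str}}."""
    vms = {}
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:  # the header row does not start with a digit, so it is skipped
            vmid, name, datastore, vmx = m.groups()
            vms[name] = {"vmid": int(vmid), "datastore": datastore, "vmx": vmx}
    return vms


vms = parse_getallvms(GETALLVMS_OUTPUT)
print(vms["2008Leap"])
```

The Vmid found this way is the number passed to commands such as `vim-cmd vmsvc/power.getstate`.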
There is only the one host, so no other host could possibly have a lock.
Have rebooted the host:- no change.

Cheers
Andrew

Andrew Hancock (VMware vExpert PRO / EE Fellow), VMware and Virtualization Consultant
CERTIFIED EXPERT
Fellow
Expert of the Year 2017

Commented:
The redo log to which the error refers is the 2nd snapshot delta disk. Snapshot deltas do become corrupted, and when the chain is corrupted, the VM will not start.

e.g. parent disk (vmdk) + 1st snapshot disk + 2nd snapshot disk = VM VMDK

The chain above must be intact. If the host server crashed or restarted, or the datastore ran out of space, the 2nd snapshot disk can become corrupted. ALL the information in the 2nd snapshot disk then needs to be discarded to get the VM started, which means data loss.
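As a rough illustration of what "the chain must be correct" means at the descriptor level, the Python sketch below walks a chain of VMDK descriptor files from the newest snapshot down to the parent disk and checks that each `parentCID` matches the `CID` of the file named in `parentFileNameHint`. Those three field names are standard VMDK descriptor fields, but the CID values in the sample are invented for the example (the real descriptors were never posted in this thread), and the function names are hypothetical. A CID mismatch is only one way a chain can break: corruption inside the delta data itself, which is what the redo log error suggests here, would not be caught by this check.

```python
import re


def parse_descriptor(text):
    """Extract CID, parentCID and parentFileNameHint from descriptor text."""
    fields = {}
    for key in ("CID", "parentCID", "parentFileNameHint"):
        # ^ with MULTILINE anchors at line start, so "CID" never matches "parentCID".
        m = re.search(r'^%s\s*=\s*"?([^"\n]*)"?\s*$' % key, text, re.MULTILINE)
        fields[key] = m.group(1) if m else None
    return fields


def check_chain(descriptors, top):
    """Walk from the newest descriptor to the base disk.

    descriptors maps file name -> descriptor text. Returns a list of problems;
    an empty list means every parentCID matched its parent's CID.
    """
    problems = []
    seen = set()
    current = top
    while True:
        if current in seen:
            problems.append("loop at %s" % current)
            break
        seen.add(current)
        fields = parse_descriptor(descriptors[current])
        parent = fields["parentFileNameHint"]
        if parent is None:  # base disk: no parent hint, end of chain
            break
        if parent not in descriptors:
            problems.append("%s: parent %s is missing" % (current, parent))
            break
        parent_cid = parse_descriptor(descriptors[parent])["CID"]
        if fields["parentCID"] != parent_cid:
            problems.append("%s expects parentCID %s but %s has CID %s"
                            % (current, fields["parentCID"], parent, parent_cid))
        current = parent
    return problems


# Hypothetical descriptor contents: file names follow this thread, CIDs are made up.
SAMPLE = {
    "2008Leap.vmdk": 'CID=aaaa1111\nparentCID=ffffffff\n',
    "2008Leap-000001.vmdk":
        'CID=bbbb2222\nparentCID=aaaa1111\nparentFileNameHint="2008Leap.vmdk"\n',
    "2008Leap-000002.vmdk":
        'CID=cccc3333\nparentCID=deadbeef\nparentFileNameHint="2008Leap-000001.vmdk"\n',
}

problems = check_chain(SAMPLE, "2008Leap-000002.vmdk")
for p in problems:
    print(p)
```

Run it against copies of the descriptor files only, never the live ones.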

It's possible that damage or corruption has already occurred to the snapshot disk, and you would have to discard this delta to start the VM, resulting in a corrupt or outdated VM.

Can you get me a list of the current files in the folder? I can work with you to see if we can get the VM started. Please do not try to fiddle, because you could cause more damage.

HOW TO: VMware Snapshots :- Be Patient

Also, can you tell me what happened before this?

Datastore out of space?

Host Server restart?

Host Server crash?

(I'm on GMT UK Time, so just about to get some Zzzzzs!). But I'll hang in here for 30 mins.

(Also try logging in as root, to try vmkfstools!)

Andrew Davis, Manager

Author

Commented:
Thanks for your time ;)
Datastore out of space? No, 50% headroom.
Host Server restart? Kinda. There was a power outage around the time of the crash. The UPS should have seen to a clean shutdown.
Files in directory:
# cd /vmfs/volumes/datastore2/2008Leap
/vmfs/volumes/4ff54135-0181f04a-d355-00215e6eae71/2008Leap # ls
2008Leap-000001-delta.vmdk      2008Leap-flat.vmdk              2008Leap.vmxf                   2008Leap_1-flat.vmdk            vmmcores-5.gz                   vmware-13.log                   vmware.log
2008Leap-000001.vmdk            2008Leap.nvram                  2008Leap_1-000001-delta.vmdk    2008Leap_1.vmdk                 vmmcores-6.gz                   vmware-14.log                   vmx-2008Leap-3315273081-2.vswp
2008Leap-000002-delta.vmdk      2008Leap.vmdk                   2008Leap_1-000001.vmdk          vmmcores-2.gz                   vmmcores-7.gz                   vmware-15.log                   vmx-zdump.001
2008Leap-000002.vmdk            2008Leap.vmsd                   2008Leap_1-000002-delta.vmdk    vmmcores-3.gz                   vmmcores.gz                     vmware-16.log                   vmx-zdump.002
2008Leap-aux.xml                2008Leap.vmx                    2008Leap_1-000002.vmdk          vmmcores-4.gz                   vmware-12.log                   vmware-17.log                   vmx-zdump.003
/vmfs/volumes/4ff54135-0181f04a-d355-00215e6eae71/2008Leap #
Note "2008Leap.vmx" is green in colour.
Cheers

Andrew Hancock (VMware vExpert PRO / EE Fellow), VMware and Virtualization Consultant

Commented:
Okay, this is the reason for the failure...

The redo log to which the error refers is the 2nd snapshot delta disk. Snapshot deltas do become corrupted, and when the chain is corrupted, the VM will not start.

e.g. parent disk (vmdk) + 1st snapshot disk + 2nd snapshot disk = VM VMDK

The chain above must be intact. If the host server crashed or restarted, or the datastore ran out of space, the 2nd snapshot disk can become corrupted. ALL the information in the 2nd snapshot disk then needs to be discarded to get the VM started, which means data loss.

Andrew Davis, Manager

Author

Commented:
In preparation for the worst case, I have also:

1. Copied the server's directory to another datastore (in case I need to get anything back).
2. Created a new VM (2008Leap-2) and done a successful restore to an NFS share on NAS01. I have not connected it to the network yet, as I just wanted to prove the disaster recovery, and I cannot leave it sitting on the NAS (tooooo slow).

(Just saw another post on this question come in so this comment will be out of order)

Cheers

Andrew Davis, Manager

Author

Commented:
So go with disaster recovery?

I have a ShadowProtect backup that will have all the data, as the office was closed (thank you, holiday period).


Cheers
Andrew

Andrew Hancock (VMware vExpert PRO / EE Fellow), VMware and Virtualization Consultant

Commented:
Okay, so this is a two-disk virtual machine, with these disks:

2008Leap-flat.vmdk      
2008Leap-000001.vmdk  
2008Leap-000001-delta.vmdk
2008Leap-000002-delta.vmdk

2008Leap_1-flat.vmdk  
2008Leap_1-000001-delta.vmdk
2008Leap_1-000002-delta.vmdk

both disks have two snapshots.
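The same grouping can be done mechanically from the file names. Below is a small Python sketch that counts snapshot deltas per virtual disk using the `-NNNNNN-delta.vmdk` naming seen in the directory listing earlier in the thread. The file list is copied from that listing; the function name `snapshots_per_disk` is made up for the example, and the six-digit pattern is an assumption based on the names shown here.

```python
import re
from collections import defaultdict

# VMDK file names copied from the directory listing earlier in this thread.
FILES = [
    "2008Leap-000001-delta.vmdk", "2008Leap-flat.vmdk", "2008Leap_1-flat.vmdk",
    "2008Leap-000001.vmdk", "2008Leap_1-000001-delta.vmdk", "2008Leap_1.vmdk",
    "2008Leap-000002-delta.vmdk", "2008Leap.vmdk", "2008Leap_1-000001.vmdk",
    "2008Leap-000002.vmdk", "2008Leap_1-000002-delta.vmdk", "2008Leap_1-000002.vmdk",
]

# Assumption: delta extents are named <disk>-<six digits>-delta.vmdk.
DELTA = re.compile(r"^(?P<disk>.+)-(?P<num>\d{6})-delta\.vmdk$")


def snapshots_per_disk(names):
    """Map each base disk name to the sorted snapshot numbers found for it."""
    found = defaultdict(list)
    for name in names:
        m = DELTA.match(name)
        if m:
            found[m.group("disk")].append(int(m.group("num")))
    return {disk: sorted(nums) for disk, nums in found.items()}


chains = snapshots_per_disk(FILES)
for disk in sorted(chains):
    print(disk, chains[disk])
```

Re-running it on a fresh listing after a snapshot delete is a quick way to see whether the delta files have actually gone.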

Can you try the following:-

1. Take a Snapshot of the current VM
2. Wait 60 seconds
3. Delete the Snapshot

Report any errors, and then check the folder to see whether the snapshots (the delta files!) have gone.

Andrew Davis, Manager

Author

Commented:
With it powered off?

Andrew Hancock (VMware vExpert PRO / EE Fellow), VMware and Virtualization Consultant

Commented:
I'm afraid this file

{Servername}-000002.vmdk

is corrupt, and would have to be excluded from the VM configuration to get the VM started. The VM would then be as out of date as the time it had been running on this snapshot disk: if that was 12 days, for example, the VM would be 12 days old.

If you have backups, it's time to restore. After that, monitor those snapshots daily, and check your VMs are not running on snapshots.

Andrew Hancock (VMware vExpert PRO / EE Fellow), VMware and Virtualization Consultant

Commented:
Yes, Powered OFF.

Is the VM ON?

Andrew Davis, Manager

Author

Commented:
Everything looked fine with no errors.
It did report that it needs consolidation now.
I haven't done the consolidation yet, but the deltas have not gone (as expected, I now have another).

Cheers

Andrew Davis, Manager

Author

Commented:
"Yes, Powered OFF. Is the VM ON?"

Sorry, that was a stupid question on my part ;)

Andrew Hancock (VMware vExpert PRO / EE Fellow), VMware and Virtualization Consultant

Commented:
The consolidation message appears because it detects snapshots, but it is not intelligent enough to know whether they are corrupted and cannot be merged or discarded.

Did you do the above test?

Andrew Davis, Manager

Author

Commented:
Sorry, I also just noticed:
"(also try logging in as root, to try vmkfstools!)"

Root is denied direct login. I have to ssh in as another user and then su to root, as I don't have physical access to this server.

Cheers
Andrew

Andrew Davis, Manager

Author

Commented:
Yes, I have done the above. Do you want me to try to restart it?

Sorry, as you obviously know more than me, I am not presuming anything ;)

Cheers
Andrew

Andrew Hancock (VMware vExpert PRO / EE Fellow), VMware and Virtualization Consultant

Commented:
Can you get a new listing of the folder for me (with sizes)?

ls -al

Andrew Davis, Manager

Author

Commented:
listing
/vmfs/volumes/4ff54135-0181f04a-d355-00215e6eae71/2008Leap # ls -al
drwxr-xr-x    1 root     root               5740 Dec 31 02:23 .
drwxr-xr-t    1 root     root               1400 Dec 30 07:13 ..
-rw-------    1 root     root        30098534400 Dec 30 05:43 2008Leap-000001-delta.vmdk
-rw-------    1 root     root                320 Dec 30 05:41 2008Leap-000001.vmdk
-rw-------    1 root     root           16986112 Dec 31 01:01 2008Leap-000002-delta.vmdk
-rw-------    1 root     root                327 Dec 31 00:59 2008Leap-000002.vmdk
-rw-------    1 root     root             208896 Dec 31 02:22 2008Leap-000003-delta.vmdk
-rw-------    1 root     root                327 Dec 31 02:22 2008Leap-000003.vmdk
-rw-r--r--    1 root     root                 13 Dec 31 02:23 2008Leap-aux.xml
-rw-------    1 root     root       107374182400 Dec 31 02:23 2008Leap-flat.vmdk
-rw-------    1 root     root               8684 Dec 31 01:00 2008Leap.nvram
-rw-------    1 root     root                523 Dec 31 02:23 2008Leap.vmdk
-rw-r--r--    1 root     root                 77 Dec 31 02:23 2008Leap.vmsd
-rwxr-xr-x    1 root     root               3573 Dec 31 02:22 2008Leap.vmx
-rw-r--r--    1 root     root                263 Dec 30 08:02 2008Leap.vmxf
-rw-------    1 root     root        43923165184 Dec 30 05:41 2008Leap_1-000001-delta.vmdk
-rw-------    1 root     root                324 Mar 22  2013 2008Leap_1-000001.vmdk
-rw-------    1 root     root           17190912 Dec 30 05:46 2008Leap_1-000002-delta.vmdk
-rw-------    1 root     root                331 Dec 30 05:45 2008Leap_1-000002.vmdk
-rw-------    1 root     root             413696 Dec 31 02:22 2008Leap_1-000003-delta.vmdk
-rw-------    1 root     root                331 Dec 31 02:22 2008Leap_1-000003.vmdk
-rw-------    1 root     root       214748364800 Nov 11  2012 2008Leap_1-flat.vmdk
-rw-------    1 root     root                525 Jul  8  2012 2008Leap_1.vmdk
-rw-r--r--    1 root     root            5416529 Dec 30 05:46 vmmcores-2.gz
-rw-r--r--    1 root     root            4948934 Dec 30 05:54 vmmcores-3.gz
-rw-r--r--    1 root     root            5739430 Dec 30 06:53 vmmcores-4.gz
-rw-r--r--    1 root     root            5664529 Dec 30 07:03 vmmcores-5.gz
-rw-r--r--    1 root     root            5828171 Dec 30 23:59 vmmcores-6.gz
-rw-r--r--    1 root     root            5686604 Dec 31 00:10 vmmcores-7.gz
-rw-r--r--    1 root     root            5733245 Dec 31 01:01 vmmcores.gz
-rw-r--r--    1 root     root             164211 Dec 30 05:46 vmware-12.log
-rw-r--r--    1 root     root             164471 Dec 30 05:55 vmware-13.log
-rw-r--r--    1 root     root             158944 Dec 30 06:53 vmware-14.log
-rw-r--r--    1 root     root             157904 Dec 30 07:03 vmware-15.log
-rw-r--r--    1 root     root             163543 Dec 30 23:59 vmware-16.log
-rw-r--r--    1 root     root             164108 Dec 31 00:10 vmware-17.log
-rw-r--r--    1 root     root             164443 Dec 31 01:01 vmware.log
-rw-------    1 root     root           52428800 Jul  8  2012 vmx-2008Leap-3315273081-2.vswp
-r--------    1 root     root            5042176 Dec 30 23:59 vmx-zdump.001
-r--------    1 root     root            4980736 Dec 31 00:10 vmx-zdump.002
-r--------    1 root     root            5001216 Dec 31 01:01 vmx-zdump.003
/vmfs/volumes/4ff54135-0181f04a-d355-00215e6eae71/2008Leap #
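To put the sizes in that listing into perspective, here is a short Python sketch that totals the delta files per virtual disk and compares them with the base (flat) disk. The rows are copied from the `ls -al` output above; the function name `summarize` and the file-name patterns are assumptions for illustration, and the parser expects the nine-column layout shown here.

```python
import re
from collections import defaultdict

# VMDK rows copied from the `ls -al` listing above.
LISTING = """\
-rw-------    1 root     root        30098534400 Dec 30 05:43 2008Leap-000001-delta.vmdk
-rw-------    1 root     root           16986112 Dec 31 01:01 2008Leap-000002-delta.vmdk
-rw-------    1 root     root             208896 Dec 31 02:22 2008Leap-000003-delta.vmdk
-rw-------    1 root     root       107374182400 Dec 31 02:23 2008Leap-flat.vmdk
-rw-------    1 root     root        43923165184 Dec 30 05:41 2008Leap_1-000001-delta.vmdk
-rw-------    1 root     root           17190912 Dec 30 05:46 2008Leap_1-000002-delta.vmdk
-rw-------    1 root     root             413696 Dec 31 02:22 2008Leap_1-000003-delta.vmdk
-rw-------    1 root     root       214748364800 Nov 11  2012 2008Leap_1-flat.vmdk
"""

# Assumed naming: <disk>-<six digits>-delta.vmdk and <disk>-flat.vmdk.
DELTA = re.compile(r"^(.+)-\d{6}-delta\.vmdk$")
FLAT = re.compile(r"^(.+)-flat\.vmdk$")


def summarize(listing):
    """Return (total delta bytes per disk, flat bytes per disk)."""
    delta_totals = defaultdict(int)
    flat_sizes = {}
    for line in listing.splitlines():
        parts = line.split()
        if len(parts) != 9:  # skip anything that is not a full ls -al row
            continue
        size, name = int(parts[4]), parts[8]
        m = DELTA.match(name)
        if m:
            delta_totals[m.group(1)] += size
            continue
        m = FLAT.match(name)
        if m:
            flat_sizes[m.group(1)] = size
    return dict(delta_totals), flat_sizes


delta_totals, flat_sizes = summarize(LISTING)
for disk in sorted(flat_sizes):
    print("%s: base %.1f GiB, deltas %.2f GiB" % (
        disk, flat_sizes[disk] / 2**30, delta_totals.get(disk, 0) / 2**30))
```

In this listing almost all of the snapshot data sits in the 000001 deltas (roughly 28 GiB and 41 GiB), while the 000002 delta named in the error is only about 16 MiB.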
Andrew Hancock (VMware vExpert PRO / EE Fellow), VMware and Virtualization Consultant

Commented:
Did you delete the snapshot?

Because it has created a third one...

Then select DELETE ALL!

Andrew Davis, Manager

Author

Commented:
If a screenshot is easier:
vm1.JPG

Andrew Davis, Manager

Author

Commented:
"did you delete the snapshot?"

Sure did, as per the screenshot. I didn't do the consolidate.
vm2.JPG

Andrew Hancock (VMware vExpert PRO / EE Fellow), VMware and Virtualization Consultant

Commented:
Selecting DELETE ALL should have removed all the snapshots. When you used Take Snapshot, did it appear in the list?

Can you also look at the VM settings (right-click, Edit VM) and check the disks? You'll see whether the VM is running on the snapshot disks, because it will be using 000003 etc.

Can you confirm?

Also, what size are these disks?

Andrew Davis, Manager

Author

Commented:
Yes, everything appeared correctly. I could see the snapshots listed in the manager, then they were gone (after the delete).

vm3.jpg shows 388.55 GB free on an 838 GB drive.

vm4.jpg shows the settings. It reports that it is running from the 000003.vmdk snapshot.

Cheers
Andrew
vm3.JPG
vm4.JPG

Andrew Hancock (VMware vExpert PRO / EE Fellow), VMware and Virtualization Consultant

Commented:
Can you power on the VM now?

Andrew Davis, Manager

Author

Commented:
As expected, same problem. See the screenshot.

Redo log corrupted.


Cheers
Andrew

In preparation, I have started moving the backup recovery from the NAS to datastore2 ;)
vm5.JPG

Andrew Davis, Manager

Author

Commented:
If it matters (I don't think it does, but hey, I could be wrong): when monitoring the console, it does show the usual 2008 R2 startup screens, up to the point of "loading windows", then dies.

Cheers

Andrew Hancock (VMware vExpert PRO / EE Fellow), VMware and Virtualization Consultant

Commented:
How very odd. It will usually NOT add another snapshot if the chain is incorrect or corrupt.

Try at the console:

vim-cmd vmsvc/snapshot.removeall 2 (this is a consolidation task!)

This will try to remove and merge all the snapshots, but if there is corruption it will fail.

Andrew Davis, Manager

Author

Commented:
After the error above, when I do a "Power off" of the VM, it hangs at 95% and then comes up with:
The attempted operation cannot be performed in the current state (Powered off).

Cheers
vm6.JPG

Andrew Davis, Manager

Author

Commented:
"try at the console

vim-cmd vmsvc/snapshot.removeall 2 (this is a consolidation task!)"

It looks like it asks a question but doesn't pause for an answer.

If I then do an ls -al, nothing has changed.

/vmfs/volumes/4ff54135-0181f04a-d355-00215e6eae71/2008Leap # vim-cmd vmsvc/snapshot.removeall 2
Remove All Snapshots:
/vmfs/volumes/4ff54135-0181f04a-d355-00215e6eae71/2008Leap #

VMware and Virtualization Consultant

Commented:
[Solution text is locked on the original page and was not captured.]

Andrew Davis, Manager

Author

Commented:
hehe....

Kinda thought that was coming at some point ;)

On a side note, is there any great benefit in upgrading my version of ESXi 5.0?
If so, do you have a good guide on how to do it?

Cheers
Andrew

Andrew Hancock (VMware vExpert PRO / EE Fellow), VMware and Virtualization Consultant

Commented:
You should really be on the latest update of 5.0 (5.0 U3) for security reasons, e.g. 5.0 to 5.0 U3. The upgrade is done in a similar way to this:

HOW TO: Upgrade from VMware vSphere Hypervisor ESXi 5.1 to VMware vSphere Hypervisor ESXi 5.5 for FREE

As for whether you should be on 5.1 or 5.5, that depends on your requirements:

HOW TO: What's New in VMware vSphere Hypervisor 5.5 (ESXi 5.5)

All the Best, Happy New Year, must get some Zzzzzzs now.

Andrew Davis, Manager

Author

Commented:
No Problem, will have a read.

Thanks for all your help, now go get some sleep ;)

Hope you had a merry Christmas, and have a Happy New Year.

Cheers
Andrew
Australia.
