Solved

corrupt redo log. VMware ESXi

Posted on 2013-12-30
Last Modified: 2013-12-30
Host: VMware ESXi, 5.0.0, 469512
Client shows error while booting up:
The redo log of {Servername}-000002.vmdk is corrupted. Power off the virtual machine, if the problem still persists, discard the redo log.

Actions:
Just about every article I have found regarding this says to consolidate the snapshots, so I deleted the snapshots (vSphere Client - Snapshot Manager), which then told me that it wanted to consolidate the drives. When I tell it to consolidate I get:
Consolidate virtual machine disk files
A general system error occurred: Input/output error

Following http://dunfraggin.blogspot.com.au/2012/09/virtual-machine-disk-consolidation.html I suspect there may be locked files. The server reports that it is powered off, but it did come up with an error during the Power Off task.

The article above leads me to http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&docType=kc&docTypeID=DT_KB_1_1&externalId=10051
If I look at the vmware.log file I am not seeing anything unexpected at the end of the file.
However, if I try to use vmkfstools I get the following:

login as: itsupport
Using keyboard-interactive authentication.
Password:
The time and date of this login have been sent to the system logs.

VMware offers supported, powerful system administration tools.  Please
see www.vmware.com/go/sysadmintools for details.

The ESXi Shell can be disabled by an administrative user. See the
vSphere Security documentation for more information.
~ $ su
Password:
~ # vmkfstools
sh: vmkfstools: not found
~ #
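
One possible explanation for the "not found" message, offered as an assumption rather than something confirmed in this thread: `su` without `-` keeps the calling user's PATH, which may not include the directory that holds vmkfstools (commonly /sbin on ESXi). The sketch below looks for a tool in PATH and then in a few usual system directories. `find_tool` is a hypothetical helper name, and the directory list is a guess.

```shell
# find_tool NAME: print the first executable called NAME found in PATH
# or in a few common system directories (the extra directories are an assumption).
find_tool() {
    name=$1
    old_ifs=$IFS
    IFS=:
    for dir in $PATH /sbin /usr/sbin /bin /usr/bin; do
        if [ -n "$dir" ] && [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            IFS=$old_ifs
            echo "$dir/$name"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# Example: find_tool vmkfstools
```

If this prints a full path, calling the tool by that path (or switching user with `su -` so root's own environment is loaded) may get past the error.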

So not really sure where to go from here ;)

Additional notes:
If this helps at all (Vmid 2 is the one in question)

~ # vim-cmd vmsvc/getallvms
Vmid      Name                     File                        Guest OS          Version   Annotation
1      SBS2011      [datastore1] SBS2011/SBS2011.vmx     windows7Server64Guest   vmx-08
2      2008Leap     [datastore2] 2008Leap/2008Leap.vmx   windows7Server64Guest   vmx-08
3      2008Leap-2   [NAS01] 2008Leap-2/2008Leap-2.vmx    windows7Server64Guest   vmx-08
~ # vim-cmd vmsvc/power.getstate 2
Retrieved runtime info
Powered off
~ #
There is only the one host, so no other host could possibly have a lock.
Have rebooted the host:- no change.

Cheers
Andrew
Question by: Andrew Davis

Expert Comment
by: Andrew Hancock (VMware vExpert / EE MVE)
The redo log the error refers to is the 2nd snapshot delta disk. Snapshot deltas do become corrupted, and when the chain is corrupted the VM will not start.

e.g. parent disk (vmdk) + 1st snapshot disk + 2nd snapshot disk = VM VMDK

The chain above must be intact. If the host server crashed or restarted, or the datastore ran out of space, the 2nd snapshot disk can get corrupted. ALL the information in the 2nd snapshot disk then needs to be discarded to get the VM started, which means data loss.

It's possible that damage or corruption has already occurred to the snapshot disk, and you would have to discard this delta to start the VM, resulting in a corrupt or outdated VM.
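
The links in that chain are recorded in the small text descriptor files (for example 2008Leap-000002.vmdk), not in the -delta or -flat data files. A minimal sketch for printing them, assuming the usual descriptor fields `CID`, `parentCID` and `parentFileNameHint`; the function name `show_chain_links` is hypothetical:

```shell
# show_chain_links DIR: for every VMDK descriptor in DIR (the small text files,
# skipping the -delta and -flat data files), print the fields that link it
# to its parent disk.
show_chain_links() {
    dir=$1
    for f in "$dir"/*.vmdk; do
        case "$f" in
            *-delta.vmdk|*-flat.vmdk) continue ;;
        esac
        [ -f "$f" ] || continue
        echo "== ${f##*/}"
        grep -E '^(CID|parentCID|parentFileNameHint)=' "$f" \
            || echo "   (no chain fields found)"
    done
}

# Example: show_chain_links /vmfs/volumes/datastore2/2008Leap
```

In an intact chain, each snapshot descriptor's `parentCID` matches the `CID` of the file named in its `parentFileNameHint`. Matching IDs only show that the descriptors agree with each other; they do not prove that the delta data itself is undamaged.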

can you get me a list of the current files in the folder, and I can work with you to see if we can get the VM started. Please do not try to fiddle, because you could cause more damage.

HOW TO: VMware Snapshots :- Be Patient

Also, can you tell me what happened before this?

Datastore out of space?

Host Server restart?

Host Server crash?

(I'm on GMT UK Time, so just about to get some Zzzzzs!). But I'll hang in here for 30 mins.

(also try logging in as root, to try vmkfstools!)

Author Comment
by: Andrew Davis
Thanks for your time ;)
Datastore out of space? NO 50% head room.
Host Server restart? Kinda. There was a power outage around the time of the crash. UPS should have seen to a clean shutdown.
Files in directory:-
# cd /vmfs/volumes/datastore2/2008Leap
/vmfs/volumes/4ff54135-0181f04a-d355-00215e6eae71/2008Leap # ls
2008Leap-000001-delta.vmdk      2008Leap-flat.vmdk              2008Leap.vmxf                   2008Leap_1-flat.vmdk            vmmcores-5.gz                   vmware-13.log                   vmware.log
2008Leap-000001.vmdk            2008Leap.nvram                  2008Leap_1-000001-delta.vmdk    2008Leap_1.vmdk                 vmmcores-6.gz                   vmware-14.log                   vmx-2008Leap-3315273081-2.vswp
2008Leap-000002-delta.vmdk      2008Leap.vmdk                   2008Leap_1-000001.vmdk          vmmcores-2.gz                   vmmcores-7.gz                   vmware-15.log                   vmx-zdump.001
2008Leap-000002.vmdk            2008Leap.vmsd                   2008Leap_1-000002-delta.vmdk    vmmcores-3.gz                   vmmcores.gz                     vmware-16.log                   vmx-zdump.002
2008Leap-aux.xml                2008Leap.vmx                    2008Leap_1-000002.vmdk          vmmcores-4.gz                   vmware-12.log                   vmware-17.log                   vmx-zdump.003
/vmfs/volumes/4ff54135-0181f04a-d355-00215e6eae71/2008Leap #
Note "2008Leap.vmx" is green in colour.
Cheers

Expert Comment
by: Andrew Hancock (VMware vExpert / EE MVE)
okay, this is the reason for failure...

The redo log the error refers to is the 2nd snapshot delta disk. Snapshot deltas do become corrupted, and when the chain is corrupted the VM will not start.

e.g. parent disk (vmdk) + 1st snapshot disk + 2nd snapshot disk = VM VMDK

The chain above must be intact. If the host server crashed or restarted, or the datastore ran out of space, the 2nd snapshot disk can get corrupted. ALL the information in the 2nd snapshot disk then needs to be discarded to get the VM started, which means data loss.

Author Comment
by: Andrew Davis
In preparation for the worst case, I have also:

1. Copied the server's directory to another datastore (in case I need to get anything back).
2. Created a new VM (2008Leap-2) and done a successful restore to an NFS share on NAS01. I have not connected it to the network yet, as I just wanted to prove the disaster recovery, and I cannot leave it sitting on the NAS (tooooo slow).

(Just saw another post on this question come in so this comment will be out of order)

Cheers

Author Comment
by: Andrew Davis
So go with disaster recovery?

I have a ShadowProtect backup that will have all the data, as the office was closed (thank you, holiday period).


Cheers
Andrew

Expert Comment
by: Andrew Hancock (VMware vExpert / EE MVE)
okay, so this is a two disk virtual machine....with disks...

2008Leap-flat.vmdk      
2008Leap-000001.vmdk  
2008Leap-000001-delta.vmdk
2008Leap-000002-delta.vmdk

2008Leap_1-flat.vmdk  
2008Leap_1-000001-delta.vmdk
2008Leap_1-000002-delta.vmdk

both disks have two snapshots.

Can you try the following:-

1. Take a Snapshot of the current VM
2. Wait 60 seconds
3. Delete the Snapshot

Report any errors, and then check the folder to see whether the snapshots (the delta files!) have gone.
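
For anyone without the vSphere Client to hand, the three steps above could be sketched from the ESXi shell roughly as follows. This is an illustration only: the argument order for `snapshot.create` (name, description, includeMemory, quiesced) is as I understand it from the command's usage text and should be checked by running `vim-cmd vmsvc/snapshot.create` with no arguments. The snapshot name and description are made up, `run` and `snapshot_cycle` are hypothetical helper names, and `removeall` is used in place of deleting the single snapshot.

```shell
# run CMD...: execute CMD, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# snapshot_cycle VMID [SECONDS]: take a snapshot, wait, then remove all
# snapshots for that VM (no memory dump, no quiescing).
snapshot_cycle() {
    vmid=$1
    wait_secs=${2:-60}
    run vim-cmd vmsvc/snapshot.create "$vmid" test-snapshot chain-check 0 0 || return 1
    run sleep "$wait_secs"
    run vim-cmd vmsvc/snapshot.removeall "$vmid"
}

# Example (prints the commands without touching the VM):
#   DRY_RUN=1; snapshot_cycle 2
```

Running it with DRY_RUN=1 first shows exactly what would be executed before anything touches the VM.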

Author Comment
by: Andrew Davis
With it powered off?

Expert Comment
by: Andrew Hancock (VMware vExpert / EE MVE)
I'm afraid this file

{Servername}-000002.vmdk

is corrupt and would have to be excluded from the VM configuration to get the VM started. How stale the VM ends up depends on how long the machine was running on this snapshot disk: e.g. if it ran on it for 12 days, the VM would be 12 days out of date.

If you have backups, it's time to restore, then monitor those snapshots daily and check your VMs are not running on snapshots.

Expert Comment
by: Andrew Hancock (VMware vExpert / EE MVE)
Yes, Powered OFF.

Is the VM ON?

Author Comment
by: Andrew Davis
Everything looked fine with no errors.
It did report that it needs consolidation now.
I haven't done the consolidation yet, but it is not showing the deltas gone (as expected, I now have another).

Cheers

Author Comment
by: Andrew Davis
Yes, Powered OFF.

Is the VM ON?

Sorry was a stupid question on my behalf;)

Expert Comment
by: Andrew Hancock (VMware vExpert / EE MVE)
The consolidation message appears because it detects snapshots, but it's not intelligent enough to know whether they are corrupted and cannot be merged or discarded.

did you do the above test?

Author Comment
by: Andrew Davis
sorry just also noticed
(also try logging in as root, to try vmkfstools!)

Root is denied direct login; I have to SSH in as another user and then su to root, as I don't have physical access to this server.

Cheers
Andrew

Author Comment
by: Andrew Davis
Yes, I have done the above. Do you want me to try to restart it?

Sorry, as you obviously know more than me, i am not presuming anything ;)

Cheers
Andrew

Expert Comment
by: Andrew Hancock (VMware vExpert / EE MVE)
can you get a new listing of the folder for me? (with sizes)

ls -al

Author Comment
by: Andrew Davis
listing
/vmfs/volumes/4ff54135-0181f04a-d355-00215e6eae71/2008Leap # ls -al
drwxr-xr-x    1 root     root               5740 Dec 31 02:23 .
drwxr-xr-t    1 root     root               1400 Dec 30 07:13 ..
-rw-------    1 root     root        30098534400 Dec 30 05:43 2008Leap-000001-delta.vmdk
-rw-------    1 root     root                320 Dec 30 05:41 2008Leap-000001.vmdk
-rw-------    1 root     root           16986112 Dec 31 01:01 2008Leap-000002-delta.vmdk
-rw-------    1 root     root                327 Dec 31 00:59 2008Leap-000002.vmdk
-rw-------    1 root     root             208896 Dec 31 02:22 2008Leap-000003-delta.vmdk
-rw-------    1 root     root                327 Dec 31 02:22 2008Leap-000003.vmdk
-rw-r--r--    1 root     root                 13 Dec 31 02:23 2008Leap-aux.xml
-rw-------    1 root     root       107374182400 Dec 31 02:23 2008Leap-flat.vmdk
-rw-------    1 root     root               8684 Dec 31 01:00 2008Leap.nvram
-rw-------    1 root     root                523 Dec 31 02:23 2008Leap.vmdk
-rw-r--r--    1 root     root                 77 Dec 31 02:23 2008Leap.vmsd
-rwxr-xr-x    1 root     root               3573 Dec 31 02:22 2008Leap.vmx
-rw-r--r--    1 root     root                263 Dec 30 08:02 2008Leap.vmxf
-rw-------    1 root     root        43923165184 Dec 30 05:41 2008Leap_1-000001-delta.vmdk
-rw-------    1 root     root                324 Mar 22  2013 2008Leap_1-000001.vmdk
-rw-------    1 root     root           17190912 Dec 30 05:46 2008Leap_1-000002-delta.vmdk
-rw-------    1 root     root                331 Dec 30 05:45 2008Leap_1-000002.vmdk
-rw-------    1 root     root             413696 Dec 31 02:22 2008Leap_1-000003-delta.vmdk
-rw-------    1 root     root                331 Dec 31 02:22 2008Leap_1-000003.vmdk
-rw-------    1 root     root       214748364800 Nov 11  2012 2008Leap_1-flat.vmdk
-rw-------    1 root     root                525 Jul  8  2012 2008Leap_1.vmdk
-rw-r--r--    1 root     root            5416529 Dec 30 05:46 vmmcores-2.gz
-rw-r--r--    1 root     root            4948934 Dec 30 05:54 vmmcores-3.gz
-rw-r--r--    1 root     root            5739430 Dec 30 06:53 vmmcores-4.gz
-rw-r--r--    1 root     root            5664529 Dec 30 07:03 vmmcores-5.gz
-rw-r--r--    1 root     root            5828171 Dec 30 23:59 vmmcores-6.gz
-rw-r--r--    1 root     root            5686604 Dec 31 00:10 vmmcores-7.gz
-rw-r--r--    1 root     root            5733245 Dec 31 01:01 vmmcores.gz
-rw-r--r--    1 root     root             164211 Dec 30 05:46 vmware-12.log
-rw-r--r--    1 root     root             164471 Dec 30 05:55 vmware-13.log
-rw-r--r--    1 root     root             158944 Dec 30 06:53 vmware-14.log
-rw-r--r--    1 root     root             157904 Dec 30 07:03 vmware-15.log
-rw-r--r--    1 root     root             163543 Dec 30 23:59 vmware-16.log
-rw-r--r--    1 root     root             164108 Dec 31 00:10 vmware-17.log
-rw-r--r--    1 root     root             164443 Dec 31 01:01 vmware.log
-rw-------    1 root     root           52428800 Jul  8  2012 vmx-2008Leap-3315273081-2.vswp
-r--------    1 root     root            5042176 Dec 30 23:59 vmx-zdump.001
-r--------    1 root     root            4980736 Dec 31 00:10 vmx-zdump.002
-r--------    1 root     root            5001216 Dec 31 01:01 vmx-zdump.003
/vmfs/volumes/4ff54135-0181f04a-d355-00215e6eae71/2008Leap #
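
The byte counts in a listing like this are hard to read at a glance. A small sketch that pulls out just the snapshot delta files and converts their sizes to GiB, assuming the `ls -al` column layout shown above (size in field 5, file name last); `delta_sizes` is a hypothetical helper name:

```shell
# delta_sizes: read `ls -al` output on stdin and print each snapshot delta
# file with its size in GiB (size is field 5, file name is the last field).
delta_sizes() {
    awk '$NF ~ /-delta\.vmdk$/ { printf "%s %.1f GiB\n", $NF, $5 / 1073741824 }'
}

# Example: ls -al | delta_sizes
```

Applied to the listing above, the two 000001 deltas come out at roughly 28 GiB and 41 GiB, while the 000002 and 000003 deltas are only a few MiB each, so nearly all of the changed data sits in the first snapshot level.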

Expert Comment
by: Andrew Hancock (VMware vExpert / EE MVE)
did you delete the snapshot?

because it's created the third.....

then select DELETE ALL!

Author Comment
by: Andrew Davis
if screen shot is easier
vm1.JPG

Author Comment
by: Andrew Davis
did you delete the snapshot?

Sure did. as per screen shot. I didnt do the consolidate.
vm2.JPG

Expert Comment
by: Andrew Hancock (VMware vExpert / EE MVE)
Selecting DELETE ALL should have removed all the snapshots. When you used Take Snapshot, did it appear in the list?

Can you also look at the VM settings (right-click, Edit VM) and check the disks? You'll see whether the VM is running on snapshot disks, because it will be using 000003 etc.

can you confirm?

also what size are these disks?

Author Comment
by: Andrew Davis
Yes everything appeared correctly. could see the snapshots listed in the manager, then they were gone (after delete)

vm3.jpg shows 388.55 Gig free of 838Gig drive.

vm4.jpg shows settings. it reports that it is running from 000003.vmdk snapshot.

Cheers
Andrew
vm3.JPG
vm4.JPG

Expert Comment
by: Andrew Hancock (VMware vExpert / EE MVE)
Can you power on the VM now?

Author Comment
by: Andrew Davis
As expected. same problem. see screen shot.

Redo log corrupted.


Cheers
Andrew

In preparation i have started moving the backup recovery from the NAS to the datastore2 ;)
vm5.JPG

Author Comment
by: Andrew Davis
If it matters (I don't think it does, but hey, I could be wrong): when monitoring the console, it does show the usual 2008 R2 startup screens up to the point of "loading windows", then dies.

Cheers

Expert Comment
by: Andrew Hancock (VMware vExpert / EE MVE)
how very odd, it usually will NOT add another snapshot if the chain is incorrect or corrupt.

try at the console

vim-cmd vmsvc/snapshot.removeall 2 (this is a consolidation task!)

this will try and remove and merge all the snapshots, but if there is corruption it will fail.

Author Comment
by: Andrew Davis
After the error above, when I do a "Power off" of the VM it hangs at 95% and then comes up with:
The attempted operation cannot be performed in the current state (Powered off).

Cheers
vm6.JPG

Author Comment
by: Andrew Davis
try at the console

vim-cmd vmsvc/snapshot.removeall 2 (this is a consolidation task!)
It looks like it asks a question but doesn't pause for an answer.

If I then do an ls -al, nothing has changed.

/vmfs/volumes/4ff54135-0181f04a-d355-00215e6eae71/2008Leap # vim-cmd vmsvc/snapshot.removeall 2
Remove All Snapshots:
/vmfs/volumes/4ff54135-0181f04a-d355-00215e6eae71/2008Leap #


Accepted Solution
by: Andrew Hancock (VMware vExpert / EE MVE)
Restore your good VM, check after restore, and delete the corrupt version.

If the snapshots on this VM were not intentional and you didn't know about them, check daily that your VMs are not running on snapshot disks.
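
One way such a daily check could be sketched from the ESXi shell: scan the .vmx files for disk entries that point at a snapshot descriptor (a name ending in a six-digit number such as -000003.vmdk) instead of the base disk. The function name `vms_on_snapshots` is hypothetical, and the pattern assumes the usual `fileName = "..."` form of .vmx disk entries.

```shell
# vms_on_snapshots ROOT: list .vmx files under ROOT whose disk entries point
# at a snapshot descriptor such as name-000001.vmdk.
vms_on_snapshots() {
    root=$1
    find "$root" -name '*.vmx' -type f 2>/dev/null | while IFS= read -r vmx; do
        if grep -Eq 'fileName *= *".*-[0-9]{6}\.vmdk"' "$vmx"; then
            echo "$vmx"
        fi
    done
}

# Example: vms_on_snapshots /vmfs/volumes
```

Any path it prints is a VM currently configured to run on a snapshot disk; no output means none were found under that root.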

Author Comment
by: Andrew Davis
hehe....

Kinda thought that was coming at some point ;)

On a side note, is there any great benefit in upgrading my version of ESXi 5.0?
If so, do you have a good guide on how to?

Cheers
Andrew

Expert Comment
by: Andrew Hancock (VMware vExpert / EE MVE)
You should really be on the latest version of 5.0 (5.0 U3) for security reasons, i.e. go from 5.0 to 5.0 U3. The upgrade is done in a similar way to this:

HOW TO: Upgrade from VMware vSphere Hypervisor ESXi 5.1 to VMware vSphere Hypervisor ESXi 5.5 for FREE

As for whether you should be on 5.1 or 5.5, that depends on your requirements:

HOW TO: What's New in VMware vSphere Hypervisor 5.5 (ESXi 5.5)

All the Best, Happy New Year, must get some Zzzzzzs now.

Author Comment
by: Andrew Davis
No Problem, will have a read.

Thanks for all your help, now go get some sleep ;)

Hope you had a merry Christmas, and have a Happy New Year.

Cheers
Andrew
Australia.