Solved

corrupt redo log. VMware ESXi

Posted on 2013-12-30
Last Modified: 2013-12-30
Host: VMware ESXi, 5.0.0, 469512
Client shows error while booting up:
The redo log of {Servername}-000002.vmdk is corrupted. Power off the virtual machine, if the problem still persists, discard the redo log.

Actions:
Just about every article I have found on this says to consolidate the snapshots, so I deleted the snapshots (vSphere Client, Snapshot Manager), which then told me that it wanted to consolidate the drives. When I tell it to consolidate I get:
Consolidate virtual machine disk files
A general system error occurred: Input/output error

Following http://dunfraggin.blogspot.com.au/2012/09/virtual-machine-disk-consolidation.html I suspect there may be locked files. The server reports that it is powered off, but it did come up with an error during the Power Off task.

The article above leads me to http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&docType=kc&docTypeID=DT_KB_1_1&externalId=10051
If I look at the vmware.log file I am not seeing anything unexpected at the end of the file.
However, if I try to use vmkfstools I get the following:

login as: itsupport
Using keyboard-interactive authentication.
Password:
The time and date of this login have been sent to the system logs.

VMware offers supported, powerful system administration tools.  Please
see www.vmware.com/go/sysadmintools for details.

The ESXi Shell can be disabled by an administrative user. See the
vSphere Security documentation for more information.
~ $ su
Password:
~ # vmkfstools
sh: vmkfstools: not found
~ #

So not really sure where to go from here ;)

Additional notes:
If this helps at all (Vmid 2 is the one in question)

~ # vim-cmd vmsvc/getallvms
Vmid      Name                     File                        Guest OS          Version   Annotation
1      SBS2011      [datastore1] SBS2011/SBS2011.vmx     windows7Server64Guest   vmx-08
2      2008Leap     [datastore2] 2008Leap/2008Leap.vmx   windows7Server64Guest   vmx-08
3      2008Leap-2   [NAS01] 2008Leap-2/2008Leap-2.vmx    windows7Server64Guest   vmx-08
~ # vim-cmd vmsvc/power.getstate 2
Retrieved runtime info
Powered off
~ #
There is only the one host, so no other host could possibly have a lock.
Have rebooted the host:- no change.

Cheers
Andrew
Question by:Andrew Davis
 
LVL 124
ID: 39747397
The redo log to which the error refers is the 2nd snapshot delta disk. Snapshot deltas do become corrupted, and when the chain is corrupted, the VM will not start.

e.g. parent disk (vmdk) + 1st snapshot disk + 2nd snapshot disk = VM VMDK

The chain above must be intact. If the host server crashed or restarted, or the datastore ran out of space, the 2nd snapshot disk can become corrupted. ALL the information in the 2nd snapshot disk then needs to be discarded to get the VM started, which means data loss.
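As a rough illustration of what "the chain must be correct" means at the descriptor level: each small snapshot descriptor (.vmdk text file) carries a CID, a parentCID and a parentFileNameHint, and a child's parentCID is expected to equal its parent's CID. The sketch below walks a chain and reports mismatches. The function names and the sample descriptor values are made up for illustration, and this only checks descriptor metadata; it cannot detect corruption inside a -delta.vmdk file, which may well be the problem here.

```python
import re

def parse_descriptor(text):
    """Extract CID, parentCID and parentFileNameHint from VMDK descriptor text."""
    fields = {}
    for key in ("CID", "parentCID", "parentFileNameHint"):
        # ^ with MULTILINE keeps "CID" from matching inside "parentCID"
        match = re.search(r'^%s\s*=\s*"?([^"\r\n]*)"?' % key, text, re.MULTILINE)
        if match:
            fields[key] = match.group(1).strip()
    return fields

def check_chain(descriptors, leaf):
    """Walk from the newest snapshot descriptor down to the base disk.

    descriptors maps file name -> descriptor text; returns a list of problems.
    """
    problems = []
    seen = set()
    name = leaf
    while name not in seen:
        seen.add(name)
        fields = parse_descriptor(descriptors[name])
        parent = fields.get("parentFileNameHint")
        if parent is None:
            break  # reached the base disk, which has no parent hint
        if parent not in descriptors:
            problems.append("%s: parent %s is missing" % (name, parent))
            break
        parent_cid = parse_descriptor(descriptors[parent]).get("CID", "")
        child_parent_cid = fields.get("parentCID", "")
        if child_parent_cid.lower() != parent_cid.lower():
            problems.append("%s: parentCID %s does not match CID %s of %s"
                            % (name, child_parent_cid, parent_cid, parent))
        name = parent
    return problems

# Hypothetical descriptors, NOT taken from the VM in this question.
SAMPLE = {
    "vm.vmdk": 'CID=aaaa1111\nparentCID=ffffffff\n',
    "vm-000001.vmdk": 'CID=bbbb2222\nparentCID=aaaa1111\nparentFileNameHint="vm.vmdk"\n',
    "vm-000002.vmdk": 'CID=cccc3333\nparentCID=bbbb2222\nparentFileNameHint="vm-000001.vmdk"\n',
}

if __name__ == "__main__":
    print(check_chain(SAMPLE, "vm-000002.vmdk"))
```

An empty list only means the descriptors agree with each other; work on copies of the files, never on the live VM folder.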

It's possible damage or corruption has already occurred to the snapshot disk, and you would have to discard this delta to start the VM, resulting in a corrupt or outdated VM.

can you get me a list of the current files in the folder, and I can work with you to see if we can get the VM started. Please do not try to fiddle, because you could cause more damage.

HOW TO: VMware Snapshots :- Be Patient

Also, can you tell me what events happened before this?

Datastore out of space?

Host Server restart?

Host Server crash?

(I'm on GMT UK Time, so just about to get some Zzzzzs!). But I'll hang in here for 30 mins.

(also try logging in as root, to try vmkfstools!)
0
 
LVL 19

Author Comment

by:Andrew Davis
ID: 39747403
Thanks for your time ;)
Datastore out of space? NO 50% head room.
Host Server restart? Kinda. There was a power outage around the time of the crash. UPS should have seen to a clean shutdown.
Files in directory:-
# cd /vmfs/volumes/datastore2/2008Leap
/vmfs/volumes/4ff54135-0181f04a-d355-00215e6eae71/2008Leap # ls
2008Leap-000001-delta.vmdk      2008Leap-flat.vmdk              2008Leap.vmxf                   2008Leap_1-flat.vmdk            vmmcores-5.gz                   vmware-13.log                   vmware.log
2008Leap-000001.vmdk            2008Leap.nvram                  2008Leap_1-000001-delta.vmdk    2008Leap_1.vmdk                 vmmcores-6.gz                   vmware-14.log                   vmx-2008Leap-3315273081-2.vswp
2008Leap-000002-delta.vmdk      2008Leap.vmdk                   2008Leap_1-000001.vmdk          vmmcores-2.gz                   vmmcores-7.gz                   vmware-15.log                   vmx-zdump.001
2008Leap-000002.vmdk            2008Leap.vmsd                   2008Leap_1-000002-delta.vmdk    vmmcores-3.gz                   vmmcores.gz                     vmware-16.log                   vmx-zdump.002
2008Leap-aux.xml                2008Leap.vmx                    2008Leap_1-000002.vmdk          vmmcores-4.gz                   vmware-12.log                   vmware-17.log                   vmx-zdump.003
/vmfs/volumes/4ff54135-0181f04a-d355-00215e6eae71/2008Leap #
Note "2008Leap.vmx" is green in colour.
Cheers
0
 
LVL 124
ID: 39747406
okay, this is the reason for failure...

The redo log to which the error refers is the 2nd snapshot delta disk. Snapshot deltas do become corrupted, and when the chain is corrupted, the VM will not start.

e.g. parent disk (vmdk) + 1st snapshot disk + 2nd snapshot disk = VM VMDK

The chain above must be intact. If the host server crashed or restarted, or the datastore ran out of space, the 2nd snapshot disk can become corrupted. ALL the information in the 2nd snapshot disk then needs to be discarded to get the VM started, which means data loss.
0

 
LVL 19

Author Comment

by:Andrew Davis
ID: 39747409
In preparation for the worst case, I have also:-

1. Copied the server's directory to another datastore (in case I need to get anything back).
2. Created a new VM (2008Leap-2) and done a successful restore to an NFS share on NAS01. I have not connected it to the network as yet, as I just wanted to prove the disaster recovery, and cannot leave it sitting on the NAS (tooooo slow).

(Just saw another post on this question come in so this comment will be out of order)

Cheers
0
 
LVL 19

Author Comment

by:Andrew Davis
ID: 39747414
So go with disaster recovery?

I have a ShadowProtect backup that will have all the data, as the office was closed (thank you, holiday period).


Cheers
Andrew
0
 
LVL 124
ID: 39747420
okay, so this is a two disk virtual machine....with disks...

2008Leap-flat.vmdk      
2008Leap-000001.vmdk  
2008Leap-000001-delta.vmdk
2008Leap-000002-delta.vmdk

2008Leap_1-flat.vmdk  
2008Leap_1-000001-delta.vmdk
2008Leap_1-000002-delta.vmdk

both disks have two snapshots.

Can you try the following:-

1. Take a Snapshot of the current VM
2. Wait 60 seconds
3. Delete the Snapshot

report any errors, and then check the disk if snapshots have gone. (the delta files!)
0
 
LVL 19

Author Comment

by:Andrew Davis
ID: 39747423
With it powered off?
0
 
LVL 124
ID: 39747425
I'm afraid this file

{Servername}-000002.vmdk

is corrupt and would have to be excluded from the VM configuration to get the VM started. How out of date the VM ends up depends on how long the machine was running on this snapshot disk, e.g. after 12 days on the snapshot, the VM would be 12 days old.

If you have Backups, time to restore, and monitor those snapshots daily...and check your VMs are not running on snapshots.
0
 
LVL 124
ID: 39747426
Yes, Powered OFF.

Is the VM ON?
0
 
LVL 19

Author Comment

by:Andrew Davis
ID: 39747428
Everything looked fine, with no errors.
It did report that it needs consolidation now.
I haven't done the consolidation yet, but it is not showing the deltas gone (as expected, I now have another).

Cheers
0
 
LVL 19

Author Comment

by:Andrew Davis
ID: 39747429
Yes, Powered OFF.

Is the VM ON?

Sorry was a stupid question on my behalf;)
0
 
LVL 124
ID: 39747434
Consolidation Message appears because it detects snapshots, but it's not intelligent to know if they are corrupted and cannot be merged or discarded.

did you do the above test?
0
 
LVL 19

Author Comment

by:Andrew Davis
ID: 39747435
sorry just also noticed
(also try logging in as root, to try vmkfstools!)

root is denied direct login; I have to ssh in as another user and then su to root, as I don't have physical access to this server.

Cheers
Andrew
0
 
LVL 19

Author Comment

by:Andrew Davis
ID: 39747436
Yes, I have done the above. Do you want me to try to restart it?

Sorry, as you obviously know more than me, i am not presuming anything ;)

Cheers
Andrew
0
 
LVL 124
ID: 39747438
can you get a new listing of the folder for me? (with sizes)

ls -al
0
 
LVL 19

Author Comment

by:Andrew Davis
ID: 39747439
listing
/vmfs/volumes/4ff54135-0181f04a-d355-00215e6eae71/2008Leap # ls -al
drwxr-xr-x    1 root     root               5740 Dec 31 02:23 .
drwxr-xr-t    1 root     root               1400 Dec 30 07:13 ..
-rw-------    1 root     root        30098534400 Dec 30 05:43 2008Leap-000001-delta.vmdk
-rw-------    1 root     root                320 Dec 30 05:41 2008Leap-000001.vmdk
-rw-------    1 root     root           16986112 Dec 31 01:01 2008Leap-000002-delta.vmdk
-rw-------    1 root     root                327 Dec 31 00:59 2008Leap-000002.vmdk
-rw-------    1 root     root             208896 Dec 31 02:22 2008Leap-000003-delta.vmdk
-rw-------    1 root     root                327 Dec 31 02:22 2008Leap-000003.vmdk
-rw-r--r--    1 root     root                 13 Dec 31 02:23 2008Leap-aux.xml
-rw-------    1 root     root       107374182400 Dec 31 02:23 2008Leap-flat.vmdk
-rw-------    1 root     root               8684 Dec 31 01:00 2008Leap.nvram
-rw-------    1 root     root                523 Dec 31 02:23 2008Leap.vmdk
-rw-r--r--    1 root     root                 77 Dec 31 02:23 2008Leap.vmsd
-rwxr-xr-x    1 root     root               3573 Dec 31 02:22 2008Leap.vmx
-rw-r--r--    1 root     root                263 Dec 30 08:02 2008Leap.vmxf
-rw-------    1 root     root        43923165184 Dec 30 05:41 2008Leap_1-000001-delta.vmdk
-rw-------    1 root     root                324 Mar 22  2013 2008Leap_1-000001.vmdk
-rw-------    1 root     root           17190912 Dec 30 05:46 2008Leap_1-000002-delta.vmdk
-rw-------    1 root     root                331 Dec 30 05:45 2008Leap_1-000002.vmdk
-rw-------    1 root     root             413696 Dec 31 02:22 2008Leap_1-000003-delta.vmdk
-rw-------    1 root     root                331 Dec 31 02:22 2008Leap_1-000003.vmdk
-rw-------    1 root     root       214748364800 Nov 11  2012 2008Leap_1-flat.vmdk
-rw-------    1 root     root                525 Jul  8  2012 2008Leap_1.vmdk
-rw-r--r--    1 root     root            5416529 Dec 30 05:46 vmmcores-2.gz
-rw-r--r--    1 root     root            4948934 Dec 30 05:54 vmmcores-3.gz
-rw-r--r--    1 root     root            5739430 Dec 30 06:53 vmmcores-4.gz
-rw-r--r--    1 root     root            5664529 Dec 30 07:03 vmmcores-5.gz
-rw-r--r--    1 root     root            5828171 Dec 30 23:59 vmmcores-6.gz
-rw-r--r--    1 root     root            5686604 Dec 31 00:10 vmmcores-7.gz
-rw-r--r--    1 root     root            5733245 Dec 31 01:01 vmmcores.gz
-rw-r--r--    1 root     root             164211 Dec 30 05:46 vmware-12.log
-rw-r--r--    1 root     root             164471 Dec 30 05:55 vmware-13.log
-rw-r--r--    1 root     root             158944 Dec 30 06:53 vmware-14.log
-rw-r--r--    1 root     root             157904 Dec 30 07:03 vmware-15.log
-rw-r--r--    1 root     root             163543 Dec 30 23:59 vmware-16.log
-rw-r--r--    1 root     root             164108 Dec 31 00:10 vmware-17.log
-rw-r--r--    1 root     root             164443 Dec 31 01:01 vmware.log
-rw-------    1 root     root           52428800 Jul  8  2012 vmx-2008Leap-3315273081-2.vswp
-r--------    1 root     root            5042176 Dec 30 23:59 vmx-zdump.001
-r--------    1 root     root            4980736 Dec 31 00:10 vmx-zdump.002
-r--------    1 root     root            5001216 Dec 31 01:01 vmx-zdump.003
/vmfs/volumes/4ff54135-0181f04a-d355-00215e6eae71/2008Leap #
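For anyone wanting the delta sizes at a glance, here is a small sketch that pulls the -delta.vmdk sizes out of `ls -al` output like the listing above and totals them. The function name is made up, and it assumes the nine-column layout shown in this listing (size in the fifth column, file name last).

```python
def delta_sizes(ls_output):
    """Return {file name: size in bytes} for every *-delta.vmdk line of `ls -al` output."""
    sizes = {}
    for line in ls_output.splitlines():
        parts = line.split()
        # perms, links, owner, group, size, month, day, time-or-year, name
        if len(parts) >= 9 and parts[-1].endswith("-delta.vmdk"):
            sizes[parts[-1]] = int(parts[4])
    return sizes

# The six delta lines copied from the listing in this comment.
LISTING = """\
-rw-------    1 root     root        30098534400 Dec 30 05:43 2008Leap-000001-delta.vmdk
-rw-------    1 root     root           16986112 Dec 31 01:01 2008Leap-000002-delta.vmdk
-rw-------    1 root     root             208896 Dec 31 02:22 2008Leap-000003-delta.vmdk
-rw-------    1 root     root        43923165184 Dec 30 05:41 2008Leap_1-000001-delta.vmdk
-rw-------    1 root     root           17190912 Dec 30 05:46 2008Leap_1-000002-delta.vmdk
-rw-------    1 root     root             413696 Dec 31 02:22 2008Leap_1-000003-delta.vmdk
"""

if __name__ == "__main__":
    sizes = delta_sizes(LISTING)
    for name, size in sorted(sizes.items()):
        print("%-32s %8.2f GiB" % (name, size / 2.0 ** 30))
    print("total: %.2f GiB" % (sum(sizes.values()) / 2.0 ** 30))
```

The two 000001 deltas hold nearly all of the data, which suggests the VM had been running on its first snapshot for a long time before the second one appeared.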
0
 
LVL 124
ID: 39747442
did you delete the snapshot?

because it's created the third.....

then select DELETE ALL!
0
 
LVL 19

Author Comment

by:Andrew Davis
ID: 39747444
if screen shot is easier
vm1.JPG
0
 
LVL 19

Author Comment

by:Andrew Davis
ID: 39747447
did you delete the snapshot?

Sure did. as per screen shot. I didnt do the consolidate.
vm2.JPG
0
 
LVL 124
ID: 39747449
Selecting DELETE ALL should have removed all the snapshots. When you used Take Snapshot, did the new snapshot appear in the list?

can you also confirm, if you look at the VM Settings, Right Click Edit VM, check the disks, you'll see if the VM is running on the disks....because it will be using 00003 etc

can you confirm?

also what size are these disks?
0
 
LVL 19

Author Comment

by:Andrew Davis
ID: 39747453
Yes everything appeared correctly. could see the snapshots listed in the manager, then they were gone (after delete)

vm3.jpg shows 388.55 Gig free of 838Gig drive.

vm4.jpg shows settings. it reports that it is running from 000003.vmdk snapshot.

Cheers
Andrew
vm3.JPG
vm4.JPG
0
 
LVL 124
ID: 39747463
Can you power on the VM now?
0
 
LVL 19

Author Comment

by:Andrew Davis
ID: 39747469
As expected. same problem. see screen shot.

Redo log corrupted.


Cheers
Andrew

In preparation i have started moving the backup recovery from the NAS to the datastore2 ;)
vm5.JPG
0
 
LVL 19

Author Comment

by:Andrew Davis
ID: 39747471
If it matters (I don't think it does, but hey, I could be wrong), when monitoring the console, it does show the usual 2008 R2 startup screens, to the point of "loading windows", then dies.

Cheers
0
 
LVL 124
ID: 39747473
how very odd, it usually will NOT add another snapshot if the chain is incorrect or corrupt.

try at the console

vim-cmd vmsvc/snapshot.removeall 2 (this is a consolidation task!)

this will try and remove and merge all the snapshots, but if there is corruption it will fail.
0
 
LVL 19

Author Comment

by:Andrew Davis
ID: 39747480
after the error above, when i do a "Power off" of the VM it hangs at 95% then comes up with,
The attempted operation cannot be performed in the current state (Powered off).

Cheers
vm6.JPG
0
 
LVL 19

Author Comment

by:Andrew Davis
ID: 39747483
try at the console

vim-cmd vmsvc/snapshot.removeall 2 (this is a consolidation task!)
It looks like it asks a question but doesn't pause for an answer.

If I then do an ls -al, nothing has changed.

/vmfs/volumes/4ff54135-0181f04a-d355-00215e6eae71/2008Leap # vim-cmd vmsvc/snapshot.removeall 2
Remove All Snapshots:
/vmfs/volumes/4ff54135-0181f04a-d355-00215e6eae71/2008Leap #

0
 
LVL 124

Accepted Solution

by:
Andrew Hancock (VMware vExpert / EE MVE^2) earned 2000 total points
ID: 39747484
Restore your good VM, check after restore, and delete the corrupt version.

If the snapshots on this VM were not intentional and you didn't know about them, check daily that your VMs are not running on snapshot disks.
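One way to automate that daily check, as a minimal sketch: a VM's .vmx file records which disk file each virtual disk points at (the same thing Edit Settings shows), and a name ending in a six-digit suffix such as -000003.vmdk means the VM is running on a snapshot disk. The function name and the sample .vmx fragment below are made up for illustration; only the disk names are borrowed from this thread.

```python
import re

# Disk entries in a .vmx look like: scsi0:0.fileName = "name.vmdk"
DISK_LINE = re.compile(r'^(\S+)\.fileName\s*=\s*"([^"]+)"', re.MULTILINE)
# Snapshot descriptors end in a six-digit sequence number, e.g. -000003.vmdk
SNAPSHOT_NAME = re.compile(r"-\d{6}\.vmdk$")

def disks_on_snapshots(vmx_text):
    """Return (device, file name) pairs for disks that point at a snapshot descriptor."""
    return [(device, name)
            for device, name in DISK_LINE.findall(vmx_text)
            if SNAPSHOT_NAME.search(name)]

# Hypothetical .vmx fragment for illustration only.
SAMPLE_VMX = '''\
scsi0:0.present = "TRUE"
scsi0:0.fileName = "2008Leap-000003.vmdk"
scsi0:1.present = "TRUE"
scsi0:1.fileName = "2008Leap_1-000003.vmdk"
ide1:0.fileName = "/vmfs/volumes/example/install.iso"
'''

if __name__ == "__main__":
    for device, name in disks_on_snapshots(SAMPLE_VMX):
        print("%s is running on snapshot disk %s" % (device, name))
```

Run against each VM's .vmx on a schedule, any output is a prompt to go and look at Snapshot Manager before the deltas grow.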
0
 
LVL 19

Author Comment

by:Andrew Davis
ID: 39747486
hehe....

Kinda thought that was coming at some point ;)

On a side note, is there any great benefit in upgrading my version of ESXi 5.0?
If so, do you have a good guide on how to do it?

Cheers
Andrew
0
 
LVL 124
ID: 39747493
You should really be on the latest version of 5.0 (5.0 U3) for security reasons, i.e. go from 5.0 to 5.0 U3. Whether you should move to 5.1 or 5.5 is a separate question.

it's done similar to this:-

HOW TO: Upgrade from VMware vSphere Hypervisor ESXi 5.1 to VMware vSphere Hypervisor ESXi 5.5 for FREE

depends on your requirements....

HOW TO: What's New in VMware vSphere Hypervisor 5.5 (ESXi 5.5)

All the Best, Happy New Year, must get some Zzzzzzs now.
0
 
LVL 19

Author Comment

by:Andrew Davis
ID: 39747496
No Problem, will have a read.

Thanks for all your help, now go get some sleep ;)

Hope you had a merry Christmas, and have a Happy New Year.

Cheers
Andrew
Australia.