Redo Log is corrupted. Assistance requested (ESXi 5.5)

philjans
Hi Team,

We've been having an issue these past few days with one of our VMs, which is used as a file share. When there are a lot of I/O-intensive tasks, such as backups or deleting data on the file share, the VM crashes and we get the following error: (screenshot: error as seen in vCenter)
The hqdprv-file03-000001.vmdk seems to be causing the issue.

Here's the datastore for the VM: (screenshot: Datastore)
Are there any outliers? If you need any more information, please let me know. I would like some pointers on how to rectify this situation.

Thank you for your time.

Andrew Hancock (VMware vExpert / EE Fellow), VMware and Virtualization Consultant
Fellow 2018
Expert of the Year 2017
Commented:
WHY is your VM running on a snapshot ?

HOW TO: VMware Snapshots :- Be Patient

Please read my EE Article above.

This is your issue: virtual machines are NOT designed to RUN on a snapshot forever!

I'm afraid I've got some very bad news: the snapshot file 000001.vmdk is corrupted (this happens if you leave a VM running on a snapshot for too long). It will need to be discarded, and that means ALL the changes in the snapshot will be lost, which could result in a corrupted disk, i.e. a corrupted VM.

But you have two disks attached to this VM....

Disk 1 - hqdprv-file03.vmdk
Disk 2 - hqdprc-file03_1.vmdk

Disk 2 may have all the data for the file share, and Disk 1 could just be the OS ?

I can work with you and we can see what recovery we can do....but you *MUST* follow my instructions and do not deviate.

Do you have a virtual machine backup ?

How much free storage space is available on this datastore ?

Did you know you had a snapshot ?

Author

Commented:
Hi Andrew,
Thanks for the quick response.
I have no snapshot on this machine....!
My Veeam backup does take one every night at 18:00, but it doesn't stay on.
file03.png
Could it be related to Veeam?
veeamfile03.png
Yes, I have Veeam backups of this file server.
The datastore has:
file03-datastore.png

Author

Commented:
Here are the HDs:
file03hd.png

Andrew Hancock (VMware vExpert / EE Fellow)VMware and Virtualization Consultant

Commented:
The screenshot of the folder confirms a snapshot.

The file 000001.vmdk is a snapshot.

The recent disk properties also confirm a snapshot.
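
As a rough illustration of how a file listing gives a snapshot away, the sketch below matches disk names against the usual pattern of a six-digit sequence number before `.vmdk`. The regular expression and the function names are assumptions made for this example, not something taken from the thread or from a VMware tool.

```python
import re

# Assumption: snapshot (redo log) disks are named "<base>-NNNNNN.vmdk",
# optionally with a "-delta" or "-sesparse" data extent next to them.
SNAPSHOT_DISK = re.compile(
    r"^(?P<base>.+)-(?P<seq>\d{6})(?P<extent>-delta|-sesparse)?\.vmdk$"
)


def parse_snapshot_disk(filename):
    """Return (base_disk, sequence) if the name looks like a snapshot disk, else None."""
    match = SNAPSHOT_DISK.match(filename)
    if match is None:
        return None
    return match.group("base") + ".vmdk", int(match.group("seq"))


def snapshot_disks(filenames):
    """Group snapshot sequence numbers by the base disk they appear to belong to."""
    found = {}
    for name in filenames:
        parsed = parse_snapshot_disk(name)
        if parsed is not None:
            base, seq = parsed
            found.setdefault(base, set()).add(seq)
    return {base: sorted(seqs) for base, seqs in found.items()}
```

For the file named in the question, `parse_snapshot_disk("hqdprv-file03-000001.vmdk")` returns `("hqdprv-file03.vmdk", 1)`, while a base disk such as `hqdprv-file03.vmdk` returns `None`.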

I would try the following before we proceed...

1. Does the machine now fail to start ? e.g. power on ?

Are those screenshots of the virtual machine NOW?

and the VM does not power on ?

Author

Commented:
The snapshot we see seems to date from yesterday at 18:00, when the VM probably first froze.
file03RVtools.png

Author

Commented:
The machine starts...
This morning it started and lasted until 15:30. Now it seems to be off again, so this time it did not last nearly as long.
Andrew Hancock (VMware vExpert / EE Fellow)VMware and Virtualization Consultant

Commented:
Are those screenshots above current ? (now in the present).

Does the VM refuse to power on with that error message ?

Author

Commented:
If you look at my Veeam screenshot above: should I maybe stop those jobs before they run at 18:00 (20 minutes from now)?
Andrew Hancock (VMware vExpert / EE Fellow)VMware and Virtualization Consultant

Commented:
Yes, disable Veeam; do not run the backup.
Andrew Hancock (VMware vExpert / EE Fellow)VMware and Virtualization Consultant

Commented:
Is the file still 000001 for both disks ?

Author

Commented:
They are CURRENT.
The VM started twice today, so I guess if I turn it on NOW it will work again.
Andrew Hancock (VMware vExpert / EE Fellow)VMware and Virtualization Consultant

Commented:
Okay so your VM has a snapshot which we need to deal with and get the VM off the snapshot.

Check the Veeam VM does NOT have the Parent VM disk attached, if it does, select Remove (but not delete from disk)

You may have to do this for both disks.

We can try the following:-

0. Turn off the VM.
1. Take a new snapshot.
2. Wait 120 seconds.
3. Select DELETE ALL.
4. Wait and Be Patient for the Snapshot to be Merged to the Parent and Deleted.

(this could take seconds, minutes, hours, or days)

Author

Commented:
The screenshot for the boot disk is up there, but here it is again (boot is 80 GB):
file03hd.png
Here is the screenshot for the data drive (3 TB):
file03datahd.png

Author

Commented:
this I don't understand:
"Check the Veeam VM does NOT have the Parent VM disk attached, if it does, select Remove (but not delete from disk)"
Andrew Hancock (VMware vExpert / EE Fellow)VMware and Virtualization Consultant

Commented:
Is Veeam installed on a VM?

Author

Commented:
The backup jobs for this server start in 11 minutes: what do I do?
file03-veeam-copyjob.png
file03-veeam2.png

Author

Commented:
Veeam is on a physical server with its own hard drives.
Andrew Hancock (VMware vExpert / EE Fellow)VMware and Virtualization Consultant

Commented:
Pause that JOB, it's not going to help us... Ignore the Veeam instructions above and continue with the snapshot instructions.

But disable the job.

Or cancel it... has it already started ?

Author

Commented:
The copy job, as you see in the screenshot above, was still running, but now it is not.
The backup job was not running.
Both jobs are disabled now. Here:
file03-veeam-disabled.png
But I still see the Veeam snapshot:
file03-veeam-disabled-rvtool.png
Andrew Hancock (VMware vExpert / EE Fellow)VMware and Virtualization Consultant

Commented:
okay....so follow instructions above and we will see if we can merge the snapshot

Author

Commented:
Before I proceed:
I do back up files with a sort of robocopy job to an external HD, which only picks up the files that have changed.
Should I boot the server, run this to get the changes, then stop it and do your procedure? Or is that overkill, or something that could be done later if the procedure doesn't work?
Andrew Hancock (VMware vExpert / EE Fellow)VMware and Virtualization Consultant

Commented:
Okay, back up before you proceed, if the VM will stay up long enough before crashing...

Author

Commented:
In the comment section under your link there is another link where you say that a 700 GB disk will take a long, long time... I have 3.1 TB here...??
Also, if Veeam created the issue, should I look at "cloning it" like you mentioned?
I am not sure I can find space in another datastore... I'll have to research that.
https://www.experts-exchange.com/questions/29084305/Virtual-machine-disks-consolidation-is-needed-vSphere-6-5-multiple-datastores.html
Andrew Hancock (VMware vExpert / EE Fellow)VMware and Virtualization Consultant

Commented:
It's not the size of the disk, it's the size of your snapshot, and your snapshots are small.

This is a test... if the snapshot is corrupted, the snapshot merge process will fail.

So it will either merge and delete (problem solved), or fail. If it fails, we will have to follow a different fix.

Veeam always creates this issue; you need to check after every backup that no snapshots are left, and if there are, apply the procedure we are about to do now.
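
To make the point about snapshot size versus disk size concrete, here is a minimal sketch that adds up only the snapshot data extents from a datastore listing. The naming pattern, the function names and any file sizes you feed it are assumptions for illustration; the real numbers would come from the datastore browser.

```python
import re

# Assumption: snapshot data lives in "<base>-NNNNNN-delta.vmdk" or
# "<base>-NNNNNN-sesparse.vmdk" extents; base "-flat.vmdk" extents are ignored.
SNAPSHOT_EXTENT = re.compile(r"-\d{6}-(delta|sesparse)\.vmdk$")


def snapshot_bytes(file_sizes):
    """Sum the sizes of snapshot extents in a {filename: size_in_bytes} mapping."""
    return sum(
        size for name, size in file_sizes.items() if SNAPSHOT_EXTENT.search(name)
    )


def human(size):
    """Format a byte count using binary units (B up to TiB)."""
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if size < 1024 or unit == "TiB":
            return f"{size:.1f} {unit}"
        size /= 1024
```

With a 3 TB flat extent and a few GB of delta files in the mapping, `snapshot_bytes` only counts the delta files, which is the amount of data the merge has to write back into the parent disk.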

Author

Commented:
Gotcha!
I am working on cleaning up data on that server because 3.1 TB is way too much; I will probably bring it down to 1 TB (in the future), which makes everything so much easier.
The backups run every day, so that could be a problem, but with RVTools it is soooo much easier to see all my VM snapshot issues in one shot. I'll check that every morning then.
Andrew Hancock (VMware vExpert / EE Fellow)VMware and Virtualization Consultant

Commented:
Don't do that; it will increase the size of the snapshot... The longer you leave the snapshot issue, the more difficult it could be to solve with an easy fix.

and it is not going to make any difference to the 3TB size on the disk!

but if it all goes wrong and you cannot merge, you'll probably have to complete a restore or another method.

Author

Commented:
Yes, of course: the cleanup will only happen after everything is fixed.
You said follow my instructions with an "s", but do I understand correctly that I just need to select the snapshot that shows "veeam do not delete", click on delete, and that's it?
file03instructions.png
I need to go to the office to do the file-level backup and then delete the snapshot, so it will take about 2 hours.
I'll post how it went then.
Andrew Hancock (VMware vExpert / EE Fellow)VMware and Virtualization Consultant

Commented:
No....you do the following....

1. Power off the VM.
2. Take a New Snapshot. (this ensures the snapshots are linked in the chain)
3. Wait 120 seconds. (this settles the file system and I/O)
4. Select DELETE ALL. (this will then start to merge and delete all the snapshots)

It could fail with an error! if there is a fault in the chain.

(ignore what Veeam states, it's supposed to remove the snapshot after the backup and it failed to do this..)

Author

Commented:
By the time I finish your procedure, you will probably be offline, so in case it fails, what next steps would I have to take?
Andrew Hancock (VMware vExpert / EE Fellow)VMware and Virtualization Consultant

Commented:
did it work ?

when you Take the first snapshot or delete - did it fail ?

Author

Commented:
Just arrived at the office... far from home.
So far, I am just in the process of doing my differential robocopy backup... it will take hours. The system seems slow.
That's why I was asking what my next steps would be if it fails, since you will probably be offline.
When I am finished, I will do:
1. Power off the VM.
2. Take a New Snapshot. (this ensures the snapshots are linked in the chain)
3. Wait 120 seconds. (this settles the file system and I/O)
4. Select DELETE ALL. (this will then start to merge and delete all the snapshots)

And let you know

Author

Commented:
Seems to have worked :)
I no longer see any snapshot in Snapshot Manager.
I still see a lot of files here; can you tell if you see anything abnormal?
file03-final-file-listing.png
It would usually freeze after a couple of hours, so if 24 hours pass without a freeze, it should be fixed.
Andrew Hancock (VMware vExpert / EE Fellow)VMware and Virtualization Consultant

Commented:
No Snapshots that I can see.

The *.gz files are crash dumps and can be deleted if you like.

and then the files *.002, *.003 are also dump files.

Regularly check for snapshots after Veeam backups, and if you find any, complete the procedure you now know.
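
A routine check like this could be sketched roughly as follows: take a snapshot report from whatever inventory tool is in use and flag anything that has outlived the backup window. The record shape, the field names and the two-hour threshold are all hypothetical choices for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class SnapshotRecord:
    """One row of a snapshot report (hypothetical shape, e.g. from an inventory export)."""
    vm: str
    name: str
    created: datetime


def leftover_snapshots(records, now, max_age=timedelta(hours=2)):
    """Return the snapshots older than max_age, oldest first."""
    stale = [record for record in records if now - record.created > max_age]
    return sorted(stale, key=lambda record: record.created)
```

A snapshot created by last night's backup job that is still listed the next morning would be returned here, which is the cue to run the power off, new snapshot, wait, DELETE ALL procedure described earlier in the thread.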

Author

Commented:
Thank you very much Andrew. Great information, great help!
Andrew Hancock (VMware vExpert / EE Fellow)VMware and Virtualization Consultant

Commented:
no problems.
