philjans

Redo Log is corrupted. Assistance requested (ESXi 5.5)

Hi Team,

We've been having issues these past few days with one of our VMs, which is used as a file share. When there are a lot of I/O-intensive tasks, like backups and deleting files on the file share, the VM crashes and we get the following error:
[screenshot]
The hqdprv-file03-000001.vmdk seems to be causing the issue.

Here's the datastore for the VM:
[screenshot]
Are there any outliers? If you need any more information, please let me know. I would like some pointers on how to rectify the situation.

Thank you for your time.
ASKER CERTIFIED SOLUTION
Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)

ASKER

Hi Andrew,
Thanks for the quick response.
I have no snapshots on this machine!
My Veeam backup does take one every night at 18:00, but it doesn't stay on.
[screenshot]
Could it be related to Veeam?
[screenshot]
Yes, I have Veeam backups of this file server.
The datastore has:
[screenshot]
Here are the HDs:
[screenshot]
The screenshot of the folder confirms a snapshot.

The file 000001.vmdk is a snapshot.

The recent disk properties also confirm a snapshot.
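
The disk-properties check above can also be made against the VM's .vmx file. The following is a minimal Python sketch, assuming the usual convention that a snapshot descriptor ends in a six-digit suffix such as -000001.vmdk and that the .vmx file records each disk as a `fileName` entry. The sample .vmx fragment is hypothetical; only the first file name comes from this thread.

```python
import re

# Snapshot descriptors conventionally end in -NNNNNN.vmdk (six digits).
SNAPSHOT_DISK = re.compile(r"-\d{6}\.vmdk$", re.IGNORECASE)
# Matches .vmx lines such as: scsi0:0.fileName = "disk.vmdk"
DISK_LINE = re.compile(r'^\s*(\w+:\d+)\.fileName\s*=\s*"([^"]+)"', re.IGNORECASE)


def disks_on_snapshot(vmx_text):
    """Return {device: file name} for disks whose active file looks like a snapshot."""
    result = {}
    for line in vmx_text.splitlines():
        match = DISK_LINE.match(line)
        if match and SNAPSHOT_DISK.search(match.group(2)):
            result[match.group(1)] = match.group(2)
    return result


# Hypothetical .vmx fragment: the second and third entries are made up.
sample_vmx = '''
scsi0:0.fileName = "hqdprv-file03-000001.vmdk"
scsi0:1.fileName = "hqdprv-file03_1.vmdk"
ide1:0.fileName = "emptyBackingString"
'''

print(disks_on_snapshot(sample_vmx))
# {'scsi0:0': 'hqdprv-file03-000001.vmdk'}
```

A disk listed here is running on a snapshot even if the Snapshot Manager shows nothing.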

I would try the following before we proceed...

1. Does the machine now fail to start, i.e. power on?

Are those screenshots of the virtual machine as it is NOW?

And the VM does not power on?
The snapshot we see seems to date from yesterday at 18:00, when it probably first froze.
[screenshot]
The machine starts...
This morning it started and lasted until 15:30. Now it seems to be off again, so this time it lasted much less long.
Are those screenshots above current (as of right now)?

Does the VM refuse to power on with that error message?
If you look at my Veeam screenshot up here: maybe I should stop those jobs before they run at 18:00 (20 minutes from now).
They are CURRENT.
The VM started twice today, so I guess if I turn it on NOW it will work again.
Okay so your VM has a snapshot which we need to deal with and get the VM off the snapshot.

Check the Veeam VM does NOT have the Parent VM disk attached, if it does, select Remove (but not delete from disk)

You may have to do this for both disks.

We can try the following:-

0. Turn off the VM.
1. Take a new snapshot.
2. Wait 120 seconds.
3. Select DELETE ALL.
4. Wait and Be Patient for the Snapshot to be Merged to the Parent and Deleted.

(this could take seconds, minutes, hours, or days)
The screenshot for the boot disk is up there, but here it is again (boot is 80 GB).
[screenshot]
Here is the screenshot for the 3 TB data drive:
[screenshot]
This I don't understand:
"Check the Veeam VM does NOT have the Parent VM disk attached, if it does, select Remove (but not delete from disk)"
The backup jobs for this server start in 11 minutes. What do I do?
[screenshot]
[screenshot]
Veeam is on a physical server with its own hard drives.
Pause that job, it's not going to help us. Ignore the Veeam instructions above and continue with the snapshot instructions.

But disable the job.

Or cancel it... has it already started?
The copy job, as you can see in the screenshot above, was still running, but now it is not.
The backup job was not running.
Both jobs are disabled now.
Here:
[screenshot]
But I still see the Veeam snapshot:
[screenshot]
Okay, so follow the instructions above and we will see if we can merge the snapshot.
Before I proceed:
I back up files with a sort of robocopy job to an external HD, which only picks up the files that have changed.
Should I boot the server, run this to get the changes, then stop it and do your procedure? Or is that overkill, or something that could be done later if the procedure doesn't work?
Okay, back up before you proceed, if the VM will stay up long enough before crashing...
In the comment section under your link there is another link where you say that a 700 GB disk will take a long, long time... I have 3.1 TB here...??
Also, if Veeam created the issue, should I look at "cloning it" like you mentioned?
I am not sure I can find space on another datastore... I'll have to research that.
https://www.experts-exchange.com/questions/29084305/Virtual-machine-disks-consolidation-is-needed-vSphere-6-5-multiple-datastores.html
It's not the size of the disk, it's the size of your snapshot, and your snapshots are small.

This is a test.....if the snapshot is corrupted, the snapshot merge process will fail...

So it will either merge and delete (problem solved) or fail. If it fails, we will have to follow a different fix.

Veeam always creates this issue. You need to check after every backup that no snapshots are left, and if any are, apply the procedure we are about to do now.
Gotcha!
I am working on cleaning up data on that server because 3.1 TB is way too much. I will probably get it down to 1 TB (in the future), which makes everything so much easier.
The backups run every day, so that could be a problem, but with RVTools it is so much easier to see all my VM snapshot issues in one shot. I'll do that every morning then.
Don't do that, it will increase the size of the snapshot. The longer you leave the snapshot issue, the more difficult it could be to solve with an easy fix.

And it is not going to make any difference to the 3 TB size of the disk!

But if it all goes wrong and you cannot merge, you'll probably have to complete a restore or use another method.
Yes, of course: the cleanup will only happen after everything is fixed.
You said follow my instructions with an "s", but do I understand correctly that I just need to select the snapshot that shows "veeam do not delete", click Delete, and that's it?
[screenshot]
I need to go to the office to do the file-level backup and then delete the snapshot, so it will take about 2 hours.
I'll post how it went then.
No, do the following:

1. Power off the VM.
2. Take a New Snapshot. (this ensures the snapshots are linked in the chain)
3. Wait 120 seconds. (this settles the file system and I/O)
4. Select DELETE ALL. (this will then start to merge and delete all the snapshots)

It could fail with an error if there is a fault in the chain.

(Ignore what Veeam states: it's supposed to remove the snapshot after the backup, and it failed to do this.)
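
In this thread the four steps are done through the client's Snapshot Manager. For anyone working from the ESXi shell instead, the sequence might look like the commands generated below. This is only a sketch under assumptions: it assumes the `vim-cmd vmsvc` subcommands `power.off`, `snapshot.create` and `snapshot.removeall` are available on the host, and that the VM ID has been looked up with `vim-cmd vmsvc/getallvms`. The snapshot name, description and the VM ID 12 are hypothetical. The Python code only builds and prints the commands; it does not run anything.

```python
def merge_snapshot_commands(vmid, settle_seconds=120):
    """Build the ESXi shell commands for: power off, snapshot, wait, Delete All.

    vmid is the numeric VM ID reported by `vim-cmd vmsvc/getallvms` (assumed).
    """
    if vmid <= 0:
        raise ValueError("vmid must be a positive integer")
    return [
        # Step 1: power off the VM.
        f"vim-cmd vmsvc/power.off {vmid}",
        # Step 2: new snapshot; the trailing "0 0" means no memory, no quiesce.
        f'vim-cmd vmsvc/snapshot.create {vmid} "merge-test" "temporary, before Delete All" 0 0',
        # Step 3: give the file system and I/O time to settle.
        f"sleep {settle_seconds}",
        # Step 4: the shell counterpart of Delete All.
        f"vim-cmd vmsvc/snapshot.removeall {vmid}",
    ]


# Hypothetical VM ID, for illustration only.
for command in merge_snapshot_commands(12):
    print(command)
```

Printing the commands rather than executing them leaves room to review each one, which matters here because the merge can fail if the chain is damaged.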
By the time I finish your procedure, you will probably be offline, so in case it fails, what next steps would I have to take?
Did it work?

When you took the first snapshot or ran the delete, did it fail?
Just arrived at the office... far from home.
So far, I am just in the process of doing my differential robocopy backup... it will take hours. The system seems slow.
That's why I was asking: if it fails, what would my next steps be, since you will probably be offline?
When I am finished, I will do:
1. Power off the VM.
2. Take a New Snapshot. (this ensures the snapshots are linked in the chain)
3. Wait 120 seconds. (this settles the file system and I/O)
4. Select DELETE ALL. (this will then start to merge and delete all the snapshots)

And let you know
Seems to have worked :)
I see no snapshot anymore in snapshot Manager.
I still see a lot of files here. Can you tell if you see anything abnormal?
[screenshot]
It would usually freeze after a couple of hours, so if 24 hours pass without a freeze, it should be fixed.
No Snapshots that I can see.

The *.gz files are crash dumps and can be deleted if you like.

The *.002, *.003 files are also dump files.

Check for snapshots regularly after Veeam backups, and if you find any, complete the procedure you now know.
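
The regular check can be partly automated by scanning the datastore for leftover snapshot files. Below is a minimal Python sketch, assuming the datastore is reachable as an ordinary directory tree (for example under /vmfs/volumes on the host or through a mount, which is an assumption) and that leftover snapshot files follow the usual -NNNNNN.vmdk, -NNNNNN-delta.vmdk or -NNNNNN-sesparse.vmdk naming. The function name is hypothetical.

```python
import os
import re

# Snapshot descriptor and data files: name-000001.vmdk, name-000001-delta.vmdk,
# name-000001-sesparse.vmdk (naming convention assumed).
LEFTOVER = re.compile(r"-\d{6}(-delta|-sesparse)?\.vmdk$", re.IGNORECASE)


def find_leftover_snapshots(datastore_root):
    """Map each folder under datastore_root to its snapshot-style .vmdk files."""
    found = {}
    for folder, _subdirs, files in os.walk(datastore_root):
        hits = sorted(name for name in files if LEFTOVER.search(name))
        if hits:
            found[folder] = hits
    return found
```

Calling `find_leftover_snapshots` with the datastore path after each backup window and getting back an empty dictionary means no snapshot-style files were found. Any folder that does appear is a candidate for the power off, snapshot, wait, Delete All procedure.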
Thank you very much Andrew. Great information, great help!