loosain

ESXi 6: virtual machine ran out of disk space and now I can't start the VM. It always says "vm-name.vmx cannot be opened" and "Device bootstrap is not available"

Hi,

We have an ESXi 6 host running. Yesterday it ran out of disk space. The log file contained this:

[msg.hbacommon.outofspace] There is no more space for virtual disk Server-000002.vmdk. You might be able to continue this session by freeing disk space on the relevant volume, and clicking _Retry. Click Cancel to terminate this session.
2016-12-02T05:57:35.081Z| vmx| I120: ----------------------------------------
2016-12-02T05:57:36.686Z| mks| I120: SOCKET 17454 (184) Creating VNC remote connection.
2016-12-02T05:57:36.773Z| mks| W110: VNCENCODE 17454 failed to allocate VNCBlitDetect
2016-12-02T05:57:36.773Z| mks| W110: VNCENCODE 17454 failed to allocate VNCBackBuffer
2016-12-02T05:57:41.836Z| vmx| I120: VigorTransportProcessClientPayload: opID=37ABADE0-000000A4-a787 seq=602830: Receiving Bootstrap.MessageReply request.
2016-12-02T05:57:41.837Z| vmx| I120: Vigor_MessageRevoke: message 'msg.hbacommon.outofspace' (seq 4025) is revoked
2016-12-02T05:57:41.837Z| vmx| I120: VigorTransport_ServerSendResponse opID=37ABADE0-000000A4-a787 seq=602830: Completed Bootstrap request.
2016-12-02T05:57:41.837Z| vmx| I120: MsgQuestion: msg.hbacommon.outofspace reply=1
2016-12-02T05:57:41.837Z| vmx| I120: Exiting because of failed disk operation.
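
For anyone hunting for the same failure in a long vmware.log, here is a minimal Python sketch that pulls out the lines pointing at a failed disk operation. The line layout (timestamp| thread| level: message) is inferred only from the excerpt above, and `find_disk_failures` is a made-up helper name, not a VMware tool.

```python
import re

# Assumed line layout, taken from the excerpt: "<timestamp>| <thread>| <level>: <message>"
LINE_RE = re.compile(r"^(?P<ts>\S+)\| (?P<thread>\w+)\| (?P<level>[A-Z]\d+): (?P<msg>.*)$")

def find_disk_failures(log_text):
    """Return (timestamp, message) pairs for lines that mention a failed disk operation."""
    markers = ("msg.hbacommon.outofspace", "failed disk operation")
    hits = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if m and any(k in m.group("msg") for k in markers):
            hits.append((m.group("ts"), m.group("msg")))
    return hits

# Three lines copied from the log excerpt in the question.
sample = """\
2016-12-02T05:57:36.686Z| mks| I120: SOCKET 17454 (184) Creating VNC remote connection.
2016-12-02T05:57:41.837Z| vmx| I120: MsgQuestion: msg.hbacommon.outofspace reply=1
2016-12-02T05:57:41.837Z| vmx| I120: Exiting because of failed disk operation.
"""

for ts, msg in find_disk_failures(sample):
    print(ts, msg)
```

The two marker strings are simply the ones visible in this thread; other failures will need other markers.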


It turned out to be less a space problem than a RAID error, so we rebuilt the RAID on new disks. ESXi boots without any problem, but when I try to start the VM, a message shows up: "Device bootstrap is not available".
The event log shows that server.vmx can't be opened.

What I did so far:
- renamed and copied the vmx file -> same error
- downloaded the vmx file to a local client, opened it and uploaded it again (I could read it with Notepad) -> same error
- deleted the VM from the inventory and re-added it (file browser -> right-click -> add to inventory) -> same error
- created a new VM and added the old vmdk -> the VM started but I get a bluescreen in the guest -> I think this is a snapshot problem

Interesting:
- browsing the files with a browser (web client), I can see more files in the datastore. There is a server.vmx.lck
- browsing there, I can see server_x-0000001.vmdk and a delta file, but no server_x.vmdk. I don't know if this disk was in use, but the date of the 000001 file is from December 2016...

Maybe the solution is to kill this vmx.lck, but I don't know how. This is a single physical machine and not a cluster (I read on the web that another host could hold the lock and that I would have to release it from that host...)

I really don't know much about ESXi, so please help me get this server running again ASAP.

thanks for helping

loosain
ASKER CERTIFIED SOLUTION
Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)
loosain

ASKER

Thank you so much for helping. Here are the requested screenshots:

[Two user-supplied screenshots]
Okay, it looks like someone has deliberately created snapshots; you can see them in the list.

Snapshots are NOT backups, and you should not keep running a VM on snapshots, because they cause performance issues and will fill up your datastore.

So to get your VM running again, you will need to increase space on the datastore and delete the snapshots.

How much space, if any, is left on the datastore?
Avatar of loosain

ASKER

I know that snapshots are not backups. I use Hyper-V, so I am familiar with virtualization, but I really don't know much about VMware. I wasn't working with this machine before...

On the host's Configuration tab, it says that the datastore has 324 GB free.
User generated image
Snapshots (or checkpoints, as they are called in Hyper-V) work the same way in VMware as they do in Hyper-V: if you leave them, they will fill up the datastore.

1. Power off the VM
2. Take a new snapshot.
3. Wait 60 seconds to 2 minutes
4. Then select DELETE ALL

and wait whilst it removes ALL the snapshots and merges them.

There could be an issue if the VM process is still running after the datastore filled up.
Avatar of loosain

ASKER

Taking a new snapshot gives me an error saying that an invalid snapshot configuration was found.

When I tried to create a new VM with the old virtual disks (server-00002.vmdk and server_1-000002.vmdk), it says on starting the VM that the parent disk (server_1.vmdk) can't be found. I can't see this file in the directory either. For the other disk (server.vmdk, which contains the OS) it says that there is a problem with content IDs and that the parent disk was changed after the child disk was created...

I don't know what the hell was going on there. And what about this server-flat.vmdk? Is VMware creating this? In the last logs I checked the configuration, and there I could only find 2 disks (server.vmdk and server_1.vmdk).
Avatar of loosain

ASKER

Maybe this server-flat.vmdk is my lost server_1.vmdk?
okay,...

this is good....as we can try and work out what is now happening...

You will always have two files for each virtual disk...

filename.vmdk ---> this is the descriptor, a text file which contains the disk geometry
filename-flat.vmdk ---> this file holds the actual data.

and then you have snapshot files, which end in -0000x.vmdk

you have two snapshot files, 1 and 2.

looking at the files, this is a two disk virtual machine.

server.vmdk
server-flat.vmdk
server-00001.vmdk
server-00002.vmdk

the first disk looks correct. two snapshots.

for the second disk, I do not see a

server_1.vmdk
server_1-flat.vmdk

they are missing.

I do see a

server_1-00001.vmdk
server_1-00002.vmdk

Corruption has occurred, and from the screenshot you are missing files, either because the datastore became full due to the snapshots, or because of something else.

can you confirm, if you have

server_1.vmdk ?
server_1-flat.vmdk ?

I don't see them in the list ?

You cannot start a VM with just the delta/child disks; you are missing the parents.

If you do not have them, you will need to restore the VM.

Or try recovery.
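
To make the "missing parents" check above repeatable, here is a rough Python sketch that takes a datastore file listing and reports disks that have snapshot files but no base descriptor or flat file. The naming pattern is an assumption based on the names in this thread (`<disk>.vmdk`, `<disk>-flat.vmdk`, `<disk>-00000N.vmdk`, plus `-delta` data files), and `missing_parents` is a hypothetical helper, not part of any VMware tooling. A base disk whose own name ends in `-<digits>` would be misread as a snapshot.

```python
import re
from collections import defaultdict

# Assumed naming: <disk>.vmdk (descriptor), <disk>-flat.vmdk (data),
# <disk>-<digits>.vmdk (snapshot descriptor), <disk>-<digits>-delta.vmdk (snapshot data).
NAME_RE = re.compile(r"^(?P<disk>.+?)(?:-(?P<snap>\d+))?(?:-(?P<kind>flat|delta))?\.vmdk$")

def missing_parents(filenames):
    """Return disks that have snapshot files but lack the base descriptor or flat file."""
    seen = defaultdict(set)
    for name in filenames:
        m = NAME_RE.match(name)
        if not m:
            continue  # not a .vmdk file, ignore it
        if m.group("snap"):
            seen[m.group("disk")].add("snapshot")
        elif m.group("kind") == "flat":
            seen[m.group("disk")].add("flat")
        else:
            seen[m.group("disk")].add("descriptor")
    return sorted(
        disk for disk, kinds in seen.items()
        if "snapshot" in kinds and not {"descriptor", "flat"} <= kinds
    )

# File names as discussed in this thread (six-digit snapshot suffixes).
files = [
    "server.vmdk", "server-flat.vmdk",
    "server-000001.vmdk", "server-000002.vmdk",
    "server_1-000001.vmdk", "server_1-000002.vmdk",
]
print(missing_parents(files))  # ['server_1']
```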
Avatar of loosain

ASKER

You are right. The datastore became full. server_1.vmdk and server_1-flat.vmdk are missing. That's what I see too.
Maybe I can get them back from backup. The first step should be to get server.vmdk running with its latest version (00002.vmdk).
There I get this content ID error, which tells me that the snapshot parent was changed after the snapshot was created.
It seems that the vmx file has server.vmdk configured as the disk. But if I look at the dates of the files, 00002.vmdk was last used on 2 December in the morning (that's when the crash happened). So I think this is our disk. I don't know why the old disk (server.vmdk) is configured in the vmx. Maybe someone changed this (once nothing works, they always say that they didn't do anything...)
So we have to get 00002.vmdk running. Once the OS is back up with it, I can look at what was on the second disk (server_1-000002.vmdk) in the backup, and whether a restore is something we can manage without losing data...

What is the best way to get server-000002.vmdk online with a valid snapshot chain (which we can delete once everything is done and a new backup has been taken...)?
Restore the ENTIRE VM

OR...

If you have attempted to Power ON a VM disk, without the snapshot, you will get a CID mismatch.

You will need to complete this procedure, to ensure that the snapshot chain for server.vmdk is valid.

Resolving the CID mismatch error: The parent virtual disk has been modified since the child was created (1007969)

You could attempt to restore the files server_1.vmdk and server_1-flat.vmdk, and then attempt to patch the snapshots you have onto these files, using the CID mismatch procedure.

However,  there is a chance that you may end up with a corrupt disk.
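
As a rough illustration of what the CID check looks at: each descriptor text file carries a `CID` line, and a child descriptor also carries a `parentCID` that is expected to equal the parent's `CID`. The Python sketch below only reads and compares those values; it does not change anything. The descriptor contents and CID values are invented for the example, and `read_cids` / `chain_is_consistent` are hypothetical helper names. The KB article named above remains the reference for the actual repair.

```python
import re

def read_cids(descriptor_text):
    """Extract CID, parentCID and parentFileNameHint from VMDK descriptor text."""
    fields = {}
    for key in ("CID", "parentCID", "parentFileNameHint"):
        # ^ with MULTILINE keeps "CID" from matching inside "parentCID".
        m = re.search(rf'^{key}="?([^"\n]+)"?\s*$', descriptor_text, re.MULTILINE)
        fields[key] = m.group(1).strip() if m else None
    return fields

def chain_is_consistent(child_text, parent_text):
    """A child links cleanly only if its parentCID equals the parent's CID."""
    child, parent = read_cids(child_text), read_cids(parent_text)
    if child["parentCID"] is None or parent["CID"] is None:
        return False
    return child["parentCID"].lower() == parent["CID"].lower()

# Invented example descriptors (values are illustrative only).
parent = """# Disk DescriptorFile
version=1
CID=a1b2c3d4
parentCID=ffffffff
createType="vmfs"
"""
child = """# Disk DescriptorFile
version=1
CID=0f0e0d0c
parentCID=a1b2c3d4
createType="vmfsSparse"
parentFileNameHint="server.vmdk"
"""

print(chain_is_consistent(child, parent))
```

Always work on copies of the descriptors; the flat and delta files are never touched by a check like this.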
Avatar of loosain

ASKER

Restoring the whole VM is not an option, because they made the backup from inside the VM, with Acronis installed within this VM... So I have to get it back to work to access the backup.

server.vmdk now has a correct snapshot chain... The server is running again. But the second disk contained all the data and the Exchange DB... According to the logs, the backup had some problems last week, so the data of a few days could be gone...

Is there any chance to get the files out of server_1-000002.vmdk without having server_1.vmdk?
With the files from the backup (nearly 2 weeks old) and the up-to-date files from this vmdk, it could be a 100% restore.
With an active snapshot, all changes go into the snapshot file. Theoretically, it should be possible to restore the original parent disk files from a state in which the second snapshot was already active.
So you do not have the server_1.vmdk file, or any chance to recover it?

You could try renaming server_1-000002.vmdk ---> server_1.vmdk and try adding it to a VM.

But I think without the server_1.vmdk parent file, the disk may be unreadable.
Andrew, the snapshot file will be incomplete. One cannot assume all blocks have been changed since the snapshot was created. Otherwise I agree, the snapshot would be a candidate to replace the parent disk.

Further, the OP seems to have a two-week-old server_1.vmdk, and that is a much better place to start: restore all files with a change date older than the second snapshot file. Do you agree?
Avatar of loosain

ASKER

OK, I have found another possible way that may be more successful. I have access to the old member disk of this RAID, but I can't read VMFS 5.61 in Windows 10. I opened a new question for this:

https://www.experts-exchange.com/questions/28987190/How-to-access-a-vmfs-5-61-partition-without-a-esxi-i-have-orphaned-Raid-member-which-maybe-have-my-lost-data-from-datastore.html

I am hoping to get all my data back this way. Then hopefully I can post the right way of doing it in both questions for everyone else and close both...
Answered your other question.

@Qlemo - I think recovery is doubtful, which is why backups are important and need to be checked, and why Snapshot Hell is dangerous.

Personally I would not bother unless I had the time, and was desperate.
"Desperate" fits here, I guess. Messing with old Exchange backups isn't something I would want to do, so I would put effort into getting something more recent by all means ;-).

No doubt, having a functional, valid and recent backup is king and essential. But this is not the first time someone notices that the backups are too old, or have recently stopped working, only when something needs to be recovered...
Avatar of loosain

ASKER

Desperate fits. Acronis started a backup job 11 days ago, and it stopped at crash time without any error... So we were not aware that something was going wrong. Before that, the backup did its job fine...
At the moment I am making a 1:1 clone of the old RAID member (just to be safe). Then I am going to boot ESXi from this disk and take the now-working disks out of the server. Maybe ESXi starts and lets me copy the missing vmdk files from its datastore. If this doesn't work, I am going to give UFS Explorer a try.
Avatar of loosain

ASKER

UFS Explorer is now extracting the missing server_1.vmdk and server_1-flat.vmdk.
It will take a few hours, because we are talking about 400 GB...

But another question: to get some more free space on the disk, I consolidated my server and deleted all snapshots. So far so good.
But now it occurs to me: I want to re-add the missing disk server_1, which is currently not in the configuration of this machine because its disk files were missing.
Can I just add a disk to the machine later (server_1-000002.vmdk) and have it use the whole chain? And how can I then merge those files into one good, small vmdk?
Check the files when they are restored. The .vmdk descriptors (text files) should reference each other so that they form a chain:

server_1-000002.vmdk

which chains

server_1-000001.vmdk

which chains

server_1-flat.vmdk

BUT you will also need to check the CID match, as you've done before.

Once you have completed the chain, and it's correct, then you can MERGE all the snapshots into a single disk as follows:-

vmkfstools -i <most recent snapshot file name> /vmfs/volumes/<temp folder name>/<newfilename.vmdk>


so your command would look like this


vmkfstools -i server_1-000002.vmdk "/vmfs/volumes/<new server>/server_1.vmdk"


where /vmfs/volumes/<new server> is a temp folder name

This will create a new virtual machine disk without any snapshots. Add this to your VM and, provided there is no corruption, it should be good.
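
One practical catch with the command above: a folder name containing a space has to be quoted, or the shell splits it into two arguments. As a small sketch, the Python below builds the same `vmkfstools -i` command line with shell-safe quoting. The `new server` folder is just the placeholder used in this thread, and `clone_command` is a hypothetical helper; it only prints the command and runs nothing.

```python
import shlex

def clone_command(newest_snapshot, target_path):
    """Build the vmkfstools clone command line, quoting each argument for the shell."""
    return shlex.join(["vmkfstools", "-i", newest_snapshot, target_path])

# Placeholder target path taken from the thread; it contains a space on purpose.
print(clone_command("server_1-000002.vmdk", "/vmfs/volumes/new server/server_1.vmdk"))
# vmkfstools -i server_1-000002.vmdk '/vmfs/volumes/new server/server_1.vmdk'
```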
Avatar of loosain

ASKER

Wow Andrew,

you have helped me a lot so far. Thank you so much. I am really confident that everything will work tonight.
Does this merge need much free space on the disk? After copying those vmdk files I only have about 50 GB of free space left. If this command builds one vmdk temporarily, it could take 400 GB like the server_1-flat file...
No problem, that's what I'm here for...

The VMDK file it builds will be the same size as the parent, so you will need at least 400GB.

50GB is not enough.

You could just commit the disk normally....

(with VM powered off!)

1. Take a new snapshot
2. Wait 60 seconds to 2 minutes.
3. Select DELETE ALL

This does not require any disk space, BUT.... it does ruin your chances of recovery, unless you have backups of all the files.

And that's why you ran out of disk space: if you only had 50GB left on the datastore, a VM's snapshots can grow at about 1GB per hour, and that is when idle.

Also note that the vmkfstools command will verify the snapshot chain before the merge, so if the chain is not correct, it will fail with an error.
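
The arithmetic behind the two points above, as a tiny Python sketch. It takes the figures from this thread at face value (clone target as large as the 400GB parent, 50GB free, roughly 1GB of snapshot growth per hour); real numbers will differ per VM, and both function names are made up for the example.

```python
def can_clone(parent_size_gb, free_gb):
    """The consolidated copy is assumed to be as large as the parent disk."""
    return free_gb >= parent_size_gb

def hours_until_full(free_gb, growth_gb_per_hour=1.0):
    """Rough time before a growing snapshot uses up the remaining free space."""
    return free_gb / growth_gb_per_hour

print(can_clone(400, 50))    # False
print(hours_until_full(50))  # 50.0
```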
Avatar of loosain

ASKER

Very nice. We got this machine back up and running with all its data. Everything is fine. A few small database problems remain, but we are going to repair them soon. Thanks so much!
No problems, glad you resolved the problem.