TimFarren

asked on

ESXi problem

Hello guys..

I think I may have messed up pretty good here.  I have an ESXi host that is running a pretty hefty Exchange server.  The disks were getting full, so I decided to delete the snapshot (there's only one).  The snapshot itself is about 250 GB.  The server it's on isn't the fastest of servers (it's just a Dell PE T420).

Anyhow.. It was going really slow (about 3 percent per 2 hours) so I decided to shut the VM down, thinking the lack of disk activity would speed things up.  Not only did it NOT speed things up, but now my server is stuck at shutdown and ESXi is complaining it can't shut down due to a disk cleanup operation.  It's 2 AM now, and it's fine that the server is down at night, but during the day I will start to get many, many complaints from many people.  I didn't make the best decisions, and now I think I am in trouble.

Can anyone offer something I can do to get this server back up?  I can't restart, stop, reset, only wait...  I'm afraid I have no other option other than to wait it out.  If I did the math right, it will take 2 full days.  I can't wait that long!!

Thanks!
dipopo

Do you have access to ESXCLI? If so, run this:

esxcli vm process list [this will give you running processes and their world-id]

Then use

esxcli vm process kill -t soft -w world-id [performs a soft kill]

or use

esxcli vm process kill -t hard -w world-id [performs a hard kill]

as last resort

esxcli vm process kill -t force -w world-id [performs a forced kill]

Should work for you.
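
For anyone following along, here is a minimal sketch of how the world-id could be pulled out of the list output for a single VM. It only looks the ID up and does not kill anything (later replies in this thread warn against killing processes while a snapshot merge is running). The function name `world_id_for` and the VM name `EXCH01` are hypothetical, and the sketch assumes each VM block in the output has a "World ID:" line followed later by a "Display Name:" line.

```sh
#!/bin/sh
# Hypothetical helper: read `esxcli vm process list` output on stdin and
# print the World ID of the VM whose Display Name equals $1.
# Assumes the "World ID:" line precedes "Display Name:" within each VM block.
world_id_for() {
    awk -v name="$1" '
        /^ *World ID:/     { wid = $3 }               # remember the latest ID seen
        /^ *Display Name:/ {
            sub(/^ *Display Name: */, "")             # keep only the name itself
            if ($0 == name) print wid
        }
    '
}

# Usage on the host (EXCH01 is a made-up VM name):
#   esxcli vm process list | world_id_for EXCH01
```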
Wasim Shaikh

I can understand your situation; after reading the post my blood pressure has gone up, as the VM is Exchange and you shut it down and also attempted to shut down the host!
I don't have a solution for your current situation, but I can say that the data written since the snapshot was taken is being merged into its parent VMDK, and that might take a long time for 250GB.
It's a shame you did not come here earlier for advice! This could have been avoided, but now that you have started the process, you must leave it to complete, or risk corruption of the snapshot and parent virtual disk, and data will be lost.

Please read my EE Article urgently

HOW TO: VMware Snapshots :- Be Patient

DO NOT MESS, DO NOT KILL PROCESSES, BE PATIENT!

This is bad advice, if you want to maintain a working virtual disk and Exchange VM.

Do not follow dipopo's advice, if the snapshot is being merged.

If you mess about and kill processes, you will corrupt the snapshot, and you could lose 250GB of mail data.

Why is the snapshot this large? Clearly your Exchange Server has been running on this snapshot for many days, weeks, or months; performance would have been poor.

Your Options:-

1. Abort at the risk of corruption, and then we could use VMware Converter, or CLONE, to get rid of the snapshot issues.
2. Be patient and wait for the snapshot merge to complete. This could take minutes, hours, days, or weeks; the longest snapshot merge I've seen took 6 days.

Can you upload a screenshot of the datastore with the snapshots and folders, so I can assess the issue?
Tim

Do you also have a backup you could start to restore, at least to restore service?
TimFarren

ASKER

It has reached 27% since 9:15 pm last night. Not good. If I calculate right it's going to be a rough Monday.

I do have a backup, but unfortunately it's a little old. It's about a week old.

Is there any chance this thing could suddenly take a huge leap forward, or is the progress thus far an indication of the remainder (in terms of duration)?
And to clarify, I shut down the guest operating system, and I did it from inside the operating system itself.  I never attempted to shut down the host.  Now, the guest operating system is in a constant state of shutting down.  It cannot complete the shutdown, because the ESXi host will not let it.  Consequently, because it cannot complete the shutdown, I cannot start it back up until the merge finishes.  By the way, this is version 5.1.
ASKER CERTIFIED SOLUTION
Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)
This solution is only available to members.
It's a mirror.  Two 1TB drives (near-line SAS) in a hardware mirror (Dell Perc Array Card).

The drives are about 700 GB full out of 1 TB.  Attached is a screenshot of the datastore.  It only contains the 1 VM.
DataStore.bmp
Okay, there are two snapshots: a small one and a very large one. You will just have to wait and be patient for the 250GB delta snapshot to be merged into the parent.

With only two disks this could take a while; there are not many IOPS available on the datastore.

Just sit tight, be patient and calm, it will merge and complete.
Thanks for the reassurance.  I'm a little frantic because there are about 100 mailboxes (STUPID ME) on this server and I'm bound to get into serious trouble with some clients if they are without mail for a day.. or two.. or six?!  Man.. the uncertainty is most of my anxiety.  I really can't tell people with any accuracy when it will be back up.  It's not just email either.. there's QuickBooks data, etc.  Again.. STUPID ME.  Lesson learned here.

Is there ANY chance of sending an OS reset signal to the guest OS while this operation is happening?  I mean, I understand if I had left the VM running, the performance would be degraded, but at least it would be operational.  No dice?

Thanks again.
Not now that you have shut down the VM.

For your info, it's quicker to merge if the VM is off, and it does not use any more storage space.

Also, VM performance can be awful if the VM is left on whilst merging.

Why has this VM been on a snapshot this long?
To be honest, I had no idea that running off a snapshot for any length of time was detrimental to performance.  I was preparing to migrate it off onto a new RAID 10 array, and in preparation thought it would be good housekeeping to merge the snapshots.

Is it generally bad practice to have any snapshots on a VM?
I'm running "watch -d 'ls -luth | grep -E "delta|flat"'" over SSH and watching the files - none of the file sizes are changing, but the time stamps are.  Is this normal?
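
The same check can be scripted rather than eyeballed. Below is a rough sketch that samples the size and modification time of the delta and flat files twice and reports whether anything moved in between. The function names `vmdk_state` and `vmdk_changed` and the example path are hypothetical, and it assumes a `stat` that supports `-c` (GNU or busybox style), which may not hold on every system.

```sh
#!/bin/sh
# Print "size mtime name" for every delta and flat extent in directory $1.
# Assumes `stat -c` is available (GNU/busybox); %Y is mtime in epoch seconds.
vmdk_state() {
    dir=$1
    for f in "$dir"/*-delta.vmdk "$dir"/*-flat.vmdk; do
        [ -e "$f" ] || continue      # skip patterns that matched nothing
        stat -c '%s %Y %n' "$f"
    done
}

# Take two samples $2 seconds apart (default 60) and say whether they differ.
vmdk_changed() {
    before=$(vmdk_state "$1")
    sleep "${2:-60}"
    after=$(vmdk_state "$1")
    if [ "$before" = "$after" ]; then
        echo "no change"
    else
        echo "changed"
    fi
}

# Usage (the path is a made-up example):
#   vmdk_changed /vmfs/volumes/datastore1/EXCH01 120
```

Note that this looks at modification time, whereas `ls -u` in the command above shows access time, so the two may not always agree.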
The VM is in shutdown process.
A wild thought: by any chance did you check whether any Outlook client is able to connect to Exchange, or whether you can connect via the Services MMC and check the Exchange services?
It doesn't even respond to a ping. And my mailbox is on this server (smartphone attached to it as well). It's definitely all the way down. No response.
Oh yes, really bad practice: bad VMware administration, or no VMware administration occurring!

Data from the delta is being merged into the parent....

My only concern is that you "meddled" by shutting the VM down during the snapshot merge, which can sometimes halt the process. Once the merge is complete the delta will be deleted.

Timestamps changing is a good sign....
Progress meter is moving. Just SLOW. It's now at 35%. It's been running for 19 hours. That's ridiculous.
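
For a rough idea of what those numbers imply, here is a back-of-the-envelope sketch that extrapolates linearly from the percentage done and the hours elapsed. The function name `eta_hours` is made up for illustration, and the linear assumption is a big one: the progress meter may not advance evenly, so treat the result as a guess rather than a forecast.

```sh
#!/bin/sh
# Linear ETA: given percent complete ($1) and hours elapsed ($2),
# print the estimated hours remaining, assuming a constant rate.
eta_hours() {
    awk -v pct="$1" -v hrs="$2" 'BEGIN {
        if (pct <= 0 || pct > 100) { print "n/a"; exit }   # no rate to extrapolate from
        printf "%.1f\n", hrs * (100 - pct) / pct
    }'
}

eta_hours 35 19   # prints 35.3
```

At 35% after 19 hours, that works out to roughly another day and a half if the rate holds.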
I'm afraid that's very short; one of our clients waited 6 days..

It's because of a slow two-disk datastore and a large snapshot.
Often it is quicker to clone large snapshots out with V2V, but the VM still needs to be down and it can take many hours.

Hence the recommendation not to get into snapshot hell in the first place. Sorry for that.

In future keep it checked.
Definitely. These are life's lessons. :-)
This might be off-topic, but do you have any good ideas for moving this VM onto a newly established datastore on the same host with little to no downtime?
Update.. It hit 40% after a total of 33 hours of run time and then just jumped ahead to 100% in a matter of seconds. All is well!  Thanks everyone!!
You are lucky :-) Good to hear everything went smoothly.
Keep Calm and Carry On!

Glad you are back in business, service restored.

Check Daily for Snapshots! They are EVIL!