ESXi problem

Hello guys..

I think I may have messed up pretty good here.  I have an ESXi host that is running a pretty hefty Exchange server.  The disks were getting full, so I decided to delete the snapshot (there's only one).  The snapshot itself is about 250 GB.  The server it's on isn't the fastest of servers (it's just a Dell PE T420).

Anyhow.. It was going really slow (about 3 percent per 2 hours) so I decided to shut the VM down, thinking the lack of disk activity would speed things up.  Not only did it NOT speed things up, but now my server is stuck at shutdown and ESXi is complaining it can't shut down due to a disk cleanup operation.  It's 2 AM now, and it's fine that the server is down at night, but during the day I will start to get many, many complaints from many people.  I didn't make the best decisions, and now I think I am in trouble.

Can anyone offer something I can do to get this server back up?  I can't restart, stop, reset, only wait...  I'm afraid I have no other option other than to wait it out.  If I did the math right, it will take 2 full days.  I can't wait that long!!
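For reference, the back-of-envelope estimate above can be sketched as follows. It assumes the merge keeps the observed rate of 3 percent per 2 hours from start to finish, which a snapshot merge does not guarantee. The function name is made up for illustration and is not a VMware tool.

```shell
#!/bin/sh
# Linear estimate of total merge time from an observed progress rate.
# Assumption: progress stays linear, which a snapshot merge does not promise.
# estimate_total_hours is a hypothetical helper, not part of ESXi.
estimate_total_hours() {
    pct_per_step=$1    # percent completed per observation window
    hours_per_step=$2  # length of the observation window in hours
    awk -v p="$pct_per_step" -v h="$hours_per_step" \
        'BEGIN { printf "%.1f\n", 100 / p * h }'
}

# Observed rate from the post: 3 percent every 2 hours.
estimate_total_hours 3 2
```

At that rate the whole merge would take roughly 66.7 hours, or about 2.8 days from start to finish, so any time already elapsed comes off that total.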

Thanks!
TimFarrenAsked:

dipopoCommented:
If you have access to ESXCLI, run this:

esxcli vm process list [this will give you running processes and their world-id]

Then use

esxcli vm process kill -t soft -w world-id [performs a soft kill]

or use

esxcli vm process kill -t hard -w world-id [performs a hard kill]

as last resort

esxcli vm process kill -t force -w world-id [performs a forced kill]

Should work for you.
Vaseem MohammedCommented:
I can understand your situation, because after reading the post my blood pressure has gone up: the VM is Exchange, you shut it down, and you have also attempted to shut down the host!
I don't have a solution for your current situation, but I can say that the data written since the snapshot was taken is being merged into its parent VMDK, and that might take a long time for 250GB.
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
It's a shame you did not come here earlier for advice! This could have been avoided, but now that you have started the process, you must leave it to complete, or risk corruption of the snapshot and parent virtual disk, and data will be lost.

Please read my EE Article urgently

HOW TO: VMware Snapshots :- Be Patient

DO NOT MESS, DO NOT KILL PROCESSES, BE PATIENT!

Killing the VM process is bad advice if you want to maintain a working virtual disk and Exchange VM.

Do not follow dipopo's advice if the snapshot is being merged.

If you mess and kill processes, you will corrupt the snapshot, and you could lose 250GB of mail data.

Why is the snapshot this large? Clearly your Exchange Server has been running on this snapshot for many days, weeks, or months, and performance would have been poor.

Your Options:-

1. Abort at the risk of corruption, and then we could use VMware Converter, or CLONE to get rid of snapshot issues.
2. Be Patient, and Wait for the Snapshot Merge to complete, this could take minutes, hours, days, or weeks, longest I've seen a snapshot merge was 6 days.

Can you upload a screenshot of the datastore with the snapshots and folders, so I can assess the issue?
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
Tim

Do you also have a backup you could start to restore, at least to restore service?
TimFarrenAuthor Commented:
It's at 27% since 9:15 pm last night. Not good. If I calculate right it's going to be a rough Monday.

I do have a backup, but unfortunately it's a little old. It's about a week old.

Is there any chance this thing could suddenly take a huge leap forward, or is the progress thus far an indication of the remainder (in terms of duration)?
TimFarrenAuthor Commented:
And to clarify, I shut down the guest operating system, and I did it from inside the operating system itself.  I never attempted to shut down the host.  Now, the guest operating system is in a constant state of shutting down.  It cannot complete the shutdown, because the ESXi host will not let it.  Consequently, because it cannot complete the shutdown, I cannot start it back up until the merge finishes.  By the way, this is version 5.1.
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
Despite what management says on Monday morning, do not be tempted to restart the host, or fiddle or mess with the VM (this can cause serious snapshot corruption, and data can be lost forever).

Just let it complete, and I'm afraid Be Patient.

Because there is a Task in Progress (Snapshot) the VM is locked, until this finishes.

Progress Bars I'm afraid are unreliable, sometimes they can just stick/hang at 95%/99%.

Just wait and Be Patient....

Can you screenshot the datastore?

What datastore are you using? RAID type, disk type, disk speed?
TimFarrenAuthor Commented:
It's a mirror.  Two 1TB drives (near-line SAS) in a hardware mirror (Dell Perc Array Card).

The drives are about 700 GB full out of 1 TB.  Attached is a screenshot of the datastore.  It only contains the 1 VM.
DataStore.bmp
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
Okay, there are two snapshots, a small one and a very large one. You will just have to wait and be patient for the 250GB delta snapshot to be merged into the parent.

With two disks it could take a while; there are not many IOPS on a datastore with two disks.

Just sit tight, be patient and calm, it will merge and complete.
TimFarrenAuthor Commented:
Thanks for the reassurance.  I'm a little frantic because there are about 100 mailboxes (STUPID ME) on this server and I'm bound to get into serious trouble with some clients if they are without mail for a day.. or two.. or six?!  Man.. the uncertainty is most of my anxiety.  I really can't tell people with any accuracy when it will be back up.  It's not just email either.. there's QuickBooks data, etc.  Again.. STUPID ME.  Lesson learned here.

Is there ANY chance of sending an OS reset signal to the guest OS while this operation is happening?  I mean, I understand if I had left the VM running, the performance would be degraded, but at least it would be operational.  No dice?

Thanks again.
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
Not now that you have shut down the VM.

For your info, it's quicker to merge with the VM off, and it does not use any more storage space.

Also VM performance can be awful if VM is left on whilst merging.

Why has this VM been on a snapshot this long?
TimFarrenAuthor Commented:
To be honest, I had no idea that running off a snapshot for any length of time was detrimental to performance.  I was preparing to migrate it onto a new RAID 10 array, and in preparation thought it would be good housekeeping to merge the snapshots.

Is it generally bad practice to have any snapshots on a VM?
TimFarrenAuthor Commented:
I'm running "watch -d 'ls -luth | grep -E "delta|flat"'" over SSH and watching the files - none of the file sizes are changing, but the time stamps are.  Is this normal?
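A variant of the listing above can be sketched like this. One detail worth keeping in mind is that in POSIX `ls` (and in BusyBox `ls`), the `-u` flag makes `-l` show last-access time rather than modification time, so timestamps seen with `-luth` may reflect reads as well as writes. The helper below drops `-u` so the listing shows modification times; its name is hypothetical, and it has not been tried on an ESXi 5.1 shell.

```shell
#!/bin/sh
# Print size and modification time for the delta and flat VMDK files in a
# directory. Plain `ls -l` reports modification time; adding -u would switch
# the listing to last-access time instead.
# list_vmdk_mtimes is a hypothetical helper name.
list_vmdk_mtimes() {
    dir=${1:-.}
    ls -lh "$dir" | grep -E 'delta|flat'
}

# Example: re-run every 10 seconds from the VM's folder on the datastore.
# while true; do clear; list_vmdk_mtimes .; sleep 10; done
```

If the modification times keep advancing while the sizes stay put, that is consistent with data being written into the existing parent disk rather than new space being allocated.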
Vaseem MohammedCommented:
The VM is in shutdown process.
A wild thought: by any chance, did you check whether any Outlook client is able to connect to Exchange? Or whether you can connect via the Services MMC and check the Exchange services?
TimFarrenAuthor Commented:
It doesn't even respond to a ping. And my mailbox is on this server (smartphone attached to it as well). It's definitely all the way down. No response.
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
Oh yes, really bad practice: bad VMware administration, or no VMware administration occurring!

Data from the delta is being merged into the parent....

My only concern is that you "meddled" by shutting the VM down during the snapshot merge, which can sometimes halt the process. Once the merge is complete, the delta will be deleted.

timestamps changing is a good sign....
TimFarrenAuthor Commented:
Progress meter is moving. Just SLOW. It's now at 35%. It's been running for 19 hours. That's ridiculous.
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
I'm afraid that's very short; one of our clients waited 6 days.

It's because of a slow two-disk datastore and a large snapshot.
Large snapshots are often quicker to clone out (V2V), but the VM still needs to be down and it can take many hours.

This is why it's recommended not to get into snapshot hell in the first place - sorry for that.

In future keep it checked.
TimFarrenAuthor Commented:
Definitely. These are life's lessons. :-)
TimFarrenAuthor Commented:
This might be off-topic, but do you have any good ideas for moving this VM on to a newly established DataStore on the same host with little to no downtime?
TimFarrenAuthor Commented:
Update.. It hit 40% after a total of 33 hours of run time and then just jumped ahead to 100% in a matter of seconds. All is well!  Thanks everyone!!
Vaseem MohammedCommented:
You are lucky :-) Good to hear everything went smoothly.
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
Keep Calm and Carry On!

Glad you are back in business, service restored.

Check Daily for Snapshots! They are EVIL!
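One way to do that daily check from the ESXi shell might look like the sketch below. It assumes snapshot disks use the `*-delta.vmdk` naming and that datastores are mounted under `/vmfs/volumes`; both are assumptions to verify on your own host. The function name is hypothetical.

```shell
#!/bin/sh
# List snapshot delta disks under a datastore root, with sizes.
# Assumptions: snapshot disks are named *-delta.vmdk, and datastores are
# mounted under /vmfs/volumes. Check both on your own host.
# find_snapshot_deltas is a hypothetical helper name.
find_snapshot_deltas() {
    root=${1:-/vmfs/volumes}
    find "$root" -type f -name '*-delta.vmdk' -exec ls -lh {} \;
}

# Example: find_snapshot_deltas /vmfs/volumes
```

An empty result suggests no VM is running on a snapshot; any delta file that keeps growing from day to day is worth merging before it gets large.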