TimFarren asked on

ESXi problem

Hello guys..

I think I may have messed up pretty good here.  I have an ESXi host that is running a pretty hefty Exchange server.  The disks were getting full, so I decided to delete the snapshot (there's only one).  The snapshot itself is about 250 GB.  The server it's on isn't the fastest of servers (it's just a Dell PE T420).

Anyhow.. It was going really slow (about 3 percent per 2 hours) so I decided to shut the VM down, thinking the lack of disk activity would speed things up.  Not only did it NOT speed things up, but now my server is stuck at shutdown and ESXi is complaining it can't shut down due to a disk cleanup operation.  It's 2 AM now, and it's fine that the server is down at night, but during the day I will start to get many, many complaints from many people.  I didn't make the best decisions, and now I think I am in trouble.

Can anyone offer something I can do to get this server back up?  I can't restart, stop, reset, only wait...  I'm afraid I have no other option other than to wait it out.  If I did the math right, it will take 2 full days.  I can't wait that long!!

Thanks!
VMware, Dell


8/22/2022 - Mon
dipopo

If you have access to ESXCLI, run this:

esxcli vm process list [this will give you running processes and their world-id]

Then use

esxcli vm process kill -t soft -w world-id [performs a soft kill]

or use

esxcli vm process kill -t hard -w world-id [performs a hard kill]

as last resort

esxcli vm process kill -t force -w world-id [performs a forced kill]

Should work for you.
Wasim Shaikh

I can understand your situation, because after reading the post my blood pressure has gone up: the VM is Exchange, you shut it down, and you have also attempted to shut down the host!
I don't have a solution for your current situation, but I can say that the data written since the snapshot was taken is being merged into its parent VMDK, and that might take a long time for 250 GB.
Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)

It's a shame you did not come here earlier for advice! This could have been avoided, but now that you have started the process, you must leave it to complete, or risk corruption of the snapshot and parent virtual disk, and data will be lost.

Please read my EE Article urgently

HOW TO: VMware Snapshots :- Be Patient

DO NOT MESS, DO NOT KILL PROCESSES, BE PATIENT!

Killing the VM process is bad advice if you want to keep a working virtual disk and Exchange VM.

Do not follow dipopo's advice if the snapshot is being merged.

If you mess about and kill processes, you will corrupt the snapshot, and you could lose 250GB of mail data.

Why is the snapshot this large? Clearly your Exchange server has been running on this snapshot for many days, weeks, or months, and performance would have been poor.

Your Options:-

1. Abort at the risk of corruption, and then we could use VMware Converter, or CLONE, to get rid of the snapshot issues.
2. Be patient and wait for the snapshot merge to complete. This could take minutes, hours, days, or weeks; the longest I've seen a snapshot merge take was 6 days.

Can you upload a screenshot of the datastore with the snapshots and folders, so I can assess the issue?
Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)

Tim

Do you also have a backup you could start to restore, at least to restore service?
ASKER
TimFarren

It's at 27% since 9:15 pm last night. Not good. If I calculate right it's going to be a rough Monday.

I do have a backup, but unfortunately it's a little old. It's about a week old.

Is there any chance this thing could suddenly take a huge leap forward, or is the progress thus far an indication of the remainder (in terms of duration)?
ASKER
TimFarren

And to clarify, I shut down the guest operating system, and I did it from inside the operating system itself.  I never attempted to shut down the host.  Now, the guest operating system is in a constant state of shutting down.  It cannot complete the shutdown, because the ESXi host will not let it.  Consequently, because it cannot complete the shutdown, I cannot start it back up until the merge finishes.  By the way, this is ESXi version 5.1.
ASKER CERTIFIED SOLUTION
Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)

Log in or sign up to see answer
ASKER
TimFarren

It's a mirror.  Two 1TB drives (near-line SAS) in a hardware mirror (Dell Perc Array Card).

The drives are about 700 GB full out of 1 TB.  Attached is a screenshot of the datastore.  It only contains the 1 VM.
DataStore.bmp
Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)

Okay, there are two snapshots, a small snapshot and a very large snapshot. You will just have to wait and be patient for the 250GB delta snapshot to be merged into the parent.

With two disks it could take a while; there are not many IOPS available on a datastore with two disks.

Just sit tight, be patient and calm, it will merge and complete.
ASKER
TimFarren

Thanks for the reassurance.  I'm a little frantic because there are about 100 mailboxes (STUPID ME) on this server and I'm bound to get into serious trouble with some clients if they are without mail for a day.. or two.. or six?!  Man.. the uncertainty is most of my anxiety.  I really can't tell people with any accuracy when it will be back up.  It's not just email either.. there's QuickBooks data, etc.  Again.. STUPID ME.  Lesson learned here.

Is there ANY chance of sending an OS reset signal to the guest OS while this operation is happening?  I mean, I understand if I had left the VM running, the performance would be degraded, but at least it would be operational.  No dice?

Thanks again.
Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)

Not now that you have shut down the VM.

For your info, it's quicker to merge if the VM is off, and the merge does not use any more storage space.

Also, VM performance can be awful if the VM is left on whilst merging.

Why has this VM been on a snapshot this long?
ASKER
TimFarren

To be honest, I had no idea that running off a snapshot for any length of time was detrimental to performance.  I was preparing to migrate it off onto a new RAID 10 array, and in preparation thought it would be good housekeeping to merge the snapshots.  

Is it generally bad practice to have any snapshots on a VM?
ASKER
TimFarren

I'm running "watch -d 'ls -luth | grep -E "delta|flat"'" over SSH and watching the files - none of the file sizes are changing, but the time stamps are.  Is this normal?
Wasim Shaikh

The VM is in the shutdown process.
A wild thought: by any chance, did you check whether any of the Outlook clients are able to connect to Exchange? Or whether you can connect via the Services MMC and check the Exchange services?
ASKER
TimFarren

It doesn't even respond to a ping. And my mailbox is on this server (smartphone attached to it as well). It's definitely all the way down. No response.
Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)

Oh yes, really bad practice: bad VMware administration, or no VMware administration occurring!

Data from the delta is being merged into the parent....

My only concern is that you "meddled" by shutting the VM down during the snapshot merge, which can sometimes halt the process. Once the merge is complete the delta will be deleted.

Timestamps changing is a good sign....
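
For anyone wanting to check for timestamp changes without staring at `watch`, here is a minimal sketch that samples the delta and flat disks twice and reports whether anything changed in between. The function names `disk_stamps` and `merge_active` are made up for illustration, and it assumes a `stat` that accepts `-c` (GNU coreutils or BusyBox); it has not been verified on an ESXi 5.1 shell. A changing stamp only suggests the merge is still writing, it is not proof of healthy progress.

```shell
#!/bin/sh
# Print "<mtime-epoch> <size-bytes> <path>" for each delta/flat disk in a directory.
# Assumption: stat supports -c (GNU coreutils or BusyBox).
disk_stamps() {
    dir="${1:-.}"
    for f in "$dir"/*-delta.vmdk "$dir"/*-flat.vmdk; do
        [ -e "$f" ] || continue      # skip glob patterns that matched nothing
        stat -c '%Y %s %n' "$f"
    done
}

# Take two samples PAUSE seconds apart and say whether any stamp or size moved.
merge_active() {
    dir="${1:-.}"
    pause="${2:-60}"
    before=$(disk_stamps "$dir")
    sleep "$pause"
    after=$(disk_stamps "$dir")
    if [ "$before" != "$after" ]; then
        echo "changing"
    else
        echo "idle"
    fi
}
```

Run from the VM's folder on the datastore, `merge_active . 60` would print `changing` if a modification time or size moved during the minute, and `idle` otherwise.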
ASKER
TimFarren

Progress meter is moving. Just SLOW. It's now at 35%. It's been running for 19 hours. That's ridiculous.
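
As a rough worked example of the arithmetic, the sketch below does a straight-line estimate from the figures in this post (35% after 19 hours). The function name `eta_hours` is made up for illustration, and the estimate assumes the progress meter advances linearly, which a snapshot merge is not guaranteed to do, so treat the result as a guess rather than a deadline.

```shell
#!/bin/sh
# Straight-line estimate of hours remaining for a long-running task.
# Usage: eta_hours PERCENT_DONE HOURS_ELAPSED
eta_hours() {
    awk -v pct="$1" -v elapsed="$2" 'BEGIN {
        if (pct <= 0 || pct > 100) { print "n/a"; exit }
        total = elapsed * 100 / pct        # projected total run time in hours
        printf "%.1f\n", total - elapsed   # hours still to go
    }'
}

eta_hours 35 19   # prints 35.3
```

At this rate the projected total is about 54 hours, so roughly a day and a half would remain if the meter kept moving at the same pace.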
Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)

I'm afraid that's very short; one of our clients waited 6 days.

It's because of a slow two-disk datastore and a large snapshot.
Often, large snapshots are quicker to clone out with V2V, but the VM still needs to be down and it can take many hours.

Hence why it's recommended not to get into snapshot hell in the first place - sorry for that.

In future keep it checked.
ASKER
TimFarren

Definitely. These are life's lessons. :-)
ASKER
TimFarren

This might be off-topic, but do you have any good ideas for moving this VM on to a newly established DataStore on the same host with little to no downtime?
ASKER
TimFarren

Update.. It hit 40% after a total of 33 hours of run time and then just jumped ahead to 100% in a matter of seconds. All is well!  Thanks everyone!!
Wasim Shaikh

You are lucky :-) Good to hear everything went smoothly.
Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)

Keep Calm and Carry On!

Glad you are back in business, service restored.

Check Daily for Snapshots! They are EVIL!
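
One way to make the daily check a habit is a small script that looks for snapshot delta disks on the datastores. This is only a sketch: the function names `list_snapshot_deltas` and `check_snapshots` are made up for illustration, it assumes datastores are mounted under `/vmfs/volumes`, and it only matches the `-delta.vmdk` naming seen in this thread, so other snapshot formats with different suffixes would be missed.

```shell
#!/bin/sh
# List snapshot delta disks found under a datastore root.
# Assumption: datastores are mounted under /vmfs/volumes when no root is given.
list_snapshot_deltas() {
    root="${1:-/vmfs/volumes}"
    find "$root" -type f -name '*-delta.vmdk' 2>/dev/null
}

# Print a short report; return non-zero when any delta disk exists,
# so a scheduler or wrapper script can raise an alert.
check_snapshots() {
    found=$(list_snapshot_deltas "$1")
    if [ -n "$found" ]; then
        echo "Snapshot delta disks found:"
        echo "$found"
        return 1
    fi
    echo "No snapshot delta disks found."
}
```

Calling `check_snapshots` with no argument scans the default root; the non-zero return when deltas exist is a design choice so that the check can be chained with a notification command.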