cembi asked:

Removing virtual machine snapshots in ESX 4.1

I have a virtual Exchange 2003 server residing on an EqualLogic SAN. It had been working fine for a long time, but last night it powered down unexpectedly. When I tried to start it via the vSphere Client, it gave me a space error regarding the datastore: "Could not power VM: No space left on device."
The SAN volume was showing only about 1% free space, whereas the datastore in vSphere was showing close to 40% free space. Why this difference? Aren't they supposed to show the same numbers?
I tried deleting a couple of snapshots on the SAN, but that didn't help. Then I called Dell tech support. They decided to increase the size of the volume on the RAID. After a couple of refreshes in vSphere, the datastore space started showing the correct numbers, close to those reported by the SAN console. WHY?
At that point I tried another power on. It did power on, but it stayed at 95% for about 10-15 mins. WHY?
Dell techs suggested that I clean up the snapshots of the machine from vSphere so that I don't run into space issues anymore. How can I do that without risking any damage to the virtual machine itself?

Thank you.
flaphead_com

Did you thin provision the SAN LUNs?
Avatar of Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)
BE PATIENT! Have a cup of coffee, and read my EE Article.

HOW TO: VMware Snapshots :- Be Patient

Please upload a screenshot of the datastore, do not do anything else at this time.

I'll discuss your options.
Avatar of cembi

ASKER

User generated image

Here is a screenshot. The client is thick provisioned. The SAN volume has a max size of 1.2TB. There are 175GB free as of right now. How soon could I run into the same trouble if no immediate measures are taken? I'd hate to go through last night's stress again soon :-(

Thanks.
Okay, the VM has 5 snapshots, approx. 60GB. Can you start Snapshot Manager and check whether they are listed?
Avatar of cembi

ASKER

User generated image
User generated image
It seems like the VM is running on a snapshot, right, hancocka?
Avatar of cembi

ASKER

By the way, your article is fantastic and explains it perfectly. Based on it, I want to add that I have been experimenting with the free version of Veeam. I have it installed on a physical Windows 2003 machine on my LAN and have been using it to back up this virtual Exchange to an external USB disc. I ran it a couple of times; each run lasted about 9 hours and ended with a warning about snapshots. I hope this provides more clues.
The biggest issue with all backup products is that they can leave a VM running on a snapshot. Unless you regularly check as part of your daily VMware admin checks, or set up alarms or scripts to warn you of snapshots, they can quickly fill up a disk, and the VM fails and stops!

Yes, it is running on a snapshot disk.

Thanks for your kind comments, about the article.

Okay.....a number of ways we can attack this....

But the bottom line is BE PATIENT. Deleting the snapshot and merging 60GB into the parent disk can take hours, minutes or seconds; in your case, hours. It depends on the storage system and how fast it is. It will look like it's doing nothing and will sit at 95% (like it's hung). During this time, do not mess, cancel or fiddle; just walk away, it will finish!

Can we turn off the VM?

It's quicker and does not require any additional disk space, BUT the disadvantage is that the VM is out of action until the merge is complete.
Avatar of cembi

ASKER

It cannot be turned off and it cannot stay down for long unless absolutely necessary. It could only be done on a weekend, but I would hate to do it this weekend.
Is there another alternative? If I don't run Veeam in the meantime, does that make it safer until next weekend, when I can follow your advice?
Thank you.
SOLUTION
Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)

This solution is only available to members.
Avatar of cembi

ASKER

Thank you hancocka. I really appreciate all the details. I think I will have to go with the powered on method. A few last questions and observations:
- Performance is already not optimal, delays in Outlook happen throughout the day.
- After the VM was successfully started, the Exchange Store and System Attendant were not started. I had to start them manually. Could this issue be related to snapshots too?
- At the moment the SAN volume is reporting around 200GB free space and the datastore is reporting 172.95GB. Considering this is the only VM on the volume, do you think this can wait until next Friday evening, when I am thinking I can get it started? Is the datastore free space a good, reliable indicator of how fast the snapshots are growing?
- Before I start the consolidation process per your directions, how can I create a good, reliable backup of this VM? Right now I am relying on SAN snapshots, Symantec Backup Exec Mailbox/Store backups and NTBackups. Do you think these are good enough as a fallback in case something goes wrong during consolidation? Running another Veeam backup obviously increases the risk of running out of space, right?

Thanks again.
Yes, the high-CPU performance issues are due to the high disk I/O load.

200GB is not a lot of space, but there is only a day left until the weekend. It really does depend on how many users there are and on the traffic flow, e.g. the number of emails.

200GB is a safe margin, I believe, until Friday out of hours!

If you are talking about next Friday, e.g. 18th Jan + 7 days, that is dangerous!

A problem for you also is that Backup Exec and NT Backup, although good backups, will flush the logs, and hence the snapshot will grow by this amount!

Running another Veeam job will create another snapshot, and it could then try to merge after the backup has finished, but Veeam is the best to restore from.

Can you expand the LUN again? Do you have space?
Avatar of cembi

ASKER

User generated image
User generated image
No, for some reason the LUN is capped at 1.2TB.
Regarding Veeam, as I mentioned previously, it ends with a warning about snapshots. It seems to me the warning is about its inability to delete the snapshot. That is why I hesitate to rerun it. Am I correct in these assumptions?

NT Backup runs inside VM and saves locally on the E drive. Backup Exec runs as an agent inside VM too just like it was running when this VM was physical. Do these contribute to snapshot space too?
As for your backups in the VM, yes, they do contribute to snapshot growth, because they flush the Exchange logs, and any new writes go into the delta snapshot.
200GB is tight for a week, but it depends on how busy your Exchange server is.

Yes, Veeam would complain about the snapshot.

I would plan for emergency downtime from tomorrow evening; otherwise you risk having no Exchange server and no mail, and having to complete a DR exercise with 2003 while users have no mail.

Sorry to be blunt, but I've seen things go bad with ALL email lost!
Avatar of cembi

ASKER

I hear you. Thanks again. I think that's what I'll do.
Let me know when you are starting, and I can be here to "hold your hand".
Avatar of cembi

ASKER

From bad to worse. I lost another 55GB of space due to an NTBackup running on schedule. I stopped it and also disabled Symantec Backup for this evening. Space in vSphere now shows as 112GB. Hopefully it will hold until tomorrow evening.
Is there a minimum space requirement for running consolidation of snapshots while powered ON? What would be the best backup approach before running consolidation since Veeam cannot be run? Downloading VM to another disc while it is powered down?
Thanks a ton.
Yes, I did state that using NTBackup and Symantec Backup Exec, would increase the snapshot size, when the logs get flushed!

see http:#a38789569

112GB is very low. If the VM is OFF, you can run Veeam, or download ALL the files, or copy them to another datastore.
Avatar of cembi

ASKER

- Would Veeam run faster with the VM OFF? It took around 9 hrs to back up with power ON.
- If I create another 2TB volume on the SAN, can I move this VM onto it and start running again?
- At this point with 111.03GB free space reported on ESX is it still an option running consolidation with power ON? My BIG concern is shutting VM down and then finding out that consolidation is taking more than 2 days and be forced to wait without knowing how much longer system will be down. CEO is on email 24 hrs a day.
- If consolidation is started online would space start releasing right away? Is there a risk that during the course of it space is consumed faster than it is released?

Thank you.
Avatar of cembi

ASKER

Another question: even if I delete files in Exchange, it won't help with space, right? So if I delete 10GB of data in Exchange, it won't release that space and instead free space will decrease by 10GB. Is that how it works?
missed that other post.

How is the CEO going to react when he has no email to check, because it's ALL GONE!

The time it takes Veeam Backup to run, is because of the data being transferred from the server to backup location, it will not run much faster with VM off.

Yes, you can create a new datastore, and MOVE the existing VM, this will take time, and can be dangerous with snapshots attached! (there's a warning there!)

If you consolidate online, performance could get worse, additional space will be used, and the danger is you run out of disk space before the process completes, resulting in corrupted snapshots, corrupted VM, and no email server, as the VM will stop, when disk space is used up.

No, if you delete 10GB, the snapshot delta will grow by at least 10GB, because all those changes will be recorded!

(that's whats happening when NT Backup runs, it flushes the logs (deletes), and all those changes are recorded in the delta snapshot).

So the more writes you create, the more the snapshot will grow.
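
To see why deleting mail grows the snapshot rather than shrinking it, here is a toy copy-on-write model in Python. It is only a sketch: the `SnapshotDisk` class, the block numbering and the block contents are made up for illustration and do not reflect VMware's actual on-disk format. The point is that once a snapshot exists, every changed block, including blocks changed by a delete, is stored in the delta while the parent disk stays untouched.

```python
class SnapshotDisk:
    """Toy model of a parent disk plus one snapshot delta (hypothetical, not VMware's format)."""

    def __init__(self, base_blocks):
        self.base = dict(base_blocks)   # parent disk: frozen while the snapshot exists
        self.delta = {}                 # snapshot delta: receives every changed block

    def write(self, block, data):
        # All writes land in the delta; the parent is never modified.
        self.delta[block] = data

    def delete(self, block):
        # A delete inside the guest is still a write (here: overwrite with an empty marker).
        self.write(block, b"")

    def read(self, block):
        # Reads prefer the delta and fall back to the parent.
        return self.delta.get(block, self.base.get(block))

    def delta_blocks(self):
        return len(self.delta)


disk = SnapshotDisk({n: b"mail" for n in range(10)})
disk.write(10, b"new mail")      # new data: delta grows
disk.delete(3)                   # "freeing" space in the guest: delta grows again
disk.delete(4)
print(disk.delta_blocks())       # 3 changed blocks held in the delta
print(len(disk.base))            # parent still holds all 10 original blocks
```

In this model, rewriting a block that is already in the delta does not add a new entry, so growth tracks the number of distinct blocks changed rather than the raw number of writes.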

if you stop all access to the mail server, snapshot would stop growing as much, but even keeping a server up on a snapshot grows the snapshot by 100-200MB an hour, just doing nothing!
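
As a rough back-of-the-envelope check, the figures in this thread can be turned into an estimate of how long the free space will last. The sketch below is in Python; the function name and the idea of subtracting a one-off backup burst are illustrative assumptions, and the inputs (112GB free, 100-200MB an hour of idle growth, about 55GB lost to one NTBackup run) are the numbers quoted in this thread, not measurements.

```python
def hours_until_full(free_gb, growth_mb_per_hour, backup_burst_gb=0.0):
    """Estimate hours of headroom left on the datastore.

    free_gb            -- free space reported by the datastore
    growth_mb_per_hour -- steady snapshot growth rate
    backup_burst_gb    -- one-off growth expected from a scheduled backup (assumed)
    """
    usable_gb = free_gb - backup_burst_gb
    if usable_gb <= 0:
        return 0.0
    return usable_gb * 1024 / growth_mb_per_hour


# Idle server only, at the pessimistic end of 100-200MB an hour:
print(round(hours_until_full(112, 200)))                       # 573
# Same rate, but one more in-guest backup flushes the logs first:
print(round(hours_until_full(112, 200, backup_burst_gb=55)))   # 292
# Two such backups leave almost nothing:
print(hours_until_full(112, 200, backup_burst_gb=110))         # 10.24
```

A busy Exchange server will write far more than the idle rate, so these figures are best read as an upper bound on the time available.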
Avatar of cembi

ASKER

What would be your guess with regards to running it while powered OFF? Could it last more than 2 days? Also, while it is consolidating could I start the VM at some point since some space would have been released already or is this not possible and once consolidation starts nothing should change?

I can't thank you enough for all this invaluable help.
Once you turn off the VM, and start the consolidation (merge), the VM will be running a task, and no other tasks can be performed, that includes Power On, you will not be able to power it on.

Very difficult for me to predict how long it would take; it depends on the speed of the storage.

It's quite a small snapshot compared to most I've seen. If I were to guess, 3-4 hours (maybe!), do not quote me, but if I were scheduling an outage, I would go for 24/48 hours.
Avatar of cembi

ASKER

Awesome. I was worried it may last for days. I will keep you posted. Thanks.
Avatar of cembi

ASKER

Hi hancocka.

I just realized that I can increase the size of the RAID volume by around 200GB. This was an earlier question posed by you. It doesn't hurt to increase it, correct? Is there a short guide on how to then increase the size of the datastore itself? What I know is that the VM needs to be powered down, then increase the RAID volume on the SAN, then go to vSphere Datastore Properties, then Increase. What do you think?

I had a chat earlier with a VMWare tech and he said the 108GB should be enough to run snapshot deletion online.

Thanks again.
No, it does not hurt to increase the LUN and then the datastore.

Select the datastore, then Properties, then Increase.
Avatar of cembi

ASKER

I started the process with the machine ON. It ended after about 10 mins with an error:

Task: Remove all snapshots
Target: PROLIANT-NY
Status: Unable to access file <unspecified filename> since it is locked
Initiated by: root
Requested start time: 1/18/2013 2:17:19 PM
Start time: 1/18/2013 2:17:19 PM
Completed time: 1/18/2013 2:26:35 PM

What gives?!
does the datastore look any different, the files?

that's a very unusual error message.

what did you do to get that error message?

no other backup program, Veeam is not running?
Avatar of cembi

ASKER

I just followed your steps, except that the VM was ON. Yes, there is a new snapshot. Also, when I took the snapshot, Exchange was inaccessible from Outlook. Once the snapshot was created it came online. Then the same thing happened when I started the deletion process: Exchange went offline and then came back online when the process failed. What can I try?
okay, so VM was ON.

1. Select Snapshot, Take Snapshot
2. Check a new snapshot is created on the disk.
3. Check a new snapshot is created in the Snapshot Manager.
4. Click DELETE ALL this will Delete and Merge (Consolidate) ALL the Snapshots!
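
"Delete All" does not throw away the changes made since the snapshots were taken; it merges them into the parent disk, which is why it takes time. Below is a toy Python sketch of that merge, assuming each snapshot is simply a mapping of changed blocks. The function name and the sample data are hypothetical and this is not VMware's implementation.

```python
def consolidate(base, deltas):
    """Merge a chain of snapshot deltas into the base disk (toy model).

    base   -- dict of block number -> data for the parent disk
    deltas -- list of dicts, oldest snapshot first; each holds only changed blocks
    """
    merged = dict(base)
    for delta in deltas:          # apply oldest first so newer writes win
        merged.update(delta)
    return merged


base = {0: "A", 1: "B", 2: "C"}
chain = [
    {1: "B1"},            # snapshot 1: block 1 changed
    {1: "B2", 2: "C1"},   # snapshot 2: block 1 changed again, block 2 changed
    {3: "D"},             # snapshot 3: new block written
]
print(consolidate(base, chain))
# {0: 'A', 1: 'B2', 2: 'C1', 3: 'D'}
```

The merged result is exactly what the running VM already sees; only the number of files holding it changes.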


at which point did the error occur?

and is anything listed in Snapshot Manager?
Avatar of cembi

ASKER

All backups have been disabled. Could it be that Veeam is still accessing its own snapshots?
Avatar of cembi

ASKER

After step 4. Once I clicked Delete All, the task went to 95% and stayed there for about 10 mins, during which time Exchange went offline. Then it stopped with the error and Exchange came back up.
Shut down Veeam if it's still running, just in case.

Okay, potentially that's quite worrying, because it could be that the snapshot chain has been corrupted already, which can happen if you run out of disk space, which you already have once, when the VM first stopped.

The only thing to do is shut down the VM, i.e. power it OFF.

Try again. Steps 1 to 4.

Exchange going offline is caused by high CPU, and by the VM being frozen while the snapshot merge is applied.
it's actually getting late here in the UK, (GMT), so I'll hang around for a few more posts and responses.
Avatar of cembi

ASKER

Ok I will do that. Turn vm off and make sure Veeam is fully out. Exchange is not accessible anyway so it doesn't matter if vm is on or off.
i'll wait before going off to bed, to see if it does not stop after 10 mins, like before...
Avatar of cembi

ASKER

Man, things are getting strange. Just before doing what you recommended a VMware engineer calls me because he noticed I had called earlier. He tells me that no way I should shut down the VM as it seems to be an iffy situation and Exchange may not come back up. He tells me to use vConverter, install it on the VM and run it and convert it just like a P2V onto a new SAN volume (which I have handy by chance). He said this is by far the safest way of doing this.

I am perplexed. Your thoughts?!?!
Why did he think it was iffy? Unless they have WebExed in and taken a real-time look at the situation, which we do not have the option of doing remotely.

VMware always recommend the use of VMware Converter to get out of snapshot situations (because they do not want to spend time supporting customers), and see it as a get-out-of-jail option. It's always the last resort (which was my last option).

your decision.....

BUT (V2V can cause issues) - VMware probably didn't tell you also that P2V-ing an Exchange VM is not Supported by Microsoft, and can cause corruption. The official way, is to create a new Exchange Server, and Move the Mailboxes, and then remove the old Exchange Server.

See my EE Articles

HOW TO: Improve the transfer rate of a Physical to Virtual (P2V), Virtual to Virtual Conversion (V2V) using VMware vCenter Converter Standalone 5.0

HOW TO:  P2V, V2V for FREE - VMware vCenter Converter Standalone 5.0

So, options are yours......

Did they give you any instructions on how to V2V a Live Exchange 2003 Server  ?
Avatar of cembi

ASKER

The tech swore that V2V is by far the easiest and safest way, which strangely wasn't mentioned by the very first engineer, who did 2 WebEx sessions with me earlier today. This first guy, besides suggesting the snapshot delete with the VM powered on, also mentioned using PuTTY and SSH to clone the VM onto a new LUN and in the process start with a brand new VM without the snapshots.

Have a great night and thanks a lot. I am not sure at the moment. I'll probably do something tomorrow. I am tired enough this very cold evening.
Yes, CLONE disk to second LUN was my second option.

But all depends on state of chain.

go with "VMware's V2V option." if you are most comfortable with their support of the Exchange 2003 VM.
Avatar of cembi

ASKER

Hey Andy. No, the VMware tech didn't take the time to explain the V2V process. He basically was in a rush to get home as his shift was over. My feeling is that there is something really wrong with these snapshots. The weird thing is that the last couple of times this VM was restarted, the Exchange System Attendant and Store were not started automatically and I had to start them manually. I am going through your articles and will read some more.
So in your opinion it would be best to try deletion with VM off and if that doesn't work use cloning and then last option would be V2V?
Have a great weekend.
Avatar of cembi

ASKER

I found an article online where the user had same exact issue. It seems like Veeam keeps the connection open to the VM even when it is not running a backup hence the file locked error. I will disable all Veeam services and give it another go.
ASKER CERTIFIED SOLUTION
This solution is only available to members.
Avatar of cembi

ASKER

So I shouldn't try Delete All when Veeam is all shutdown? Any harm in doing this?
Avatar of cembi

ASKER

Andy, is there any way of contacting you via email?
All communications at EE must be through the forum.

Terms of Use

Offline Answers
Experts Exchange is built on sharing information and solutions. The use of email or other communications systems that is outside of Experts Exchange's question and answer or articles system is prohibited, and no points will be awarded for any solution arrived at outside of those systems.

The Moderators and Topic Advisors will remove your email address if you post it; in addition to keeping you from the violation listed above, it will help keep you from getting too much unwanted email.


Source
https://www.experts-exchange.com/help/viewHelpPage.jsp?helpPageID=181

I think a host restart is needed, with Veeam shut down and not running. Is Veeam in a VM or on a physical server?
Avatar of cembi

ASKER

OK, here is an update:
After a 3hr phone conversation with a VMware engineer from Ireland :-) we finally got the process running. What happened is that when we tried deleting the snapshots with the VM off, it finished in an instant and basically didn't delete anything. There was still a lock on one of the files. We had to reboot the ESX host, and then all the locks were gone. Then he removed the VM from inventory, edited the .vmx file to point to last night's snapshot, added it back to inventory and did a power-on test. The VM was fine and we were able to power it on without an issue. We powered it down again. He removed snapshots 8 and 9 (the latest) and started the deletion again. Within 1 hr it went gradually from 5% to 99%, and it has been at that for about 2 hrs. Fingers crossed it will end successfully sometime this weekend.
He stated that the chain of snapshots didn't seem to be corrupt.
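
For readers unfamiliar with the .vmx step: the file is plain text made of `key = "value"` lines, and each virtual disk has a `fileName` entry naming the .vmdk descriptor the VM opens. Here is a hedged Python sketch of finding and repointing such an entry. The VM name, disk key and snapshot numbers are invented for illustration, and the functions work on text only; in this thread the engineer made the change with the VM powered off and removed from inventory.

```python
import re

# Matches lines such as: scsi0:0.fileName = "Exchange-000009.vmdk"
DISK_LINE = re.compile(r'^(?P<key>\S+\.fileName)\s*=\s*"(?P<path>[^"]+\.vmdk)"\s*$')


def disk_entries(vmx_text):
    """Return {key: vmdk path} for every disk fileName line in the .vmx text."""
    entries = {}
    for line in vmx_text.splitlines():
        match = DISK_LINE.match(line.strip())
        if match:
            entries[match.group("key")] = match.group("path")
    return entries


def repoint_disk(vmx_text, key, new_vmdk):
    """Return new .vmx text with the given disk key pointing at new_vmdk."""
    out = []
    for line in vmx_text.splitlines():
        match = DISK_LINE.match(line.strip())
        if match and match.group("key") == key:
            line = f'{key} = "{new_vmdk}"'
        out.append(line)
    return "\n".join(out)


# Hypothetical example content; real files contain many more settings.
sample = 'displayName = "Exchange"\nscsi0:0.fileName = "Exchange-000009.vmdk"'
print(disk_entries(sample))
updated = repoint_disk(sample, "scsi0:0.fileName", "Exchange-000007.vmdk")
print(disk_entries(updated))
```

Keeping a copy of the original .vmx before any such edit makes it possible to go back if the VM fails the power-on test.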

Veeam is on a physical machine but all Veeam services on that machine have been stopped.

Thank you.
Hence why I asked you to reboot the host to clear the locks, I'm glad VMware agree with me!

Once this has been completed, you need to add daily snapshot checks to your VMware Admin routines!

So this engineer clearly didn't want to do a V2V using Converter, as per the previous VMware engineer!
Avatar of cembi

ASKER

Never doubted you man. You have been a great help and kept me focused. Actually if not for your advice my VM would have run out of space by now.
Is there a way to monitor the progress of the process while at 99%? The engineer put up a putty screen which shows all vmdk files in a list and told me to watch as they disappear while consolidating. No change after 4 hours though. I hope this is an accurate indicator.
There is not really much to monitor. You can log in to the server via SSH and watch, but it does not show much.

Be patient, go away, have a McDonald's, or a sandwich, or a cup of coffee; there is nothing worse than staring at a screen.
I'm actually also working on another EE related snapshot issue, they are very common!
Avatar of cembi

ASKER

The consolidation ended successfully in about 8 hours. VM powered on and email performance is much better.

- What should I do to avoid the snapshot issue in the future?
- What is the best way of backing up this Exchange VM? Right now I use Symantec Backup Exec to back up the information store to tape, snapshots on the EqualLogic, and also Veeam, which caused the issue. I am a bit disappointed in Veeam.
- Is there a good and simple guide to maintain an ESX infrastructure?

Perhaps these are issues for a new question so let me know if I need to do that.

Thank you Andy.
SOLUTION
This solution is only available to members.
Avatar of cembi

ASKER

Thanks. Are the snapshots left behind because of failed backups, or will they be there regardless, so that this consolidation process will need to be run regularly?
the backup would have been successful, but maybe flagged as failed in the backup logs.

VMs should not be left running on a snapshot after a backup, but this often occurs.

Set up alerts, and check daily.
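
A daily check can be as simple as listing a VM's folder and flagging snapshot files. On ESX 4.x, snapshot delta disks normally appear as `<name>-00000N.vmdk` with a matching `<name>-00000N-delta.vmdk`. The Python sketch below applies that naming pattern to a list of file names and sizes. The function name, the sample listing, and the idea of feeding it a listing you collected yourself are assumptions for illustration; it is not a VMware tool.

```python
import re

# Delta extents look like "Exchange-000001-delta.vmdk" (six digits after the disk name).
DELTA_FILE = re.compile(r"^.+-\d{6}-delta\.vmdk$")


def snapshot_report(files):
    """Given {file name: size in GB}, return (delta file names, total delta GB)."""
    deltas = sorted(name for name in files if DELTA_FILE.match(name))
    total_gb = sum(files[name] for name in deltas)
    return deltas, total_gb


# Hypothetical directory listing for one VM.
listing = {
    "Exchange.vmx": 0.0,
    "Exchange-flat.vmdk": 900.0,
    "Exchange-000001-delta.vmdk": 40.0,
    "Exchange-000002-delta.vmdk": 20.0,
}
deltas, total = snapshot_report(listing)
if deltas:
    print(f"WARNING: {len(deltas)} snapshot delta file(s), {total:.0f}GB in total")
```

Run against every VM folder once a day, a check like this would have flagged the leftover backup snapshots long before the datastore filled up.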
Avatar of cembi

ASKER

Many thanks.