Solved

VI Client to Host - Remove Snapshot 95% for 2 days!  Help!

Posted on 2014-08-19
Last Modified: 2016-02-25
Veeam Backup & Replication backed up my Exchange 2010 server on Saturday night/Sunday morning.  The backup reached 100% and began removing the snapshot.  It has apparently been stuck at 95% since Sunday morning.  The server is up and running, and the delta files are slowly growing, not shrinking as I'd expect when committing the snapshot.

I'm connected directly to the ESX 3.5 host (no vCenter) and monitoring the file sizes via the datastore browser.  I don't see any errors in the HP Insight Manager logs to indicate a hardware issue.  Kernel latency looks nominal, so I don't think it's an I/O issue.

My fear is that I only have 300GB free on the 2TB datastore, and my log volume on the Exchange server is not overly large.  (I can turn on circular logging before that becomes an issue, though.)  The problem is, this backup has never taken more than 15-20 minutes to remove a snapshot from this server.  I've read that restarting the host can clear this up, but I'm scared to death I'm going to kill Exchange if I shut it down and reboot the host.  Anyone have any ideas on what to try first?  Everything is grayed out in Snapshot Manager, so I can't delete all or create a new one.

Help!

Rusty
Question by:rpmahony
17 Comments
 
LVL 124
ID: 40270861
Please Read my EE Article URGENTLY!

HOW TO: VMware Snapshots :- Be Patient

Do not be tempted to interfere, cancel, reboot, or stop the operation; it could cause virtual disk corruption.

Please upload a screenshot of the datastore so I can have a look at the files.

What is the datastore: type of disk, speed, RAID type?

How many snapshots are there?

(I've seen some snapshots take 3 weeks to DELETE and MERGE!)
 

Author Comment

by:rpmahony
ID: 40271039
Here goes:

The datastore is internal RAID 5 storage in an HP Proliant DL385G7 comprised of 10k SAS drives.  Total size of the store is 1.9TB with 299GB free.  Attached is a screenshot of the VM files for the problem guest.
Screen-Shot-2014-08-19-at-11.29.22-AM.pn
 
LVL 124
ID: 40271117
Okay, RAID 5 is not the fastest of datastores. How many disks are in the RAID 5 set?

Do you have a Battery Backed Write Cache (BBWC) fitted to the Smart Array Controller? These are optional, and only fitted as standard on the performance models. This can affect performance, and you'd be surprised at the performance with one fitted compared to a server without!

As the VM is on and active, and an Exchange server is a busy server with all the mail flow, all of this will be written to a delta (snapshot) disk until the merge is complete.

Okay, here comes the bad news: you in fact have 2 snapshots, so that's 2 backups you have missed that did not close their snapshot correctly!

This happens, and there is NOTHING you can do about it, apart from adding a snapshot check to your daily VMware admin AM checks and, if you spot a snapshot, doing something about it, or setting up snapshot alarms to WARN when you have a snapshot. Leaving a snapshot on causes poor performance and gets you into Snapshot Hell!

The more snapshots, the more time required to get out of it.

And you've got five disks on this VM, and there are up to 2 snapshots on each:

disk1 - cp-ex2010.vmdk: 1 snapshot exists (300MB), a snapshot has been merged

disk2 - cp-ex2010_2.vmdk: 1 snapshot exists (8GB), a snapshot has been merged

disk3 - cp-ex2010_3.vmdk: 1 snapshot exists (50MB), a snapshot has been merged

disk4 - cp-ex2010_4.vmdk: 2 snapshots exist (1.5GB)

disk5 - cp_ex2010_5.vmdk: 2 snapshots exist (8GB)

They are very small snapshots and should have merged by now.....

How is the performance of the server?

This is a difficult call; the snapshot merge should have finished.

I would give it another 12 hours, and then act.

That would involve restarting the management agents on the host and, if that does not free it up, a server restart.

HOWEVER, be prepared for possible corruption of the last snapshot, and loss of data, if the snapshot chain is corrupted.

Do you have ANY other additional storage space, we could clone out these disks to a new datastore?
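
The daily snapshot check suggested above could be scripted along these lines. This is only a minimal sketch, not a VMware tool: it assumes the VM's folder is readable as an ordinary directory from wherever the script runs, and that snapshot delta files follow the `<disk>-NNNNNN-delta.vmdk` naming pattern. The function names `find_snapshot_deltas` and `report` are made up for illustration.

```python
import re
from pathlib import Path

# Assumption: snapshot delta files are named <disk>-NNNNNN-delta.vmdk,
# e.g. cp-ex2010_4-000002-delta.vmdk for the second snapshot of cp-ex2010_4.vmdk.
DELTA_RE = re.compile(r"^(?P<disk>.+)-(?P<seq>\d{6})-delta\.vmdk$")


def find_snapshot_deltas(vm_dir):
    """Return {base disk name: [(sequence number, size in bytes), ...]}."""
    deltas = {}
    for path in Path(vm_dir).iterdir():
        match = DELTA_RE.match(path.name)
        if match and path.is_file():
            entry = (int(match.group("seq")), path.stat().st_size)
            deltas.setdefault(match.group("disk"), []).append(entry)
    for entries in deltas.values():
        entries.sort()  # order each disk's deltas by sequence number
    return deltas


def report(vm_dir):
    """Build one warning line per virtual disk that still has delta files."""
    lines = []
    for disk, entries in sorted(find_snapshot_deltas(vm_dir).items()):
        total_mb = sum(size for _, size in entries) / 2**20
        lines.append(
            f"WARNING {disk}.vmdk: {len(entries)} snapshot delta(s), "
            f"{total_mb:.1f} MB"
        )
    return lines
```

Calling `report()` on the VM's folder each morning and printing the result gives an empty list when no snapshot is open. Running it twice a few hours apart also shows whether the deltas are growing or shrinking, which is the same thing being watched by hand in the datastore browser here.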

 

Author Comment

by:rpmahony
ID: 40271239
Going to answer the last question first:  I have a Seagate Black Armor I could use as temporary iSCSI storage but that idea is scary.  I have a couple blank 3TBs I could throw in there and stand it up if that's what it takes.

The current internal storage array is 8x 300GB.  There IS a WBC module on the array controller so I/O usually isn't a problem.  (The server originally shipped without - an oversight - that was quickly remedied).

Server performance seems fine, other than store.exe being the usual memory hog.  Nothing out of the ordinary from what I can see.  Mail is flowing in and out quickly, disk utilization is nominal, CPU is hanging around 30%.

As for the two-snapshot thing, Friday night's backup completed successfully and Saturday's is the one that was "stuck".  I'm not sure where the second snapshot came from.  This is definitely a first for me as this machine has been running like a top for three years.
 
LVL 124
ID: 40271261
Leave it 12 hours, and wait, and let's see what happens.
 

Author Comment

by:rpmahony
ID: 40271336
One last question before I leave it alone: should I turn on circular logging in Exchange to save log space?  My log volume gets tight when it misses a backup.  Better to have the server up and running than to worry about what circular logging will do to the snapshot size, no?  I'm sure the backup is going to fail when it attempts to create a snapshot, so I've removed this VM from the backup job for tonight, meaning the Exchange log files will not be flushed and committed.
 

Author Comment

by:rpmahony
ID: 40272317
Update: file sizes are still climbing this morning.  I'll be adding the Seagate NAS as an iSCSI target when I get to the office.  I'm down to 275GB free space on the current datastore.  I can move a couple smaller VMs to the NAS to make room but I'm just not sure when to say "it's just stuck and it's not going to resolve itself".  Unfortunately, 3.5 is EOL so VMware won't help me with this one, even though I have the software and maintenance good for v5.  (Was hoping they'd throw me a bone).
Screen-Shot-2014-08-20-at-7.07.07-AM.png
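
For a rough sense of how long the datastore can absorb the growing deltas, the two free-space readings in this thread (299GB, then 275GB) can be extrapolated linearly. This is a sketch under stated assumptions: the timestamps are taken from the two screenshot file names rather than from measured readings, growth is assumed to stay constant, and `hours_until_full` is a hypothetical helper name.

```python
from datetime import datetime


def hours_until_full(free_then_gb, free_now_gb, t_then, t_now):
    """Linearly extrapolate the hours left before free space reaches zero."""
    elapsed_h = (t_now - t_then).total_seconds() / 3600
    if elapsed_h <= 0:
        raise ValueError("t_now must be later than t_then")
    consumed_gb = free_then_gb - free_now_gb
    if consumed_gb <= 0:
        return float("inf")  # free space is not shrinking
    rate_gb_per_h = consumed_gb / elapsed_h
    return free_now_gb / rate_gb_per_h


# Assumed reading times, taken from the screenshot file names in this thread.
t_first = datetime(2014, 8, 19, 11, 29)
t_second = datetime(2014, 8, 20, 7, 7)

remaining_h = hours_until_full(299, 275, t_first, t_second)
print(f"{remaining_h:.0f} hours (~{remaining_h / 24:.1f} days) at the current rate")
```

With these figures the estimate comes out at roughly nine days, so the free space is not the immediate emergency. It says nothing about whether the merge itself will ever finish.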
 
LVL 124
ID: 40272339
So the snapshot has not merged, and it's been almost 12 hours?
 

Author Comment

by:rpmahony
ID: 40272544
Over 12 hours actually. I let it be overnight so I could get some sleep.
 
LVL 124
ID: 40272663
Okay, I think this has crashed, and it's time to cancel/reboot/stop this operation.

Please note, this could cause corruption to the virtual disk, and data (Exchange emails) could be lost.

I assume backups are no longer functioning because of the current operation.

So here are some options:

1. You could shut down the Exchange stores and copy the databases off to another location, then complete an Exchange 2010 restore if the virtual disks are corrupted beyond repair.

2. Shut down the Exchange services and create a V2V using VMware Converter, in effect creating a new machine/clone with all the current data and no snapshots. Once completed, shut down and erase the old machine. Start the new machine, and you are free of snapshots and back to where you were.

3. Go for the reboot/switch off and see what state the snapshots are in. If they are corrupted, loss of data is possible.
 

Author Comment

by:rpmahony
ID: 40273167
Option 2 seems the most logical, safest bet.  I have vReplicator as well.  If I were to stand up a new ESX, would I be able to use vReplicator or will the snapshot issue kill that idea or make matters worse?  Re: VMware Converter, does it require any temp space on the source host?  If so, I'm in a bind...
 
LVL 124

Accepted Solution

by:
Andrew Hancock (VMware vExpert / EE MVE^2) earned 2000 total points
ID: 40273286
You will need to run VMware Converter inside the guest OS!

I would go for option 2. Downtime is required, because you need to STOP ALL EXCHANGE services.
 

Author Comment

by:rpmahony
ID: 40274036
I'm thinking I may be fubar now.  When trying to add the iSCSI target, a message appeared saying that a configuration change was made that requires restarting the host.

I'm not usually one to turn to the bottle… but I'm getting there.  >.<
 
LVL 124
ID: 40274068
Yes, if memory serves me correctly, the iSCSI software initiator in 3.5 needed a host reboot to initialize it!
 

Author Comment

by:rpmahony
ID: 40274069
If I can get the OK to add another server (and get one tomorrow), I suppose I can use vConverter to V2V the VM to the new server, right?

Edit: Also, if I install vSphere 5 on the new box, will vConverter be able to V2V from 3.5 to 5?

Time to kiss up to the boss, methinks.
 
LVL 124
ID: 40274093
Yes, you can use VMware Converter, to V2V a VM from 3.5 to 5.0.
 

Author Closing Comment

by:rpmahony
ID: 40306839
Unmounted databases, stopped Exchange services and ran the converter.  It took about 36 hours to convert the machine to a datastore located on a Promise NS4600 attached to an older ESX box via NFS (never could get iSCSI to connect correctly).  Removed the old VM after Exchange came up in its temporary location with no errors.  Used the same process to move it back to the original ESX which took 19 hours.  Everything is happy once again.  Thanks again for all the help and I apologize for not updating this earlier.
Question has a verified solution.