
VMware virtual servers: excessive reboot times

Posted on 2010-09-17
Last Modified: 2012-08-13
We're relatively new to VMware. We have two physical boxes running the latest version of the free ESXi. The servers are new HP ProLiant ML350 G6s configured with RAID 5 and 32GB of RAM, so there should be plenty of horsepower, I would think. Each physical box hosts 3 virtual servers, none of which has an excessive load; they're basically file servers. The virtual servers all run Windows Server 2003 R2, and each is allocated 4GB of RAM.

We've noticed that if we initiate a Shutdown/Restart on a virtual server, it takes upwards of 30 minutes to finish the reboot cycle. Compare this with a similar non-VM Windows server, which takes 4-5 minutes. If we pull up the console in the vSphere Client, we stare at a gray screen for 20+ minutes. Oftentimes we end up powering off the VM in question just because it takes too long to wait for it to finish. Both physical boxes and all six virtual servers exhibit these symptoms. I'm thinking we must be missing some VM-101 setting that we're unaware of.

Any ideas??

Question by: SBSIAdmin

Accepted Solution

by: bgoering
Have you installed VMware Tools in all of your guests?

Author Comment

by: SBSIAdmin
Yes.  We originally thought that was it, but no luck.

Assisted Solution

by: bgoering
Tell me a bit more about your environment. How were these VMs created? Were they P2V conversions from physical boxes? Have you adjusted the RAM size? As an experiment, drop one to 1 or 2GB of RAM and see if it boots faster. If it does, set it back to 4GB and it should be OK going forward. It is a bit inexplicable why that works, but it has been an occasional bug in VMware.

Another thing to look at is your RAID controller and disk I/O time. Some of the lower-end RAID controllers don't ship with battery-backed write cache by default, which means disk writes are exceptionally slow. You will generally see long waits when powering on, while the VMware swap file is created, but also during the boot process as the OS initializes its paging file and such. You can look at the storage path and storage adapter reports on the Performance tab of the client to get an idea of your disk latency. I generally like to see disk latency of 20 ms or less.
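To make the 20 ms rule of thumb concrete, here is a minimal Python sketch that summarizes a set of latency samples and counts how many exceed that threshold. The sample values are hypothetical placeholders, not measurements from this thread; in practice you would copy them from the client's Performance charts.

```python
from statistics import mean

# Rule-of-thumb threshold from the comment above: 20 ms or less is healthy.
LATENCY_THRESHOLD_MS = 20.0


def summarize_latency(samples_ms, threshold_ms=LATENCY_THRESHOLD_MS):
    """Return (average, peak, count of samples above the threshold)."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    over = [s for s in samples_ms if s > threshold_ms]
    return mean(samples_ms), max(samples_ms), len(over)


if __name__ == "__main__":
    # Hypothetical samples in milliseconds, e.g. read off a disk latency chart.
    samples = [4.0, 6.5, 18.0, 45.0, 120.0, 9.0]
    avg, peak, over = summarize_latency(samples)
    print(f"avg={avg:.1f} ms, peak={peak:.1f} ms, "
          f"{over} sample(s) over {LATENCY_THRESHOLD_MS:.0f} ms")
```

A low average with frequent spikes well above the threshold is the pattern you would expect from write-through caching under bursty writes.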

Author Comment

by: SBSIAdmin
Thanks for the response, bgoering. Those sound like promising suggestions. It's about 10pm EST for me right now, and I may not get to your suggested changes until Monday.
The servers were native builds rather than P2Vs.
I'll try the memory trick and also get the specs on the RAID controller. I'm guessing the RAID controller is fairly basic, since it's the base controller that comes on the server's motherboard.
I'm looking in the vSphere Client and don't see where to get the disk latency numbers. On [Performance] I can select [Disk], but I don't see anything for storage path or storage adapter.
I did just peruse the [Events] tab and was reminded of a message that I had seen in the past.
Message from VM1:
Insufficient video RAM. The maximum resolution
of the virtual machine will be limited to 1176x885
at 16 bits per pixel. To use the configured
maximum resolution of 2360x1770 at 16 bits per
pixel, increase the amount of video RAM allocated
to this virtual machine by setting svga.vramSize=
"16708800" in the virtual machine's configuration
file.

I wonder if this could be adding to the reboot times.

Expert Comment

by: danm66
Did you set a memory limit (edit settings | resources, click on memory)?  If so, change it back to unlimited.

This sounds more like a memory contention/limitation issue than an I/O issue. You might also try uninstalling VMware Tools to see if the balloon driver or another component is interfering somehow. (I'm not sure from your earlier reply whether you meant that you hadn't had Tools installed or had just thought to check that they were installed.)

Are you over-committed on memory?  What is your host memory consumption looking like?

Expert Comment

by: bgoering
No, that is just an informational message and does not indicate a problem. If you want to make the message go away, take a look at this knowledge base article: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1024990
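For anyone curious where the number in the warning comes from: the suggested svga.vramSize value equals the configured maximum resolution multiplied by 4 bytes per pixel (2360 × 1770 × 4 = 16,708,800). The sketch below reproduces that arithmetic. The 4-bytes-per-pixel factor is an assumption inferred from the numbers in the message, not something stated in this thread or the KB article, and the helper name is hypothetical.

```python
def svga_vram_bytes(width, height, bytes_per_pixel=4):
    """Framebuffer size in bytes for a given resolution.

    Assumption: 4 bytes per pixel, inferred from the svga.vramSize value
    suggested in the warning message above.
    """
    if width <= 0 or height <= 0 or bytes_per_pixel <= 0:
        raise ValueError("width, height and bytes_per_pixel must be positive")
    return width * height * bytes_per_pixel


if __name__ == "__main__":
    size = svga_vram_bytes(2360, 1770)
    # Print the setting in the same form the warning suggests for the .vmx file.
    print(f'svga.vramSize = "{size}"')
```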

Assisted Solution

by: BloodRed
Within your Windows VMs, ensure that the "Clear page file on shutdown" security setting is not set to "Enabled".

This setting causes Windows to zero out the page file on shutdown/reboot, and on a system with a large amount of memory and a correspondingly large page file, this can take a very long time.

Expert Comment

by: lnkevin
Where are the VMs located? Are they on local disks or on a SAN?
Most of the time, VM delays come from disk I/O. If you have a SAN, try moving your SAN card to a different, higher-numbered slot. If it's local storage, you may want to check the RAID array and hard drives to make sure everything is OK. Next, run esxcfg-rescan to refresh the connection to your disks.

K

Author Comment

by: SBSIAdmin
I'll admit that I posted this question on behalf of our senior tech, who had been the point person for our VMware installs. He was a newbie as well. As it turns out, he has left our organization, so our two VMware servers are now my responsibility. Hence the delay since the last post.
I just did a full reboot on one of our physical boxes. I wanted to look through the BIOS because I had also heard that there may be a BIOS setting that needs to be adjusted if the box will be running VMware. I saw a couple of entries about memory and Intel Virtualization Technology, but nothing jumped out at me about VMware per se.
As the machine was POSTing, I noticed a message saying there was no battery backup on the RAID controller. It seemed to indicate that one could be added, but it wasn't installed by default. There was an earlier comment about RAID controllers and battery-backed write cache. There was also a BIOS setting for enabling write caching. That's currently disabled, and it comes with a warning about potentially losing data during a power outage, so I haven't enabled it. I'm wondering if I should pursue getting the battery module for my RAID controller.

Expert Comment

by: danm66
Yeah, not having battery-backed write cache enabled definitely hurts storage performance.

Expert Comment

by: bgoering
Yes, if the guests are hosted on the local RAID, adding battery-backed cache will make a huge improvement for disk writes. If the long delay is in the shutdown part of the reboot process, consider what BloodRed indicated above about the clear-pagefile setting; that would add a large amount of time to the shutdown process.

If you shut down and power off one of these VMs, how long does it take to power on and come up?

Author Comment

by: SBSIAdmin
It was less than a year ago that we set our page files to clear at shutdown on all servers, on the recommendation of our security consulting firm. I tried turning that off to see if it would make a difference on the VMware servers, but it hasn't.
I was unable to find a specific BIOS setting that would "enable" the server to be a hypervisor.
I guess that leaves me with the battery-backed cache option. I'll have to get the exact model of the RAID controller, confirm its capabilities, and get a price.

Expert Comment

by: danm66
You could give some VMs a full memory reservation and see if that makes a difference, too. You'll still need to read from disk, but it removes the need for a swap file, so it would be a decent test of whether you're going down the right path before laying out the money.

Author Comment

by: SBSIAdmin
I ran a couple of timing tests tonight to get real numbers. I rebooted one of the server instances on each of our two ESX boxes. Both were quite consistent: 11 minutes to shut down and less than 2 minutes to boot up. This test was done with pagefile clearing disabled.
Then I reset the GPO to enable clearing of the pagefile on shutdown, ran gpupdate /force, and rebooted again. The shutdown and restart times were just about the same. That leads me to believe that the pagefile isn't significantly affecting the process.
I'm not certain what a "full memory reservation" means. The physical boxes have 16GB of RAM in them. One ESX host is running 3 virtual servers and the other is running 2. Each virtual server has been allocated 4GB of RAM, which leaves 4GB or 8GB for the hypervisor.
I'm kind of leaning back toward the battery-backed cache. Maybe the shutdown process is so disk-intensive that we feel the effects, but daily usage as a file server or domain controller doesn't push the disk enough to notice it?
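The memory budget described above can be checked with simple arithmetic. This minimal sketch sums the RAM configured for each guest and compares it with host RAM. It deliberately ignores per-VM overhead memory and the hypervisor's own usage, so real headroom is somewhat smaller than what it reports; the host labels are hypothetical.

```python
def memory_headroom_gb(host_gb, vm_allocations_gb):
    """Return (total configured guest RAM, host RAM left over, overcommitted?)."""
    configured = sum(vm_allocations_gb)
    headroom = host_gb - configured
    return configured, headroom, headroom < 0


if __name__ == "__main__":
    # Figures from the comment above: 16GB hosts, 4GB per guest.
    # "host A" / "host B" are hypothetical labels.
    hosts = {"host A": [4, 4, 4], "host B": [4, 4]}
    for name, vms in hosts.items():
        configured, headroom, over = memory_headroom_gb(16, vms)
        print(f"{name}: {configured}GB configured, {headroom}GB left, "
              f"overcommitted={over}")
```

Neither host is overcommitted by this measure, which is consistent with the later observation that memory pressure is unlikely to be the cause.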

Expert Comment

by: bgoering
Double-check that the GPO update took effect. Take a look at:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management
Value Name: ClearPageFileAtShutdown
Value Type: REG_DWORD

If the value is 1, the page file will be cleared; if 0, the page file will not be cleared.
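The same check can be scripted. Below is a minimal Python sketch using the standard-library winreg module (Windows only) to read the value described above and interpret it. The helper names are hypothetical; if the key or value is missing, winreg raises an OSError (FileNotFoundError), which the sketch reports instead of crashing.

```python
try:
    import winreg  # Windows-only standard library module
except ImportError:  # allows the interpretation helper to be used elsewhere
    winreg = None

KEY_PATH = r"SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management"
VALUE_NAME = "ClearPageFileAtShutdown"


def describe_clear_pagefile(value):
    """Interpret the REG_DWORD as described above: 1 = cleared, 0 = not cleared."""
    if value == 1:
        return "page file WILL be cleared at shutdown"
    if value == 0:
        return "page file will NOT be cleared at shutdown"
    return f"unexpected value {value!r}"


def read_clear_pagefile():
    """Read ClearPageFileAtShutdown from HKEY_LOCAL_MACHINE (Windows only)."""
    if winreg is None:
        raise OSError("winreg is only available on Windows")
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
        value, _value_type = winreg.QueryValueEx(key, VALUE_NAME)
    return value


if __name__ == "__main__":
    try:
        print(describe_clear_pagefile(read_clear_pagefile()))
    except OSError as exc:  # not on Windows, or key/value missing
        print(f"could not read {VALUE_NAME}: {exc}")
```

Run it inside each guest after gpupdate /force to confirm the policy actually landed before drawing conclusions from shutdown timings.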

In any event, in the long run, to achieve satisfactory performance you will want to be able to configure write-back (rather than write-through) cache on your RAID controller. I would definitely get that upgrade.
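To illustrate why the cache policy matters for a write-heavy phase like shutdown, here is a toy Python model. The latencies and write count are hypothetical placeholders, not measurements of the ML350's controller, and the model ignores cache size, flushing and queue depth. It only captures the core difference: with write-through, each write is acknowledged after it reaches the disks; with write-back, it is acknowledged once it is in the (battery-backed) cache.

```python
def total_write_wait_ms(num_writes, policy, disk_ms=10.0, cache_ms=0.1):
    """Total time spent waiting for write acknowledgements, in milliseconds.

    Toy model with hypothetical latencies (not measured on any real controller):
      write-through: each write is acknowledged only after it reaches the disks
      write-back:    each write is acknowledged once it is in the controller cache
    """
    if num_writes < 0:
        raise ValueError("num_writes must be non-negative")
    if policy == "write-through":
        per_write_ms = disk_ms
    elif policy == "write-back":
        per_write_ms = cache_ms
    else:
        raise ValueError(f"unknown policy: {policy!r}")
    return num_writes * per_write_ms


if __name__ == "__main__":
    writes = 50_000  # hypothetical count of small writes during a shutdown
    for policy in ("write-through", "write-back"):
        seconds = total_write_wait_ms(writes, policy) / 1000
        print(f"{policy:>13}: ~{seconds:.1f} s waiting on acknowledgements")
```

Once a real cache fills, write-back degrades toward disk speed, so the model overstates the benefit for sustained writes; the battery is what makes acknowledging from cache safe across a power loss.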

Two minutes sounds reasonable for a startup time.

As far as the full memory reservation goes, I really don't recommend using it. It is generally better to let ESX manage the memory unless you have very special cases of critical workloads. To set it for a test, go into Edit Settings on your VM, click the Resources tab, and there will be a place to set a reservation. For a full reservation, set it to the amount of memory you have allocated to the VM. What this buys you is that ESX doesn't have to create a swap file to back the RAM of that virtual machine. That is a pretty low-overhead activity unless you are severely memory-constrained on your host, and it doesn't sound like that is the case. In any event, the overhead of a swap file isn't really incurred until ESX has to swap memory pages out to disk. When that happens there are two writes to disk, one to zero the area and the other to write the memory contents, so that physical RAM can be allocated to another virtual machine.
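The link between a reservation and the swap file can be written as simple arithmetic: ESX sizes a VM's swap file (.vswp) as the configured memory minus the memory reservation, so a full reservation leaves a zero-size swap file. A minimal sketch, with a hypothetical helper name:

```python
def vswp_size_mb(configured_mb, reservation_mb=0):
    """Per-VM swap file size: configured memory minus memory reservation."""
    if not 0 <= reservation_mb <= configured_mb:
        raise ValueError("reservation must be between 0 and the configured memory")
    return configured_mb - reservation_mb


if __name__ == "__main__":
    configured = 4096  # the 4GB guests discussed in this thread
    for reservation in (0, 2048, 4096):
        size = vswp_size_mb(configured, reservation)
        print(f"reservation {reservation} MB -> swap file {size} MB")
```

With no reservation, each 4GB guest gets a 4GB swap file created at power-on, which is why slow writes show up as long power-on waits.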

Good Luck

Author Comment

by: SBSIAdmin
I very much appreciate all of the responses, but unfortunately none have helped so far. My next step is going to be battery-backed cache, but that's going to take some time, a purchase, etc. For the time being I'm going to close this question. Again, thanks for all the responses.

Author Closing Comment

by: SBSIAdmin
Unfortunately, we still have the issue, but I'm not able to continue working on it at this time.  I need to get the proper part number for our hardware, determine the cost, get budget for it, make the purchase, install it, yada, yada.  There's no point in keeping the question open at this time.  Responses were great, but nothing I tried worked.  The last item to try requires more planning.