Solved

VM virtual servers excessive reboot times

Posted on 2010-09-17
Last Modified: 2012-08-13
We're relatively new to VMware.  We have two physical boxes running the latest version of free ESXi.  The servers are new HP ProLiant ML350 G6's configured with RAID-5.  Plenty of horsepower I would think.  32GB RAM.  Each physical box is hosting 3 virtual servers, none of which have excessive load; i.e. they're basically file servers.  The virtual servers are all running Win2K3 R2 and each is allocated 4GB of RAM.

We've noticed that if we initiate a Shutdown/Restart on a virtual server, it takes upwards of 30 minutes to finish the reboot cycle.  Compare this with a similar Windows server, non-VM, that would take 4-5 minutes.  If we pull up the console via the vSphere Client, we'll stare at a gray screen for 20+ minutes.  Oftentimes we end up powering off the instance in question just because it takes too long to wait for it to finish.  Both physical boxes and all six virtual servers are exhibiting these symptoms.  I'm thinking we must be missing some VM-101 setting that we're unaware of.

Any ideas??
Question by:SBSIAdmin
17 Comments
 
LVL 28

Accepted Solution

by:bgoering
bgoering earned 333 total points
ID: 33706197
Have you installed the vmware tools in all of your guests?
 

Author Comment

by:SBSIAdmin
ID: 33706201
Yes.  We originally thought that was it, but no luck.
 
LVL 28

Assisted Solution

by:bgoering
bgoering earned 333 total points
ID: 33706259
Tell me a bit more about your environment. How were these VMs created? Were they a P2V from a physical box? Have you adjusted the RAM size? As an experiment, drop one to 1 or 2GB of RAM and see if it boots faster. If it does, adjust it back to 4GB and it should be OK going forward. It is a bit inexplicable why that works, but it has been an occasional bug in VMware.

Another thing to look at is your RAID controller and Disk I/O time. Some of the lower end RAID controllers don't come by default with battery backed write cache. What this means to you is that disk writes are exceptionally slow. You will see long waits generally when powering on while the vmware swap file is created, but also during boot up process as the OS is initializing its paging files and such. You can look at the storage path and storage adapter reports in the performance tab of the client for an idea how long disk latency is. I generally like to see disk latency 20 ms or less.
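As a rough illustration of the 20 ms rule of thumb above, here is a minimal Python sketch that flags latency readings over that target. The sample values are made up for the example; real numbers would come from the client's performance charts.

```python
# Rule-of-thumb target quoted above: disk latency of 20 ms or less.
LATENCY_TARGET_MS = 20


def slow_samples(samples_ms, target_ms=LATENCY_TARGET_MS):
    """Return the latency readings (in ms) that exceed the target."""
    return [s for s in samples_ms if s > target_ms]


# Hypothetical readings, not taken from the servers in this thread.
samples = [4, 12, 35, 180, 9]
print(slow_samples(samples))  # [35, 180]
```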

 

Author Comment

by:SBSIAdmin
ID: 33706338
Thanks for the response bgoering.  Those sound like some promising suggestions.  It's about 10pm EST for me right now and I may not get to your suggested changes until Monday.
The servers were native builds as opposed to P2V's.
I'll try the memory trick and also get the specs on the RAID controller.  I'm guessing the RAID controller is relatively base since it's the base controller that you can get with the server on the motherboard.
I'm looking in the vSphere Client and don't see where to obtain the disk latency numbers.  On [Performance] I can select [Disk], but I don't see anything for storage path or storage adapter.
I did just peruse the [Events] tab and was reminded of a message that I had seen in the past.
Message from VM1:
Insufficient video RAM. The maximum resolution
of the virtual machine will be limited to 1176x885
at 16 bits per pixel. To use the configured
maximum resolution of 2360x1770 at 16 bits per
pixel, increase the amount of video RAM allocated
to this virtual machine by setting svga.vramSize=
"16708800" in the virtual machine's configuration
file.

I wonder if this could be adding to the reboot times.
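For what it's worth, the svga.vramSize figure in that message works out to width x height x 4 bytes for the 2360x1770 resolution. The short Python check below shows the arithmetic. The 4 bytes per pixel is an inference from the numbers (the message itself says 16 bits per pixel), not something confirmed by VMware documentation in this thread.

```python
def framebuffer_bytes(width, height, bytes_per_pixel=4):
    """Bytes needed for one frame at the given resolution.

    bytes_per_pixel=4 is an assumption chosen because it reproduces
    the number quoted in the event message.
    """
    return width * height * bytes_per_pixel


print(framebuffer_bytes(2360, 1770))  # 16708800, the value in the message
print(framebuffer_bytes(1176, 885))   # 4163040, just under 4 MiB (4194304)
```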
 
LVL 16

Expert Comment

by:danm66
ID: 33706457
Did you set a memory limit (edit settings | resources, click on memory)?  If so, change it back to unlimited.

This sounds more like a memory contention/limitation issue rather than an I/O issue.  You might also try uninstalling VMware Tools to see if the balloon driver or another component is interfering somehow.  (Not sure from your earlier reply if you meant that you hadn't had Tools installed or had just thought to check that they were installed.)

Are you over-committed on memory?  What is your host memory consumption looking like?
 
LVL 28

Expert Comment

by:bgoering
ID: 33707752
No, that is just an informational message and does not indicate a problem. If you want to make the message go away take a look at this knowledge base article: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1024990
 
LVL 10

Assisted Solution

by:BloodRed
BloodRed earned 167 total points
ID: 33707797
Within your Windows VMs, ensure that the "Clear page file on shutdown" security setting is not set to "Enabled".

This setting will cause Windows to zero out the page file on shutdown/reboot, and on a system with a large amount of memory and a correspondingly large page file this can take a very long time.
 
LVL 26

Expert Comment

by:lnkevin
ID: 33720538
Where are the VMs located? Are they on local disks or a SAN?
Most of the time, delays on VMs happen at the disk I/O level. If you have a SAN, try moving your SAN card to a different, higher-numbered slot. If it's local storage, you may want to check the RAID and hard drives to ensure everything is OK. Next, run esxcfg-rescan to refresh the connection to your disks.

K
 

Author Comment

by:SBSIAdmin
ID: 33784513
I will admit, I posted this question on behalf of our senior level tech who had been the point person for our VM installs.  He was a newbie as well.  As it turns out, he has left our organization, so our two VM servers are now my responsibility.  Thus the delay since the last posting.
I just did a full reboot on one of our physical boxes.  I wanted to peruse the BIOS because I had also heard that there may be a BIOS setting that needs to be adjusted if the box will be running VM.  I saw a couple entries about memory and Intel virtualization, but nothing jumped out at me about VM per se.
As the machine was posting, I noticed a message about there being no battery backup on the RAID controller.  It seemed to indicate that it could be added, but it wasn't installed by default.  There was a previous comment about RAID controllers and battery-backed write cache.  There was also a BIOS setting for enabling write caching.  That's currently disabled and it comes with a warning about potentially losing data if there's a power outage, so I haven't enabled that.  I'm wondering if I should pursue getting the battery module for my RAID controller.
 
LVL 16

Expert Comment

by:danm66
ID: 33784617
Yeah, not having the battery backed write cache enabled definitely does negatively affect storage performance.
 
LVL 28

Expert Comment

by:bgoering
ID: 33784697
Yes, if the guests are hosted on the local RAID the addition of battery backed cache will make a huge improvement for disk writes. If the long delay is on the shutdown part of the reboot process, consider what BloodRed indicated above about the clear pagefile setting - that would add a large amount of time to the shutdown process.

If you shutdown and power off one of these vms, how long does it take to power on and come up?
 

Author Comment

by:SBSIAdmin
ID: 33817738
It was less than a year ago that we set our page files to clear on all servers as a result of a recommendation from our security consulting firm.  I tried turning that off to see if it would make a difference on the VM servers, but it hasn't.
I was unable to find a specific setting in the BIOS that would "enable" the server to be a hypervisor.
I guess that leaves me with the battery backed cache option.  I'll have to obtain the exact model of RAID controller, confirm capabilities, and get a price.
 
LVL 16

Expert Comment

by:danm66
ID: 33819139
You could give some VMs a full memory reservation and see if that makes a difference, too.  You'll still need to read from disk, but it will preclude the need for a swap file, so it would be a decent test to see if you're going down the right path before laying out the money.
 

Author Comment

by:SBSIAdmin
ID: 33837049
I ran a couple of timed tests tonight to get real numbers.  I rebooted one of the server instances on each of our two ESX boxes.  Both were quite consistent.  11 minutes to shut down and less than 2 minutes to boot up.  This test was done with clearing pagefile disabled.
Then I reset the GPO to enable clearing of the pagefile on shutdown.  Ran gpupdate /force and rebooted again.  The shutdown and restart times were just about the same.  That leads me to believe that the pagefile isn't significantly impacting the process.
I'm not certain what a "full memory reservation" means.  The physical boxes have 16GB of RAM in them.  One ESX is running 3 virtual servers and the other is running 2.  Each virtual server has been allocated 4GB of RAM.  So each virtual is running 4GB leaving 4 or 8GB for the hypervisor.
I'm kind of leaning back to the battery backed cache.  Maybe the shutdown process is so disk intensive that we feel the effects, but daily usage as a file server or domain controller doesn't push the disk enough to notice it??
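The memory arithmetic in this post can be checked with a few lines of Python. This is only a sketch using the figures quoted above (16GB per box, 4GB per guest). It ignores the per-VM overhead the hypervisor itself needs, so the real headroom would be somewhat lower.

```python
HOST_RAM_GB = 16  # per physical box, as stated in this post
VM_RAM_GB = 4     # allocated to each virtual server


def headroom_gb(vm_count, host_ram_gb=HOST_RAM_GB, vm_ram_gb=VM_RAM_GB):
    """RAM left on the host if every guest uses its full allocation."""
    return host_ram_gb - vm_count * vm_ram_gb


for vm_count in (3, 2):
    print(vm_count, "guests ->", headroom_gb(vm_count), "GB left")
# 3 guests -> 4 GB left
# 2 guests -> 8 GB left
```

A negative result would mean the host is over-committed on memory, which is not the case with these numbers.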
 
LVL 28

Expert Comment

by:bgoering
ID: 33837119
Double check to be sure the GPO update took - take a look at

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management
Value Name: ClearPageFileAtShutdown
Value Type: REG_DWORD

If the value is 1, the pagefile will be cleared; if it is 0, the pagefile will not be cleared.
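If you would rather script the check than open regedit, something like the following Python sketch reads the same value. It assumes a Python 3 interpreter is available on the guest, which is not a given on Win2K3. The describe and read_setting helper names are made up for this example. The stock reg query command against the same key and value name does the same job with nothing extra installed.

```python
import sys

# Key and value name as given above (under HKEY_LOCAL_MACHINE).
KEY_PATH = r"SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management"
VALUE_NAME = "ClearPageFileAtShutdown"


def describe(value):
    """Translate the DWORD into the behaviour described above."""
    if value == 1:
        return "pagefile will be cleared at shutdown"
    if value == 0:
        return "pagefile will not be cleared at shutdown"
    return "unexpected value: %r" % (value,)


def read_setting():
    """Read the DWORD from the local registry (Windows only)."""
    import winreg  # standard library module, present only on Windows
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
        value, _value_type = winreg.QueryValueEx(key, VALUE_NAME)
    return value


if __name__ == "__main__" and sys.platform == "win32":
    print(describe(read_setting()))
```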

In any event, in the long run, to achieve satisfactory performance you will want to be able to configure write-back (rather than write-through) cache on your RAID controller. I would definitely get that upgrade.

Two minutes sounds reasonable for a startup time.

So far as the full memory reservation - I really don't recommend using that. It is generally better to let ESX manage the memory unless you have very special cases of critical workloads. To set it for a test, go into edit settings on your VM, click the resources tab and there will be a place to set a reservation. For a full reservation, change it to the amount of memory you have allocated to the VM. What this buys you is that ESX doesn't have to create a swap file to back the RAM on that virtual machine. That is pretty much a low overhead activity unless you are severely memory constrained on your host, and it doesn't sound like that is the case. In any event - the overhead of a swap file isn't really incurred until such time as ESX has to swap out memory pages to disk. When that happens there are two writes to disk, one to zero the area, the other to write the memory contents so that physical RAM can be allocated to another virtual machine.
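To put a number on that: the per-VM swap file ESX creates at power-on is sized as the configured memory minus the reservation. Here is a small Python sketch of that relationship, using the 4GB guests from this thread. The function name is made up for the example.

```python
def vswp_size_mb(configured_mb, reservation_mb=0):
    """Size of the per-VM swap file: configured memory minus reservation."""
    if not 0 <= reservation_mb <= configured_mb:
        raise ValueError("reservation must be between 0 and configured memory")
    return configured_mb - reservation_mb


print(vswp_size_mb(4096))        # 4096 (no reservation: swap file backs all 4GB)
print(vswp_size_mb(4096, 4096))  # 0 (full reservation: nothing left to back)
```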

Good Luck
 

Author Comment

by:SBSIAdmin
ID: 34161083
I very much appreciate all of the responses, but unfortunately, none have helped to this point.  My next step is going to be battery-backed cache, but that's going to take some time, purchase, etc.  For the time being I'm going to close this question.  Again, thanks for all the responses.
 

Author Closing Comment

by:SBSIAdmin
ID: 34161111
Unfortunately, we still have the issue, but I'm not able to continue working on it at this time.  I need to get the proper part number for our hardware, determine the cost, get budget for it, make the purchase, install it, yada, yada.  There's no point in keeping the question open at this time.  Responses were great, but nothing I tried worked.  The last item to try requires more planning.

Question has a verified solution.
