Solved

Virtual Machine Memory/Pagefile Issue on ESX server

Posted on 2007-04-09
Last Modified: 2008-01-09
We are having an issue with a virtual machine that we have set up at a client site.  It was built in ESX 3, converted to a stand-alone system for VMware Workstation 5, and then moved back to an ESX 3 server.  I have attached two emails that outline the issue below.  Any help on this is greatly appreciated!
     Email 1:
After shutting down all of our services, the machine still sits at 2.75-2.81GB of Page File usage. To see if this was some sort of memory leak situation, I disabled all of our services and rebooted the machine. When it came up, the PF usage started at around 600MB and climbed steadily until it hit 2.75GB. I ended up rebooting again, and it did the exact same thing. This only leaves 1GB of RAM for our services to use. We tried changing various server settings to coax it into cutting down on its memory usage, but nothing really changed.
We did suggest locking the kernel in Physical RAM to prevent it from swapping which did make the server more responsive. It’s more usable than it was before, but still very slow.
I was able to benchmark its hard drives and they perform as well as our test server (they are approx. 10k rpm drives). So this system should rock if we can figure out what’s eating memory. At this point, it sounds like some sort of VM issue. I started the copy of the slice we have here (the secondary slice at our location, as opposed to the client's) and it does not behave this way. Something is different between our server and theirs.
I put all the services back up and they are running a little better. However, this thing is on the hairy edge of being out of memory, so it’s only going to get slower. I’ll check it again in the morning to see how it’s running.

     Email 2: (this was sent after I requested more specific information on the server/system our VM was running on)
Hi Jacob,
Their system is running ESX, and is a six-server cluster.  Each server has 2 quad-core processors, a metric ton of RAM, and is connected to a pretty powerful SAN.  The server we are on has 4 other VMs.  Our VM is set to request 4 virtual processors, one of the others requests 2 and the others 1, but they were not all running when we were there, so there should be plenty of processor.

The big thing we are noticing is the size of our paging file on our own slice.  It is much larger there than on the original slice in our environment.  Looking at the Windows Task Manager, you can see the size of the page file, but none of the processes are using anywhere near that much.  I suspect fairly strongly that it’s being taken up by the VMWare “balloon” process that it uses to help it manage and balance memory on each slice that it has, but I’m not sure why it would behave so differently in their environment vs. ours.

From what I’ve been reading, it might not be a bad idea to have them re-install the VMware Tools on the slice.  Evidently if they are incorrect it can cause this sort of problem, and we did move it from our ESX server to Workstation and then they imported it back to ESX.  But this is a “maybe yea, maybe nay” kinda thing to try, just noting that it was pointed to by VMware for someone else that had a similar looking issue.
I’m still looking at other possibilities as well.

Question by:5253h
7 Comments
 
LVL 57

Expert Comment

by:giltjr
ID: 18882752
Is the page file you are talking about on the host OS or the virtual OS?

How many virtual machines are running on the host OS?

Author Comment

by:5253h
ID: 18882924
Yes, the page file we are referring to is the Virtual Machine page file, and not the ESX server page file.  There are 4 virtual machines running on this ESX server (including ours).  Here is an excerpt from my lengthy post above:
     "Their system is running ESX, and is a six-server cluster.  Each server has 2 quad-core processors, a metric ton of RAM, and is connected to a pretty powerful SAN.  The server we are on has 4 other VMs.  Our VM is set to request 4 virtual processors, one of the others requests 2 and the others 1, but they were not all running when we were there, so there should be plenty of processor."
     So the system does not appear to be nearing capacity for the number of virtual servers it is hosting. Even on the Guest OS (our Virtual Machine), when we look in Task Manager, the memory that the processes report using does not add up to the amount that the virtual machine reports the page file is using.
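The mismatch described above can be put into numbers with a rough sketch like the following. The 2750 MB figure is the PF usage quoted in Email 1; the per-process values and names are purely illustrative placeholders, since the thread does not list them, and `unaccounted_mb` is a hypothetical helper. A large gap only shows that something outside the listed processes (for example a driver such as the VMware balloon driver) may be holding memory; it does not prove which one.

```python
def unaccounted_mb(pf_usage_mb, process_usage_mb):
    """Return the PF usage (MB) that is not attributed to any listed process."""
    return pf_usage_mb - sum(process_usage_mb.values())


# PF usage taken from Email 1 (about 2.75GB).
pf_usage_mb = 2750

# Illustrative per-process figures only; the thread does not give them.
processes = {"service_a.exe": 300, "service_b.exe": 250, "other": 150}

gap = unaccounted_mb(pf_usage_mb, processes)
print(f"{gap} MB of PF usage is not attributed to any listed process")
```

With real figures copied from Task Manager in place of the placeholders, a gap of a gigabyte or more would point away from the applications and toward the guest kernel, drivers, or the hypervisor.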
LVL 57

Expert Comment

by:giltjr
ID: 18883038
Have you run top on the virtual machine to see what could be chewing up memory?

Could something be caching a large file, or a bunch of small files?

What type of applications are you running on the virtual machine?
LVL 1

Expert Comment

by:wanstor
ID: 18909612
this doesn't sound like a vmware prob, more a prob with the guest os.

what is the guest os, and what apps are you trying to run on it? do you need to give it 4 vCPUs? giving it too many available vCPUs can degrade performance.

incidentally, you don't have any snapshots lurking around, do you?

if this is a windows box, have you tried setting it to no pagefile, restarting, and then changing back to a manual pagefile size?

what kind of san are you attached to?  10k disks in what configuration?   how many other vms are on that lun?  have you run iometer?

Accepted Solution

by:
5253h earned 0 total points
ID: 18919289
Thanks for your comments everyone.  Sorry for not getting back sooner to your questions, but I have been working many hours night and day to get this resolved.  I ended up running a VMPerfmon tool on the Guest OS (Windows 2003 Server in this case) and saw that we were limited to 188 MB of memory.  I checked this against the log files that the company hosting our Guest OS on their ESX server provided me, and they showed the same.  The system (Guest OS) showed it had 4GB of RAM, but that was not true.  I informed my contact at the client site (where the ESX server is located) about the issue, and they were to check the Unlimited box to remove the 188 MB limitation.  After this was completed the Guest OS began to run just fine.  We were also able to identify a potential issue with the SAN attached to the ESX server through the log file.  This did not seem to affect us.  Again, thank you all for your assistance.
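For anyone who wants to check for the same condition without a perfmon tool, here is a minimal sketch that scans VM configuration text for a memory limit lower than the configured memory. It assumes the configured size is stored under the `memsize` key and the limit under `sched.mem.max` (both in MB) in the VM's .vmx file; verify those key names against your own ESX version before relying on this. The function names are hypothetical, and the sample values are the 4GB and 188 MB figures from this thread.

```python
def parse_vmx(text):
    """Parse 'key = "value"' lines into a dict with lower-cased keys."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip().lower()] = value.strip().strip('"')
    return settings


def memory_limit_problem(settings):
    """Return a warning if the memory limit is below configured memory, else None."""
    memsize = int(settings.get("memsize", "0"))
    # Assumption: a missing key, "unlimited", or "-1" all mean no limit is set.
    raw = settings.get("sched.mem.max", "unlimited").lower()
    if raw in ("unlimited", "-1", ""):
        return None
    limit = int(raw)
    if limit < memsize:
        return f"limit {limit} MB is below configured {memsize} MB"
    return None


# Sample built from the figures in this thread: 4GB configured, 188 MB limit.
sample = '''
memsize = "4096"
sched.mem.max = "188"
'''
print(memory_limit_problem(parse_vmx(sample)))
```

When the limit is lower than the configured size, the guest still reports the full 4GB, but the host only backs it with physical RAM up to the limit and reclaims the rest, which would fit the heavy page file use and slowness described above.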