Virtual Machine Memory/Pagefile Issue on ESX server

We are having an issue with a virtual machine that we have set up at a client site.  It was built in ESX 3, converted to a stand-alone system for VMware Workstation 5, and then converted back for an ESX 3 server.  Two emails that outline the issue are included below.  Any help on this is greatly appreciated!
     Email 1:
After shutting down all of our services, the machine still sits at 2.75-2.81GB of Page File usage. To see if this was some sort of memory leak situation, I disabled all of our services and rebooted the machine. When it came up, the PF usage started at around 600MB and climbed steadily until it hit 2.75GB. I rebooted again, and it did the exact same thing. This only leaves 1GB of RAM for our services to use. We tried changing various server settings to coax it into cutting down on its memory usage, but nothing really changed.
We did suggest locking the kernel in Physical RAM to prevent it from swapping which did make the server more responsive. It’s more usable than it was before, but still very slow.
I was able to benchmark its hard drives and they perform as well as our test server (they are approx. 10k rpm drives). So this system should rock if we can figure out what’s eating memory. At this point, it sounds like some sort of VM issue. I started the copy of the slice we have here (the secondary slice at our location, as opposed to the client’s) and it does not behave this way. Something is different between our server and theirs.
I put all the services back up and they are running a little better. However, this thing is on the hairy edge of being out of memory, so it’s only going to get slower. I’ll check it again in the morning to see how it’s running.

     Email 2: (this was sent after I requested more specific information on the server/system our VM was running on)
Hi Jacob,
Their system is running ESX, and is a six-server cluster.  Each server has 2 quad-core processors, a metric ton of RAM, and is connected to a pretty powerful SAN.  The server we are on has 4 other VMs.  Our VM is set to request 4 virtual processors, one of the others requests 2 and the others 1, but they were not all running when we were there, so there should be plenty of processor.

The big thing we are noticing is the size of the paging file on our slice at their site.  It is much larger there than on the original slice in our environment.  Looking at the Windows Task Manager, you can see the size of the page file, but none of the processes are using anywhere near that much.  I suspect fairly strongly that it’s being taken up by the VMware “balloon” process that ESX uses to manage and balance memory across the slices it hosts, but I’m not sure why it would behave so differently in their environment vs. ours.

From what I’ve been reading, it might not be a bad idea to have them re-install VMware Tools on the slice.  Evidently if the tools are incorrect it can cause this sort of problem, and we did move it from our ESX processor to Workstation and then they imported it back to ESX.  But this is a “maybe yea, maybe nay” kinda thing to try; I’m just noting that VMware pointed to it for someone else who had a similar-looking issue.
I’m still looking at other possibilities as well.

Is the page file you are talking about on the host OS or the virtual OS?

How many virtual machines are running on the host OS?
5253h (Author) Commented:
Yes, the page file we are referring to is the virtual machine's page file, not the ESX server's page file.  The ESX server is running our virtual machine plus 4 others.  Here is an excerpt from my lengthy post above:
     "Their system is running ESX, and is a six-server cluster.  Each server has 2 quad-core processors, a metric ton of RAM, and is connected to a pretty powerful SAN.  The server we are on has 4 other VMs.  Our VM is set to request 4 virtual processors, one of the others requests 2 and the others 1, but they were not all running when we were there, so there should be plenty of processor."
     So the system does not appear to be nearing capacity for the number of virtual servers it is hosting. Even on the guest OS (our virtual machine), when we look in Task Manager, the memory that the individual processes report using does not add up to the page file usage that the virtual machine reports.
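The comparison described above (Task Manager's total PF usage versus the sum of what the individual processes report) can be written down as simple arithmetic. The following is only a rough sketch: the function name `unexplained_commit_mb` and the process names and per-process figures are hypothetical placeholders, not measurements from this server. Only the 2.75GB (2816MB) total comes from the emails above.

```python
def unexplained_commit_mb(total_pf_usage_mb, per_process_mb):
    """Return the PF usage (in MB) that no listed process accounts for."""
    return total_pf_usage_mb - sum(per_process_mb)


# Hypothetical per-process figures as they might be read off Task Manager.
# These names and numbers are illustrative assumptions only.
processes = {
    "sqlservr.exe": 310.0,
    "w3wp.exe": 120.0,
    "svchost.exe": 95.5,
}

# 2.75GB of PF usage, as reported in Email 1, expressed in MB.
total_pf_usage_mb = 2.75 * 1024

gap = unexplained_commit_mb(total_pf_usage_mb, processes.values())
print(f"PF usage not attributable to any listed process: {gap:.1f} MB")
```

A large gap like this would be consistent with memory being held by something that does not show up as an ordinary process, such as a kernel-mode driver, which is one reason the balloon driver is a plausible suspect here.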
Have you run top on the virtual machine to see what could be chewing up memory?

Could something be caching a large file, or a bunch of small files?

What type of applications are you running on the virtual machine?
this doesn't sound like a vmware prob, more a prob with the guest os.

what is the guest os, and what apps are you trying to run on it? do you need to give it 4 vCPUs? giving a vm too many available vCPUs can degrade performance.

incidentally, you don't have any snapshots lurking around do you?

if this is a windows box, have you tried setting it to no pagefile, restarting and then changing back to a manual pagefile size?

what kind of san are you attached to?  10k disks in what configuration?   how many other vms are on that lun?  have you run iometer?
5253h (Author) Commented:
Thanks for your comments, everyone.  Sorry for not getting back sooner on your questions, but I have been working many hours night and day to get this resolved.  I ended up running a VMPerfmon tool on the guest OS (Windows 2003 Server in this case) and saw that we were limited to 188 MB of memory.  I checked this against the log files that the company hosting our guest OS on their ESX server provided, and they showed the same thing.  The guest OS showed that it had 4GB of RAM, but that memory was not actually available to it.  I informed my contact at the client site (where the ESX server is located) about the issue and asked them to check the Unlimited box to remove the 188 MB limit.  After this was done, the guest OS began to run just fine.  We were also able to identify a potential issue with the SAN attached to the ESX server through the log file, but it did not seem to affect us.  Again, thank you all for your assistance.
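For anyone hitting the same thing, here is a rough sketch of how the mismatch between configured memory and the memory limit could be checked from the VM's configuration text. It rests on assumptions: that the configured size is stored under a `memsize` key and the limit under a `sched.mem.max` key (both in MB), and that an absent key, `unlimited`, or `-1` means no limit. Verify the key names against your own .vmx file before relying on this. The helper names `parse_vmx` and `memory_shortfall_mb` are made up for illustration.

```python
def parse_vmx(text):
    """Parse simple key = "value" lines into a dict."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not key = value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"')
    return settings


def memory_shortfall_mb(settings):
    """Return how many MB of configured memory the limit cannot back.

    Assumes 'memsize' and 'sched.mem.max' are the relevant keys and
    that a missing, 'unlimited', or '-1' limit means no cap.
    """
    configured = int(settings["memsize"])
    raw_limit = settings.get("sched.mem.max", "unlimited")
    if raw_limit.lower() in ("unlimited", "-1"):
        return 0
    return max(0, configured - int(raw_limit))


# Values taken from this thread: 4GB configured, 188 MB limit.
sample = '''
memsize = "4096"
sched.mem.max = "188"
'''
shortfall = memory_shortfall_mb(parse_vmx(sample))
print(f"Configured memory the host will not back with RAM: {shortfall} MB")
```

With the figures from this case, the guest believed it had 4096 MB while the host would back only 188 MB with physical RAM, leaving 3908 MB to be covered by ballooning or swapping. That fits the heavy page file usage we saw inside the guest.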

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.