VMware machines cause Linux host to be inaccessible

We have a VMware host server running Red Hat Enterprise Linux 5 x86_64 on a fairly powerful machine. It has two virtual machines, one running Windows Server 2003 and one running Windows Server 2008. Periodically the host gets very bogged down, the load spikes to around 20, and the server becomes essentially unresponsive.

The server has two Intel Xeon E5405 CPUs running at 2GHz apiece and 4GB of RAM.

 9567 root       5 -10 1997m  74m  41m S   13  1.9   3630:38 vmware-vmx
 9670 root       6 -10 2007m  67m  30m D    9  1.7   8162:27 vmware-vmx
 7555 mysql     15   0  365m  15m 2960 S    7  0.4  28:53.68 mysqld

The interesting thing is that neither CPU nor RAM usage seems all that high. Is there anything I can do to cap the peak resource usage of the VMware virtual machines, or any information I can provide that would be diagnostically useful?


Best Regards,
Martin Schultz

Check the disk throughput values. I find that processes that depend heavily on NFS or local DASD for disk access can exhibit this spiral effect.
Here is a page that details monitoring Linux performance issues. I was going to write one myself, but this one covers what should point out your issue.

WideAreaMedia (Author) commented:
That makes some sense, but unfortunately my Linux administration skills are lacking. If this turns out to be the case, do you have any suggestions for a remedy?

The particular subsystem has to be identified first; once that is done, we can determine a course of action. The most likely candidate is disk I/O or NFS performance.

Load is a combination of many factors. The link above will let you monitor the system and try to determine the particular problem.

Another question: what does the VMware Server console show about the resources in use by the individual machines? (I'm more familiar with ESX, but since you are running it on top of Linux, I am assuming you are running Server or Workstation.)
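To put the load figure in context, here is a minimal sketch that divides the load averages by the number of cores. It assumes the standard `/proc/loadavg` layout (three averages followed by other fields) and assumes 8 cores for two quad-core E5405s; the sample line is made up for illustration. Keep in mind that on Linux the load average also counts processes stuck in uninterruptible sleep (state `D`, usually waiting on I/O), so a high load with idle CPUs tends to point at I/O rather than CPU.

```python
def load_per_core(loadavg_text, cores):
    """Return the 1-, 5- and 15-minute load averages divided by the core count."""
    # The first three whitespace-separated fields of /proc/loadavg are the averages.
    averages = [float(field) for field in loadavg_text.split()[:3]]
    return [value / cores for value in averages]


# Hypothetical sample in /proc/loadavg format (not taken from the real host).
sample = "20.00 18.50 12.00 3/238 9567"

# Assumption: 2 sockets x 4 cores = 8 cores.
for label, value in zip(("1 min", "5 min", "15 min"), load_per_core(sample, 8)):
    print("%s load per core: %.2f" % (label, value))
```

A value well above 1.0 per core means more work is queued (or blocked on I/O) than the machine can run at once.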
WideAreaMedia (Author) commented:
I don't know if this will give the information necessary, but here's the output of the commands from that page that talk about disk I/O:

root@Leopard [~]# iostat
Linux 2.6.18-92.1.13.el5 (Leopard.ohioetech.com)        11/13/2008

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.95    0.00    4.54    1.14    0.00   93.36

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              24.21       407.26       481.86  778986610  921686413
sdb               4.33        36.85       110.02   70481893  210446986

root@Leopard [~]# w
 21:39:10 up 22 days,  3:21,  2 users,  load average: 7.94, 7.27, 5.06
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
martin   pts/0    cpe-76-190-205-7 21:36    0.00s  0.03s  0.01s sshd: martin [priv]
martin   pts/1    cpe-76-190-205-7 Tue18   15:00m  4.11s  0.00s sshd: martin [priv]
root@Leopard [~]# vmstat
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  1 483332  24488   6116 3405164    1    0    28    37    2    2  1  5 93  1  0
root@Leopard [~]#
This looks starved for RAM...

What does 'top' show? Copy and paste the lines that look like the ones below, as well as the first 5 lines of the top processes...

Tasks:  76 total,   1 running,  75 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.2%us,  0.0%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:    776568k total,   685992k used,    90576k free,    70392k buffers
Swap:   763048k total,        0k used,   763048k free,   498460k cached
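If you want to track these numbers over time rather than eyeball them, the `Mem:` and `Swap:` lines can be parsed with a few lines of Python. This is only a sketch: it assumes the older `top` header layout shown above (values suffixed with `k`), which newer versions of `top` no longer use, and the function name is made up for illustration.

```python
import re

# Matches lines such as "Mem:    776568k total,   685992k used,    90576k free, ..."
LINE_PATTERN = re.compile(
    r"\s*(Mem|Swap):\s+(\d+)k total,\s+(\d+)k used,\s+(\d+)k free"
)


def parse_top_memory(summary):
    """Extract total/used/free (in KiB) from the Mem: and Swap: header lines."""
    result = {}
    for line in summary.splitlines():
        match = LINE_PATTERN.match(line)
        if match:
            label, total, used, free = match.groups()
            result[label.lower()] = {
                "total": int(total),
                "used": int(used),
                "free": int(free),
            }
    return result


sample = """Mem:    776568k total,   685992k used,    90576k free,    70392k buffers
Swap:   763048k total,        0k used,   763048k free,   498460k cached"""

info = parse_top_memory(sample)
swap_used_pct = 100.0 * info["swap"]["used"] / info["swap"]["total"]
print("swap in use: %.1f%%" % swap_used_pct)
```

Note that in this layout the `cached` figure at the end of the `Swap:` line is the page cache, not part of swap.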
WideAreaMedia (Author) commented:
top - 11:36:56 up 22 days, 17:19,  3 users,  load average: 2.12, 2.25, 2.40
Tasks: 238 total,   1 running, 237 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.0%us,  5.0%sy,  0.0%ni, 92.7%id,  1.2%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   4043412k total,  4016640k used,    26772k free,     7012k buffers
Swap:  2040212k total,   603640k used,  1436572k free,  3410072k cached

I could reduce the amount of RAM allocated to each virtual machine to 1GB or 768MB if that would help...
WideAreaMedia (Author) commented:
top - 11:39:06 up 22 days, 17:21,  3 users,  load average: 2.81, 2.43, 2.44
Tasks: 242 total,   1 running, 241 sleeping,   0 stopped,   0 zombie
Cpu(s):  3.4%us, 28.4%sy,  0.0%ni, 57.1%id, 11.0%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:   4043412k total,  4018868k used,    24544k free,     4356k buffers
Swap:  2040212k total,   603640k used,  1436572k free,  3324600k cached
Yes, that is exactly what I would recommend. You are about 600MB into swap, which means physical RAM is exhausted and the machine is swapping nonstop. Swapping is an I/O-intensive operation, which is probably leading to disk I/O congestion. Processes are getting 'stuck' being swapped out, and this is causing the load on the machine to spiral out of control.
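One way to confirm that the machine is actively swapping, rather than just holding old pages in swap, is to sample the swap counters twice and look at the rate. The sketch below assumes the `pswpin` and `pswpout` counters in `/proc/vmstat` (pages swapped in and out since boot); the helper names and the sample snapshots are made up for illustration. This matters because `vmstat` run without an interval reports averages since boot, which can hide a burst of swapping.

```python
def swap_counters(vmstat_text):
    """Pick the pswpin/pswpout counters out of /proc/vmstat-style text."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters


def swap_rate(before_text, after_text, seconds):
    """Return pages swapped in/out per second between two snapshots."""
    before = swap_counters(before_text)
    after = swap_counters(after_text)
    return {
        name: (after[name] - before[name]) / float(seconds)
        for name in ("pswpin", "pswpout")
    }


# Hypothetical snapshots taken 10 seconds apart (not from the real host).
first = "nr_free_pages 6122\npswpin 1000\npswpout 5000"
second = "nr_free_pages 5980\npswpin 1600\npswpout 5300"

print(swap_rate(first, second, 10))
```

On the real host you would read `/proc/vmstat` twice with a `time.sleep()` in between; sustained nonzero rates during a load spike would support the swapping explanation.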

Does this machine have more than 4GB of RAM? You mention you are running the 64-bit version; I would make sure that it is addressing everything it should.

I've seen machines that, after certain patches, are for one reason or another 'blind' to memory above 4GB. If that is the case here, you may think you have, let's say, 8GB of RAM, but with the OS only detecting 4GB you are very heavily oversubscribing the RAM.
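A rough way to check for oversubscription is to compare what the OS actually detects against what the guests are configured to use. The sketch below assumes the `MemTotal:` line format of `/proc/meminfo`; the guest sizes (2048MB each) and the 512MB reserved for the host are purely illustrative assumptions, not values confirmed anywhere in this thread.

```python
def mem_total_kib(meminfo_text):
    """Return the MemTotal value (in KiB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    raise ValueError("MemTotal not found")


def shortfall_mib(meminfo_text, vm_allocations_mib, host_reserve_mib=512):
    """Return committed memory minus detected memory, in MiB.

    A positive result means the guests plus the host reserve exceed
    the RAM the OS can see, so swapping is likely.
    """
    detected_mib = mem_total_kib(meminfo_text) / 1024.0
    committed_mib = sum(vm_allocations_mib) + host_reserve_mib
    return committed_mib - detected_mib


# MemTotal matching the 4043412k total reported by top earlier in the thread.
sample_meminfo = "MemTotal:      4043412 kB\nMemFree:         26772 kB"

# Hypothetical guest sizes: two guests at 2048 MiB each.
print("shortfall: %.0f MiB" % shortfall_mib(sample_meminfo, [2048, 2048]))
# Hypothetical reduced sizes: 1024 MiB and 768 MiB.
print("shortfall: %.0f MiB" % shortfall_mib(sample_meminfo, [1024, 768]))
```

A negative result leaves headroom for the host and its page cache; the right size for the host reserve depends on what else runs on the machine (MySQL, in this case).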
WideAreaMedia (Author) commented:
The machine has 4GB of RAM and is running a 64-bit flavor of Red Hat. I reduced the amount of real memory allocated to one of the virtual machines and have seen a significant decrease in the frequency of these issues. I'll decrease the other the next time I need to take it offline for updates. Thanks, frashii, for your help, patience, and continued monitoring of this question. The link you provided with the performance diagnostic commands will make for great basic lessons for my Linux administration skills :)

Best Regards,
Martin Schultz
WideAreaMedia (Author) commented:
Thanks frashii! Much appreciated. If I could give you more points, I would :)