WideAreaMedia

VMware virtual machines cause Linux host to become inaccessible

We have a VMware host server running Red Hat Enterprise Linux 5 x86_64 on a fairly powerful machine. It hosts two virtual machines, one running Windows Server 2003 and one running Windows Server 2008. Periodically the host gets very bogged down, the load average spikes to around 20, and the server becomes essentially unresponsive.

The server has two Intel Xeon E5405 CPUs running at 2GHz apiece and 4GB of RAM.

 9567 root       5 -10 1997m  74m  41m S   13  1.9   3630:38 vmware-vmx
 9670 root       6 -10 2007m  67m  30m D    9  1.7   8162:27 vmware-vmx
 7555 mysql     15   0  365m  15m 2960 S    7  0.4  28:53.68 mysqld

The interesting thing is that neither the CPU nor the RAM usage seems all that high. Is there anything I can do to limit the peak resource usage of the VMware virtual machines, or any information I can provide that would be diagnostically useful?

Thanks!

Best Regards,
Martin Schultz
frashii

Check the disk throughput values. I find that processes that are heavily dependent on NFS or local DASD for disk access can exhibit this spiral effect.
Here is a page that details how to monitor Linux performance issues. I was going to write up my own list, but this one has what you need to pinpoint your issue.

http://www.cyberciti.biz/faq/linux-performance-tools-to-troubleshoot-problem/

ASKER

That makes some sense, but unfortunately my Linux administration skills are lacking. If this turns out to be the case, do you have any suggestions for a remedy?
The particular subsystem has to be identified first; once that is done, we can determine a course of action. The most likely candidate is disk I/O or NFS performance.

Load is a combination of many factors. The commands on the page linked above will let you monitor the system and narrow down the particular problem.
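
As a rough illustration of what feeds the load figure: on Linux the load average counts tasks that are runnable as well as tasks in uninterruptible sleep (state D, usually waiting on I/O), which is why load can climb while the CPUs are mostly idle. The Python sketch below is not taken from the linked page. The helper names `parse_loadavg` and `blocked_processes` are made up for this example, and it assumes the input text comes from `/proc/loadavg` and from `ps -eo stat,pid,comm`.

```python
def parse_loadavg(text):
    """Parse the contents of /proc/loadavg into a dict.

    The file looks like: "7.94 7.27 5.06 2/238 9670"
    (1, 5 and 15 minute load, runnable/total tasks, last PID).
    """
    fields = text.split()
    runnable, total = fields[3].split("/")
    return {
        "load1": float(fields[0]),
        "load5": float(fields[1]),
        "load15": float(fields[2]),
        "runnable": int(runnable),
        "total": int(total),
    }


def blocked_processes(ps_text):
    """Return (pid, command) pairs for processes in state D.

    Expects lines of "STAT PID COMMAND", as printed by
    `ps -eo stat,pid,comm`. A header line is skipped naturally
    because "STAT" does not start with "D".
    """
    blocked = []
    for line in ps_text.splitlines():
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[0].startswith("D"):
            blocked.append((int(parts[1]), parts[2]))
    return blocked
```

Feeding it the contents of `/proc/loadavg` and the `ps` output while the load is high should show whether the spike comes from processes stuck waiting on I/O (many D entries) rather than from CPU demand.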

Another question: what does the VMware Server console show about the resources in use by the individual machines? (I'm more familiar with ESX, but since you are running it on top of Linux, I assume you are running Server or Workstation.)
I don't know if this will give the information necessary, but here's the output of the commands from that page that talk about disk I/O:

root@Leopard [~]# iostat
Linux 2.6.18-92.1.13.el5 (Leopard.ohioetech.com)        11/13/2008

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.95    0.00    4.54    1.14    0.00   93.36

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              24.21       407.26       481.86  778986610  921686413
sdb               4.33        36.85       110.02   70481893  210446986

root@Leopard [~]# w
 21:39:10 up 22 days,  3:21,  2 users,  load average: 7.94, 7.27, 5.06
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
martin   pts/0    cpe-76-190-205-7 21:36    0.00s  0.03s  0.01s sshd: martin [priv]
martin   pts/1    cpe-76-190-205-7 Tue18   15:00m  4.11s  0.00s sshd: martin [priv]
root@Leopard [~]# vmstat
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  1 483332  24488   6116 3405164    1    0    28    37    2    2  1  5 93  1  0
root@Leopard [~]#
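
For readers who want to pull the memory figures out of `vmstat` output like the one above, here is a minimal Python sketch. The function names are hypothetical, and it assumes the default `vmstat` units (KiB for the memory columns). Note that a plain `vmstat` with no interval prints averages since boot, so running something like `vmstat 5` during a slowdown gives a more useful sample.

```python
def parse_vmstat(text):
    """Map vmstat column names to the integers on the first data row."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    for i, cols in enumerate(rows):
        # The header row is the one that names the "swpd" column.
        if "swpd" in cols and i + 1 < len(rows):
            return dict(zip(cols, map(int, rows[i + 1])))
    raise ValueError("no vmstat header row found")


def memory_summary(stats):
    """Convert the memory columns to MB and flag swap activity."""
    return {
        "swapped_mb": stats["swpd"] / 1024,
        "free_mb": stats["free"] / 1024,
        "cache_mb": stats["cache"] / 1024,
        "swapping_now": stats["si"] > 0 or stats["so"] > 0,
    }
```

Applied to the output above, this reports roughly 472 MB swapped out and about 24 MB free.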
This looks starved for RAM: vmstat shows about 472 MB swapped out and only about 24 MB free.

What does 'top' show? Copy and paste the summary lines that look like the ones below, as well as the first 5 lines of the top processes.

Tasks:  76 total,   1 running,  75 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.2%us,  0.0%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:    776568k total,   685992k used,    90576k free,    70392k buffers
Swap:   763048k total,        0k used,   763048k free,   498460k cached
top - 11:36:56 up 22 days, 17:19,  3 users,  load average: 2.12, 2.25, 2.40
Tasks: 238 total,   1 running, 237 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.0%us,  5.0%sy,  0.0%ni, 92.7%id,  1.2%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   4043412k total,  4016640k used,    26772k free,     7012k buffers
Swap:  2040212k total,   603640k used,  1436572k free,  3410072k cached

I could reduce the amount of RAM allocated to each virtual machine to 1GB or 768MB if that would help...
top - 11:39:06 up 22 days, 17:21,  3 users,  load average: 2.81, 2.43, 2.44
Tasks: 242 total,   1 running, 241 sleeping,   0 stopped,   0 zombie
Cpu(s):  3.4%us, 28.4%sy,  0.0%ni, 57.1%id, 11.0%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:   4043412k total,  4018868k used,    24544k free,     4356k buffers
Swap:  2040212k total,   603640k used,  1436572k free,  3324600k cached
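
The `Mem:` and `Swap:` lines in these `top` summaries can be broken down the same way `free` does on its "-/+ buffers/cache" line: used memory minus buffers and page cache. The Python sketch below does that arithmetic. It assumes the older `top` layout shown above, where values carry a `k` suffix and `cached` appears on the `Swap:` line; the function names are invented for this example.

```python
import re


def parse_top_memory(text):
    """Extract the kB figures from top's Mem: and Swap: summary lines."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(("Mem:", "Swap:")):
            prefix = line.split(":")[0].lower()  # "mem" or "swap"
            for amount, label in re.findall(r"(\d+)k\s+(\w+)", line):
                # buffers and cached are global figures, so no prefix.
                if label in ("buffers", "cached"):
                    values[label] = int(amount)
                else:
                    values[f"{prefix}_{label}"] = int(amount)
    return values


def used_excluding_cache_kb(mem):
    """Memory in use once buffers and page cache are subtracted."""
    return mem["mem_used"] - mem["buffers"] - mem["cached"]
```

For the first summary above this gives 599556 kB outside buffers and cache, alongside 603640 kB of swap in use. How much of the cache is actually reclaimable depends on the workload, so treat the number as a starting point and not a verdict.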
ASKER CERTIFIED SOLUTION
frashii

Does this machine have more than 4GB of RAM? You mention you are running the 64-bit version, so I would make sure that it is addressing everything it should.

I've seen machines that, after certain patches, are for one reason or another 'blind' to memory above 4GB. If that is the case here, you may think you have, let's say, 8GB of RAM, but with the OS only detecting 4GB you are very heavily oversubscribing the RAM.
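
One way to check how much memory the kernel actually detects is to read `MemTotal` from `/proc/meminfo` and compare it with what is physically installed. The Python sketch below shows the idea. The function names and the 90% tolerance are assumptions for illustration; the kernel always reports somewhat less than the installed amount because it reserves some memory for itself.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo text into {field: integer value}.

    Most fields are in kB, e.g. "MemTotal:      4043412 kB".
    """
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def detected_gib(meminfo):
    """Total RAM seen by the kernel, in GiB."""
    return meminfo["MemTotal"] / (1024 * 1024)


def looks_short(meminfo, installed_gib, tolerance=0.9):
    """True if the kernel sees well under the installed amount."""
    return detected_gib(meminfo) < installed_gib * tolerance
```

With the 4043412 kB total shown in the `top` output earlier in the thread, this reports about 3.86 GiB, which is consistent with 4GB installed and would only look short if the machine were supposed to have more.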
The machine has 4GB of RAM and is running a 64-bit flavor of Red Hat. I reduced the amount of real memory allocated to one of the virtual machines and have seen a significant decrease in the frequency of these issues. I'll decrease the other the next time I need to take it offline for updates. Thanks, frashii, for your help, patience, and continued monitoring of this question. The link you provided with the performance diagnostic commands will make for great basic lessons for my Linux administration skills :)

Best Regards,
Martin Schultz
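
The change that helped here was shrinking guest memory so that the guests and the host fit in physical RAM together. A minimal Python sketch of that arithmetic follows. The thread never states the original guest allocations, so the 2048 MB figures in the usage note are purely hypothetical, as are the `host_reserve_mb` and `per_guest_overhead_mb` defaults and the function name itself.

```python
def memory_headroom_mb(host_mb, guest_mbs,
                       host_reserve_mb=512, per_guest_overhead_mb=64):
    """Return host RAM left after guests and a host reserve.

    host_reserve_mb and per_guest_overhead_mb are assumed round
    numbers, not VMware-documented values. A negative result means
    the host is oversubscribed and will have to swap.
    """
    committed = (
        sum(guest_mbs)
        + per_guest_overhead_mb * len(guest_mbs)
        + host_reserve_mb
    )
    return host_mb - committed
```

For a host reporting 3948 MB (4043412 kB), two hypothetical 2048 MB guests leave a headroom of -788 MB, while guests of 1024 MB and 768 MB leave 1516 MB free for the host, MySQL, and the page cache.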
Thanks frashii! Much appreciated. If I could give you more points, I would :)