
VMware machines cause Linux host to be inaccessible

We have a VMware host server running Red Hat Enterprise Linux 5 x86_64 on a fairly powerful machine. It hosts two virtual machines, one running Windows Server 2003 and one running Windows Server 2008. Periodically the host gets very bogged down, the load spikes to around 20, and the server becomes essentially unresponsive.

The server has two Intel Xeon E5405 CPUs running at 2GHz apiece and 4GB of RAM.

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
9567 root 5 -10 1997m 74m 41m S 13 1.9 3630:38 vmware-vmx
9670 root 6 -10 2007m 67m 30m D 9 1.7 8162:27 vmware-vmx
7555 mysql 15 0 365m 15m 2960 S 7 0.4 28:53.68 mysqld

The interesting thing is that neither CPU nor RAM usage seems all that high. Is there anything I can do to regulate the peak resource usage of the VMware virtual machines, or any information I can provide that would be diagnostically useful?

Thanks!

Best Regards,
Martin Schultz

frashii

Check the disk throughput values. I find that processes that are heavily dependent on NFS or local DASD for disk access can exhibit this spiral effect.

frashii

Here is a page that covers monitoring Linux performance issues. I was going to write up my own list, but this one has what you need to pinpoint your issue.

http://www.cyberciti.biz/faq/linux-performance-tools-to-troubleshoot-problem/

WideAreaMedia

ASKER

That makes some sense, but unfortunately my Linux administration skills are lacking. If this turns out to be the case, do you have any suggestions for a remedy?

frashii

The particular subsystem has to be identified and once that is done, we can determine a course of action. The most likely candidate will be disk I/O or NFS performance.

Load is a combination of many factors; the tools at the link above will let you monitor the system and narrow down the particular problem.

Another question: what does the VMware Server console show about the resources in use by the individual machines? (I'm more familiar with ESX, but since you are running it on top of Linux, I assume you are running Server or Workstation.)
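On Linux, the load average counts not only runnable tasks but also tasks in uninterruptible sleep (state D), which usually means they are waiting on I/O. That is why load can be high while CPU usage stays low. A minimal sketch for spotting such tasks, assuming the procps version of `ps`; the function name `list_blocked` is made up for illustration:

```shell
# Print PID and command of every process in uninterruptible sleep (state D).
# Expects `ps -eo state,pid,comm` style input on stdin: STATE PID COMMAND.
# The ps header line starts with "S", so it never matches and is skipped.
list_blocked() {
    awk '$1 ~ /^D/ { print $2, $3 }'
}

# Usage on a live system:
#   ps -eo state,pid,comm | list_blocked
```

Running this a few times while the load is climbing should show whether the same processes keep turning up in state D.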

WideAreaMedia

ASKER

I don't know if this will give the information necessary, but here's the output of the commands from that page that talk about disk I/O:

root@Leopard [~]# iostat
Linux 2.6.18-92.1.13.el5 (Leopard.ohioetech.com) 11/13/2008

avg-cpu: %user %nice %system %iowait %steal %idle
0.95 0.00 4.54 1.14 0.00 93.36

Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 24.21 407.26 481.86 778986610 921686413
sdb 4.33 36.85 110.02 70481893 210446986

root@Leopard [~]# w
21:39:10 up 22 days, 3:21, 2 users, load average: 7.94, 7.27, 5.06
USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
martin pts/0 cpe-76-190-205-7 21:36 0.00s 0.03s 0.01s sshd: martin [priv]
martin pts/1 cpe-76-190-205-7 Tue18 15:00m 4.11s 0.00s sshd: martin [priv]
root@Leopard [~]# vmstat
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 1 483332 24488 6116 3405164 1 0 28 37 2 2 1 5 93 1 0
root@Leopard [~]#

frashii

This looks starved for RAM...

What does 'top' show? Copy and paste the lines that look like the ones below, as well as the first 5 lines of the top processes.

Tasks: 76 total, 1 running, 75 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.2%us, 0.0%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 776568k total, 685992k used, 90576k free, 70392k buffers
Swap: 763048k total, 0k used, 763048k free, 498460k cached
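A single `vmstat` run with no interval only reports averages since boot, so one way to check whether the host is actively swapping is to sample the si/so (swap-in/swap-out) columns over a few intervals. A minimal sketch, assuming the 17-column layout of the vmstat output earlier in this thread; the function name `swap_activity` is made up for illustration:

```shell
# Print a line for every vmstat sample in which the host swapped.
# Assumes the column layout shown earlier: si is field 7, so is field 8.
swap_activity() {
    # Skip the two header lines, then report samples with nonzero si or so.
    awk 'NR > 2 && ($7 > 0 || $8 > 0) { print "swapping: si=" $7 " so=" $8 }'
}

# Usage: sample every 5 seconds, 6 times.
# The first sample is still the since-boot average.
#   vmstat 5 6 | swap_activity
```

Sustained nonzero values during a load spike would support the RAM-starvation theory; all zeros would point back toward disk I/O.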

WideAreaMedia

ASKER

top - 11:36:56 up 22 days, 17:19, 3 users, load average: 2.12, 2.25, 2.40
Tasks: 238 total, 1 running, 237 sleeping, 0 stopped, 0 zombie
Cpu(s): 1.0%us, 5.0%sy, 0.0%ni, 92.7%id, 1.2%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 4043412k total, 4016640k used, 26772k free, 7012k buffers
Swap: 2040212k total, 603640k used, 1436572k free, 3410072k cached

I could reduce the amount of RAM allocated to each virtual machine down to 1GB or 768MB if that would help...

WideAreaMedia

ASKER

top - 11:39:06 up 22 days, 17:21, 3 users, load average: 2.81, 2.43, 2.44
Tasks: 242 total, 1 running, 241 sleeping, 0 stopped, 0 zombie
Cpu(s): 3.4%us, 28.4%sy, 0.0%ni, 57.1%id, 11.0%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 4043412k total, 4018868k used, 24544k free, 4356k buffers
Swap: 2040212k total, 603640k used, 1436572k free, 3324600k cached

ASKER CERTIFIED SOLUTION

frashii

frashii

Does this machine have more than 4GB of RAM? You mention you are running the 64-bit version; I would make sure that it is addressing everything it should.

I've seen machines that, after certain patches, are for one reason or another 'blind' to memory above 4GB. If that is the case here, you may think you have, let's say, 8GB of RAM, but with the OS only detecting 4GB you are heavily oversubscribing the RAM.
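A quick way to see how much RAM the kernel actually detects is to read MemTotal from /proc/meminfo and compare it with what is physically installed. A minimal sketch; the function name `mem_total_mb` is made up for illustration, and the `dmidecode` comparison assumes that tool is installed and run as root:

```shell
# Print the RAM the kernel sees, in whole MB, from /proc/meminfo-style input on stdin.
mem_total_mb() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 }'
}

# Usage:
#   mem_total_mb < /proc/meminfo
# Compare against the installed modules (assumes dmidecode is available, needs root):
#   dmidecode -t memory | grep 'Size:'
```

MemTotal normally comes out slightly below the installed amount because the kernel reserves some memory for itself; a gap of several gigabytes would be the symptom described above.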

WideAreaMedia

ASKER

The machine has 4GB of RAM and is running a 64-bit flavor of Red Hat. I reduced the amount of real memory allocated to one of the virtual machines and have seen a significant decrease in the frequency of these issues. I'll reduce the other the next time I need to take it offline for updates. Thanks, frashii, for your help, patience, and continued monitoring of this question. The link you provided with the performance diagnostic commands will make for great basic lessons for my Linux administration skills :)

Best Regards,
Martin Schultz

WideAreaMedia

ASKER

Thanks frashii! Much appreciated. If I could give you more points, I would :)