• Status: Solved

VMware machines cause Linux host to be inaccessible

We have a VMware host server running Red Hat Enterprise Linux 5 x86_64 on a fairly powerful machine. It has two virtual machines, one running Windows Server 2003 and one running Windows Server 2008. Periodically the host gets very bogged down, the load spikes to around 20, and the server becomes essentially unresponsive.

The server has two Intel Xeon E5405 CPUs running at 2 GHz apiece and 4 GB of RAM.

 9567 root       5 -10 1997m  74m  41m S   13  1.9   3630:38 vmware-vmx
 9670 root       6 -10 2007m  67m  30m D    9  1.7   8162:27 vmware-vmx
 7555 mysql     15   0  365m  15m 2960 S    7  0.4  28:53.68 mysqld

The interesting thing is that neither CPU nor RAM usage seems all that high. Is there anything I can do to limit the peak resource usage of the VMware virtual machines, or any information I can provide that would be diagnostically useful?

Thanks!

Best Regards,
Martin Schultz
Asked:
WideAreaMedia
1 Solution
 
frashiiCommented:
Check the disk throughput values. I find that processes that are heavily dependent on NFS or local DASD for disk access can exhibit this spiral effect.
 
frashiiCommented:
Here is a page that covers tools for monitoring Linux performance issues. I was going to write up my own, but this one should have what you need to pinpoint your issue.

http://www.cyberciti.biz/faq/linux-performance-tools-to-troubleshoot-problem/
 
WideAreaMediaAuthor Commented:
That makes some sense, but unfortunately my Linux administration skills are lacking. If this turns out to be the case, do you have any suggestions for a remedy?
 
frashiiCommented:
The particular subsystem has to be identified and once that is done, we can determine a course of action. The most likely candidate will be disk I/O or NFS performance.

Load is a combination of many factors. The tools in the link above will let you monitor the system and narrow down which one is the problem.

Another question: what does the VMware Server console show about the resources in use by the individual machines? (I'm more familiar with ESX, but since you are running it on top of Linux I am assuming you are running Server or Workstation.)
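To illustrate why load can climb while the CPUs look idle: on Linux the load average counts tasks that are runnable plus tasks in uninterruptible sleep (state D in top, usually waiting on disk I/O). One of the vmware-vmx processes in the original post is in state D. The Python sketch below is only an illustration; the function names are made up, and the figure of 8 cores assumes two quad-core E5405s.

```python
def parse_loadavg(text):
    """Parse the contents of /proc/loadavg into (1min, 5min, 15min) floats."""
    fields = text.split()
    return tuple(float(f) for f in fields[:3])


def count_states(ps_output):
    """Count process states from `ps -eo stat=` output (one state per line).

    Only the first character is used, so 'D<' and 'D' both count as D.
    """
    counts = {}
    for line in ps_output.splitlines():
        line = line.strip()
        if not line:
            continue
        state = line[0]
        counts[state] = counts.get(state, 0) + 1
    return counts


def load_per_core(load, ncpus):
    """Load average divided by core count; sustained values above 1 mean tasks are queuing."""
    return load / ncpus


# Sample values: the load figures come from the `w` output in this thread;
# the trailing fields and the core count (8) are assumptions.
sample = "7.94 7.27 5.06 1/238 12345"
one_min, five_min, fifteen_min = parse_loadavg(sample)
print(round(load_per_core(one_min, 8), 2))
```

On a live host you would feed it the text of /proc/loadavg and the output of `ps -eo stat=`. A high count of D-state processes alongside a high load and low CPU usage points at I/O rather than CPU.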
 
WideAreaMediaAuthor Commented:
I don't know if this will give the information necessary, but here's the output of the commands from that page that talk about disk I/O:

root@Leopard [~]# iostat
Linux 2.6.18-92.1.13.el5 (Leopard.ohioetech.com)        11/13/2008

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.95    0.00    4.54    1.14    0.00   93.36

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              24.21       407.26       481.86  778986610  921686413
sdb               4.33        36.85       110.02   70481893  210446986

root@Leopard [~]# w
 21:39:10 up 22 days,  3:21,  2 users,  load average: 7.94, 7.27, 5.06
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
martin   pts/0    cpe-76-190-205-7 21:36    0.00s  0.03s  0.01s sshd: martin [priv]
martin   pts/1    cpe-76-190-205-7 Tue18   15:00m  4.11s  0.00s sshd: martin [priv]
root@Leopard [~]# vmstat
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  1 483332  24488   6116 3405164    1    0    28    37    2    2  1  5 93  1  0
root@Leopard [~]#
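One caveat on the output above: run without an interval, vmstat prints a single report holding averages since boot, so the low si/so values do not rule out bursts of swapping during the spikes. Running `vmstat 5` prints a fresh sample every 5 seconds. The same thing can be sketched in Python by diffing two snapshots of /proc/vmstat, whose pswpin and pswpout counters are cumulative pages swapped in and out. The function names here are hypothetical.

```python
def parse_vmstat(text):
    """Parse /proc/vmstat text ("name value" per line) into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_rates(before, after, seconds):
    """Return (pages swapped in per second, pages swapped out per second)."""
    return (
        (after["pswpin"] - before["pswpin"]) / seconds,
        (after["pswpout"] - before["pswpout"]) / seconds,
    )


# Made-up snapshots taken 5 seconds apart, for illustration only.
first = parse_vmstat("pswpin 100\npswpout 200\n")
second = parse_vmstat("pswpin 150\npswpout 700\n")
print(swap_rates(first, second, 5))
```

On a real host the two snapshots would come from reading /proc/vmstat twice with a sleep in between. Sustained non-zero rates during a load spike would support the swapping theory.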
 
frashiiCommented:
This looks starved for RAM...

What does 'top' show? Copy and paste the lines that look like the ones below, as well as the first 5 lines of the top processes...

Tasks:  76 total,   1 running,  75 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.2%us,  0.0%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:    776568k total,   685992k used,    90576k free,    70392k buffers
Swap:   763048k total,        0k used,   763048k free,   498460k cached
 
WideAreaMediaAuthor Commented:
top - 11:36:56 up 22 days, 17:19,  3 users,  load average: 2.12, 2.25, 2.40
Tasks: 238 total,   1 running, 237 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.0%us,  5.0%sy,  0.0%ni, 92.7%id,  1.2%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   4043412k total,  4016640k used,    26772k free,     7012k buffers
Swap:  2040212k total,   603640k used,  1436572k free,  3410072k cached

I could reduce the amount of RAM allocated to each virtual machine down to 1GB or 768 if that would help...
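A quick way to sanity-check candidate sizes is to add up the guest allocations and compare them with the host's physical RAM. The sketch below is a rough budget only: the host total comes from the top output above, the 1 GB and 768 MB guest sizes are the candidates mentioned in this post, and the 512 MB reserve for the host OS and other services (such as mysqld) is purely an assumption. It ignores per-VM overhead, and the function name is hypothetical.

```python
def headroom_kb(host_total_kb, guest_sizes_mb, reserve_mb=512):
    """Host RAM left after guest allocations and a reserve for the host itself.

    A negative result means the guests plus the reserve exceed physical RAM.
    reserve_mb is an assumed figure, not something measured on this host.
    """
    guests_kb = sum(size * 1024 for size in guest_sizes_mb)
    return host_total_kb - guests_kb - reserve_mb * 1024


HOST_TOTAL_KB = 4043412  # "Mem: 4043412k total" from top

for sizes in ([1024, 1024], [768, 768]):
    print(sizes, headroom_kb(HOST_TOTAL_KB, sizes), "kB of headroom")
```

Plugging in the current guest allocations would show how far over or under budget the host is today.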
 
WideAreaMediaAuthor Commented:
top - 11:39:06 up 22 days, 17:21,  3 users,  load average: 2.81, 2.43, 2.44
Tasks: 242 total,   1 running, 241 sleeping,   0 stopped,   0 zombie
Cpu(s):  3.4%us, 28.4%sy,  0.0%ni, 57.1%id, 11.0%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:   4043412k total,  4018868k used,    24544k free,     4356k buffers
Swap:  2040212k total,   603640k used,  1436572k free,  3324600k cached
 
frashiiCommented:
Yes, that is exactly what I would recommend. You are about 600 MB into swap (603640k used), which means physical RAM is exhausted and the machine is swapping constantly. Swapping is an I/O-intensive operation, which is probably what is congesting the disks. Processes get 'stuck' waiting to be swapped in or out, and this is what sends the load on the machine spiraling out of control.
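For anyone who wants to pull those numbers out of top automatically, here is a minimal Python sketch that parses the Mem: and Swap: summary lines in the format pasted earlier in this thread. The function names are made up, and the regular expression assumes the older top layout where values end in "k". Note that in this layout the "cached" figure is printed on the Swap: line even though it describes memory, so the sketch stores it without a prefix.

```python
import re


def parse_top_summary(text):
    """Extract the kB figures from top's 'Mem:' and 'Swap:' summary lines."""
    values = {}
    for line in text.splitlines():
        match = re.match(r"\s*(Mem|Swap):", line)
        if not match:
            continue
        section = match.group(1).lower()
        for number, label in re.findall(r"(\d+)k\s+(\w+)", line):
            # buffers/cached describe memory regardless of which line shows them
            key = label if label in ("buffers", "cached") else f"{section}_{label}"
            values[key] = int(number)
    return values


def summarise(values):
    """Derive a few headline figures from the parsed values."""
    return {
        "swap_used_mib": values["swap_used"] / 1024,
        "swap_used_pct": 100 * values["swap_used"] / values["swap_total"],
        "mem_free_mib": values["mem_free"] / 1024,
    }


SAMPLE = """\
Mem:   4043412k total,  4016640k used,    26772k free,     7012k buffers
Swap:  2040212k total,   603640k used,  1436572k free,  3410072k cached
"""

for name, value in summarise(parse_top_summary(SAMPLE)).items():
    print(f"{name}: {value:.1f}")
```

A static swap-used figure only shows that swapping has happened at some point; watching whether it is happening right now (for example with `vmstat 5`) is what confirms the diagnosis.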


 
frashiiCommented:
Does this machine have more than 4 GB of RAM? You mention you are running the 64-bit version, so I would make sure that it is addressing everything it should.

I've seen machines that, after certain patches, are for one reason or another 'blind' to memory above 4 GB. If that is the case here, you may think you have, let's say, 8 GB of RAM, but with the OS detecting only 4 GB you are very heavily oversubscribing the RAM.
 
WideAreaMediaAuthor Commented:
The machine has 4 GB of RAM and is running a 64-bit flavor of Red Hat. I reduced the amount of memory allocated to one of the virtual machines and have seen a significant decrease in the frequency of these issues. I'll reduce the other the next time I need to take it offline for updates. Thanks, frashii, for your help, patience, and continued monitoring of this question. The link you provided with the performance diagnostic commands will make for great basic lessons for my Linux administration skills :)

Best Regards,
Martin Schultz
 
WideAreaMediaAuthor Commented:
Thanks frashii! Much appreciated. If I could give you more points, I would :)