Solved

Need help identifying bottleneck(s)

Posted on 2003-12-09
Last Modified: 2013-12-15
I need help identifying the bottleneck(s) on a Red Hat Linux SMP server. The server has two 2.4 GHz Xeons with 12G of RAM. The system load is over 50 most of the day, and at certain times CPU idle drops to zero and things noticeably drag. The server runs many instances (400-500) of our custom medical program.
Here is a page of "vmstat 3" output to get started:
    procs                         memory    swap          io        system         cpu
  r  b  w   swpd   free    buff    cache  si  so    bi    bo    in     cs  us  sy  id
 23  0  1  13456  25040  200832  9269604   0   3   235   713   633   8226  19  62  19
  1  0  0  13480  25300  200844  9270584   0   8   331   173   624  10455  24  47  30
  9  0  0  13488  28392  200860  9270612   0   3    12   240   547   8453  13  41  46
141  0  1  13504  28976  200880  9270668   0   5    24   263   496  10598  15  27  59
 23  0  0  13520  29032  200880  9270712   0   5    15     5   542   9200  21  64  15
142  0  1  13520  29692  200912  9270808   0   0    36   333   609  10152  16  40  44
 24  1  0  13536  28024  200928  9271136   0   5   109   365   611  10452  24  49  28
 35  0  0  13544  26976  200928  9271488   0   3   117     3   592   8272  20  71   9
160  0  0  13556  30392  200936  9272136   0   4   216   157   617  10410  26  72   2
 38  0  0  13556  29604  200952  9272560   0   0   152     0   577   7923  21  76   2
 49  0  0  13556  31708  200976  9272752   0   0    67   316   596  10350  25  57  19
 24  0  0  13556  31840  200988  9273540   0   0   263   688   596   8123  17  65  18
 29  1  0  13556  31672  200988  9273756   0   0    72     0   595  10512  21  43  36
 53  0  0  13564  25988  200764  9275976   0   3   661   288   660   8257  21  67  12
 23  0  0  13564  28140  200760  9275988   0   0     3     0   572  10397  20  59  21
 33  0  0  13564  23468  200788  9276100   0   0    43   239   570   8104  21  67  13
 24  0  1  13564  32396  200816  9262788   0   0    37   187   522  10350  26  53  21
 63  0  0  13564  37076  200816  9263096   0   0   103     0   519   8483  18  66  16
 19  1  0  13564  37716  200868  9264544   0   0   491   624   590  10279  22  62  16
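A rough way to read a capture like this is to average the columns that separate a CPU problem from a disk problem. The sketch below is only an illustration: it assumes the 16-column layout shown above, the function names are my own, and the sample is just the first three rows of the capture. A large `r` (run queue) together with `b` near zero and `sy` above `us` would point toward CPU and kernel time rather than disk wait.

```python
from statistics import mean

# Column order of the 16-field vmstat layout shown in the post.
COLUMNS = ["r", "b", "w", "swpd", "free", "buff", "cache",
           "si", "so", "bi", "bo", "in", "cs", "us", "sy", "id"]

def parse_vmstat(text):
    """Return one dict per data row; header and blank lines are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == len(COLUMNS) and all(f.isdigit() for f in fields):
            rows.append(dict(zip(COLUMNS, map(int, fields))))
    return rows

def summarize(rows):
    """Average the columns most useful for a CPU-versus-disk diagnosis."""
    return {name: mean(row[name] for row in rows)
            for name in ("r", "b", "si", "so", "cs", "us", "sy", "id")}

# First three rows of the capture above, used only as sample input.
SAMPLE = """\
 r  b  w   swpd   free    buff    cache  si  so    bi    bo    in     cs  us  sy  id
23  0  1  13456  25040  200832  9269604   0   3   235   713   633   8226  19  62  19
 1  0  0  13480  25300  200844  9270584   0   8   331   173   624  10455  24  47  30
 9  0  0  13488  28392  200860  9270612   0   3    12   240   547   8453  13  41  46
"""

if __name__ == "__main__":
    for name, value in summarize(parse_vmstat(SAMPLE)).items():
        print(f"{name:>3}: {value:.1f}")
```

Feeding it the full capture instead of `SAMPLE` gives the averages for the whole page.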
 
Question by:medent
13 Comments
 
LVL 9

Expert Comment

by:majorwoo
ID: 9905173
Are these applications accessing the disk a lot, and what kind of disk is it? Have you performed any optimization of the disk I/O?

If these are hyperthreading Xeons, is hyperthreading turned on or off?
 

Author Comment

by:medent
ID: 9905361
1. Depending on how the app is used, disk access varies greatly. For example, I have another customer who actually has consistently higher I/O stats, but a far lower system load and good CPU stats. The difference is that the "good" site has fewer total processes running (on exactly the same hardware).

2. The disk is hardware RAID 5 on 5x36G 15000 rpm drives with an IBM 5i RAID controller. I am running data=journal on the data filesystem, which I know adds a lot of overhead, but data security prevails.

3. Yes, hyperthreading is on. I have not tried it off, but funny you mention it: I was just looking at some Google threads about this and wondering what would happen if I turned it off...
 
LVL 9

Expert Comment

by:majorwoo
ID: 9905524
When I had my Xeons with 1TB attached to them, I found better performance under Linux with hyperthreading disabled. However, I was only running 4-5 of my

Is it safe to assume that you have more instances of your app running when you hit 0 idle and performance suffers?
 

Author Comment

by:medent
ID: 9906621
Others seem to indicate that hyperthreading generally helps under heavy process loads, but actually hinders under light loads....?

The peak time in the afternoon (2pm-3pm) seems to be not a peak in sessions, but a peak in use of the already existing sessions (the doctors' offices' busiest time).

PS: The iowait figure seems to be broken in top, so it's hard to tell whether processes are waiting on I/O. The load factor is just way too high during normal use. I assume that if the CPU is showing some idle alongside a high load number, then it's not the CPU causing the load number to be high? I am of course assuming that a load number of 50 and above is in outer space (anything more than twice the CPU count?).
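To put the "twice the CPU count" rule of thumb into numbers, here is a minimal sketch that divides the load average by the number of logical CPUs. It assumes a Linux system exposing `/proc/loadavg`; the function names are my own. With the figures from this thread (load of 50, two hyperthreaded Xeons seen as 4 logical CPUs) it works out to 12.5 per CPU.

```python
import os

def load_per_cpu(load_average, cpu_count):
    """Load average divided by the number of logical CPUs."""
    return load_average / cpu_count

def read_load_1min(path="/proc/loadavg"):
    """The first field of /proc/loadavg is the 1-minute load average."""
    with open(path) as f:
        return float(f.read().split()[0])

if __name__ == "__main__":
    # Figures quoted in this thread: load 50 on 4 logical CPUs.
    print("thread example:", load_per_cpu(50, 4))

    # Live reading, only attempted where /proc/loadavg exists.
    if os.path.exists("/proc/loadavg"):
        cpus = os.cpu_count() or 1
        load = read_load_1min()
        print(f"this host: {load:.2f} / {cpus} CPUs = "
              f"{load_per_cpu(load, cpus):.2f} per CPU")
```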
 
LVL 9

Expert Comment

by:majorwoo
ID: 9907063
I have heard similar things about hyperthreading, but I did keep hyperthreading disabled on the machine I used as our fileserver, and overall response was quicker (if you look at the stats for your CPUs you will see that although each claims to have the processing power of a 2.0 Xeon, they cannot perform at that level). I do however believe that, given the number of processes you are running, you will do better with hyperthreading on. I think that turning it off may result in better speed for the currently running process, but at the cost of overall lag on the system.

Try pressing 1 inside top to toggle multiprocessor mode; this will show you stats for each CPU (press W, capitalized, to save this config).
 

Author Comment

by:medent
ID: 9911400
I would like to identify the bottleneck using the stats I have now (the original question), with a little more analysis before taking the risk of an experiment. For example, I am assuming that if I have lots of CPU idle but my load numbers are in orbit, then the bottleneck is elsewhere, probably disk, but I am not sure how to verify that. I think the load numbers are calculated from certain process characteristics...?
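On Linux the load average counts tasks that are runnable (state R) plus tasks in uninterruptible sleep (state D, usually waiting on disk), so one way to check the assumption is to count processes in each state. The following is a sketch only, assuming a readable `/proc` filesystem; the function names are my own. Many R and few D would suggest processes are queued for CPU, while many D would point at I/O.

```python
import os
from collections import Counter

def state_from_stat(stat_line):
    """Extract the one-letter process state from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the last ')' instead of on whitespace.
    return stat_line.rsplit(")", 1)[1].split()[0]

def count_states(proc_root="/proc"):
    """Count processes by state letter (R, S, D, Z, ...)."""
    counts = Counter()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a PID directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                counts[state_from_stat(f.read())] += 1
        except (OSError, IndexError):
            continue  # process exited or entry unreadable during the scan
    return counts

if __name__ == "__main__" and os.path.isdir("/proc"):
    counts = count_states()
    print("runnable (R):", counts["R"])
    print("uninterruptible sleep (D):", counts["D"])
```

A single scan is only a snapshot, so it is worth repeating it a few times during the busy period before drawing conclusions.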

 
LVL 9

Expert Comment

by:majorwoo
ID: 9912592
I would assume the same thing. Have you performed any I/O measurements on the disk? Something as simple as timing a copy, or using hdparm?

Also, what version of Red Hat, and what kernel?
 

Author Comment

by:medent
ID: 9912640
Yes, I have my own I/O benchmark (copying + zips), and the server compares OK to others in its class.
The RH base version is 7.3 with kernel 2.4.20-20.7 and glibc2.32
 
LVL 9

Expert Comment

by:majorwoo
ID: 9915534
Have you updated the fileutils package? I remember our Red Hat 7.3 servers doing poorly until we upgraded the kernel, fileutils, and a few other packages related to file handling.
 

Author Comment

by:medent
ID: 9919658
Yes, fileutils was updated, plus other dependencies:
fileutils-4.1.9-11.i386.rpm  procps-2.0.13-1.i386.rpm
libacl-2.0.11-2.i386.rpm     sh-utils-2.0.12-3.i386.rpm
libattr-2.0.8-3.i386.rpm
 
LVL 9

Expert Comment

by:majorwoo
ID: 9920225
With 12GB of RAM you have certainly enabled highmem in the kernel, correct?
 

Author Comment

by:medent
ID: 9920290
Yes, up to 64G, and Linux is seeing and using it.
 
LVL 9

Accepted Solution

by:
majorwoo earned 500 total points
ID: 9921000
The answer here may be that you simply have too many processes running on this machine. You said the site that works well has fewer total processes running. It seems that you are not waiting on I/O but on the CPU calculations themselves. If during the peak time even half of those 500 processes are active, that is 250 processes trying to get CPU time. Even assuming 4 CPUs, 250/4 = 62.5, so that is still about 60 processes competing for each CPU, and there is a lot of context switching there.
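The arithmetic above can be written out as a small sketch. The 50% active fraction is the assumption made in this answer, not a measured value, and the function name is my own.

```python
def active_per_cpu(total_processes, active_fraction, cpu_count):
    """Processes competing for each CPU, given the fraction that is active."""
    return total_processes * active_fraction / cpu_count

# 500 processes, half of them active, 4 logical CPUs (hyperthreading on).
print(active_per_cpu(500, 0.5, 4))  # 62.5

# Same load counted against the 2 physical CPUs only.
print(active_per_cpu(500, 0.5, 2))  # 125.0
```

Either way the result is far above one runnable process per CPU, which fits the load numbers of 50 and above reported in the question.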