Flat out server looks idle to me

I have a couple of multi-threaded processes supposedly going flat out processing data, but top seldom shows anything to indicate that the processes are CPU bound.

Here's a typical display from top:
11:45:25  up 21 days, 23:09,  3 users,  load average: 1.51, 1.65, 1.63
145 processes: 144 sleeping, 1 running, 0 zombie, 0 stopped
CPU0 states:   7.0% user  24.1% system    0.0% nice   0.0% iowait  68.2% idle
CPU1 states:   3.3% user   2.3% system    0.0% nice   1.3% iowait  91.4% idle
CPU2 states:   4.1% user  15.2% system    0.0% nice   1.4% iowait  78.1% idle
CPU3 states:   8.4% user   2.3% system    0.0% nice  20.4% iowait  67.1% idle
Mem:  8308812k av, 8006488k used,  302324k free,       0k shrd,  101100k buff
      3672472k active,            4107008k inactive
Swap: 2097136k av,       0k used, 2097136k free                 5340736k cached

I guess that means they are I/O bound. What should I be looking at to confirm?
LVL 17
Who is Participating?
pjedmondConnect With a Mentor Commented:

will tell you that you have NICs and that they are running and what they are configured as.

dmesg | grep eth

should give you the names of your drivers.

To test if the NIC is the bottleneck, try creating more network traffic - does performance get worse? (Not that processing may increase in order to cope with more timeouts). Alternatively, create start a huge scp process from that server to another - again does performance increase...and again note that processing may increase.

Look at the 'nice' settings for the processors concerned. Perhaps you could give them a higher priority?

man nice

for more details.

Does getting the C++ process to access a local NFS speed things up? - Again eliminating (some of) the network potential bottlenecks?

Hopefully some of the above ramblings will be of use?

(   (()
(`-' _\
 ''  ''
You need to give more information on the type of process concerned. Then start looking at 'bandwidths' - A serial cable used in a communication process will almost definitely be the limiting process.

I have a file server with a 100M ethernet cable - Copying files off the server to other servers will never get the CPU above about 20% (1.4GHz AMD Athlon).

In order to get more processing out of the server, I added a second ethernet card and gave it another ip in order to increase the bandwidth available (or I could have added a 1G card if the wiring was capable of coping with it)

(   (()
(`-' _\
 ''  ''
rstaveleyAuthor Commented:
Thanks for the response, pjedmond. The LAN could be the issue.

One of the processes is doing a substantial amount if I/O over NFS. The other processes all do local disk I/O.

The processes essentially do a lot of data extraction and conversion from data files. There is a daemon written in C++ and a Java applications running as a daemon.

(1) The C++ daemon accesses NFS. There is fairly lightweight encryption (Blowfish) and heavy-weight compression (BZip2).

(2) The Java daemon doesn't touch NFS, but gets the C++ daemon to fetch data for it. The Java daemon does a lot of analysis on the data and ultimately indexes information from it. [It is  Lucene application.]

The slowness is experienced in the inter-operation between one of the Java daemons and the C++ daemon. I haven't profiled it to find out which one is dragging its feet. I was hoping to get some sense from looking at system information available on the server.

There are two NICs in there There's no good reason why we shouldn't be using a Gigabit NIC.

Is there something in the /proc which would tell me what NICs I have?

Is there a test I can perform easily to confirm that the NIC is the bottle-neck?
rstaveleyAuthor Commented:
BTW... I see high numbers in nfsstat, but donlt really understand how to interpret them.

root@gse-mta-10:~# nfsstat -c
Client rpc stats:
calls      retrans    authrefrsh
597186811   697404     0      
Client nfs v2:
null       getattr    setattr    root       lookup     readlink  
0       0% 118888629 19% 70873   0% 0       0% 79509541 13% 0       0%
read       wrcache    write      create     remove     rename    
12252239  2% 0       0% 312135918 52% 25473775  4% 22108377  3% 0       0%
link       symlink    mkdir      rmdir      readdir    fsstat    
0       0% 0       0% 3238795  0% 3233045  0% 20275618  3% 1       0%
rstaveleyAuthor Commented:
I see 100 Mbps from dmesg.

> Hopefully some of the above ramblings will be of use?

Yes they are. Especially:

> ...try creating more network traffic
> Does getting the C++ process to access a local NFS speed things up?

I'll set up some experiments.
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.