
Flat out server looks idle to me

I have a couple of multi-threaded processes supposedly going flat out processing data, but top seldom shows anything to indicate that the processes are CPU bound.

Here's a typical display from top:
--------8<--------
11:45:25 up 21 days, 23:09, 3 users, load average: 1.51, 1.65, 1.63
145 processes: 144 sleeping, 1 running, 0 zombie, 0 stopped
CPU0 states: 7.0% user 24.1% system 0.0% nice 0.0% iowait 68.2% idle
CPU1 states: 3.3% user 2.3% system 0.0% nice 1.3% iowait 91.4% idle
CPU2 states: 4.1% user 15.2% system 0.0% nice 1.4% iowait 78.1% idle
CPU3 states: 8.4% user 2.3% system 0.0% nice 20.4% iowait 67.1% idle
Mem: 8308812k av, 8006488k used, 302324k free, 0k shrd, 101100k buff
3672472k active, 4107008k inactive
Swap: 2097136k av, 0k used, 2097136k free 5340736k cached
--------8<--------

I guess that means they are I/O bound. What should I be looking at to confirm?
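One quick way to put numbers on the display above is to average the per-CPU states. The sketch below is only an illustration: the function names `parse_cpu_states` and `average` are made up for this example, and the input is the top output pasted above.

```python
import re

# The four per-CPU lines copied from the top display in this post.
TOP_SAMPLE = """\
CPU0 states: 7.0% user 24.1% system 0.0% nice 0.0% iowait 68.2% idle
CPU1 states: 3.3% user 2.3% system 0.0% nice 1.3% iowait 91.4% idle
CPU2 states: 4.1% user 15.2% system 0.0% nice 1.4% iowait 78.1% idle
CPU3 states: 8.4% user 2.3% system 0.0% nice 20.4% iowait 67.1% idle
"""


def parse_cpu_states(text):
    """Return {cpu_name: {state_name: percent}} for each 'CPUn states:' line."""
    result = {}
    for line in text.splitlines():
        match = re.match(r"(CPU\d+) states:(.*)", line)
        if not match:
            continue
        # Each state appears as '<number>% <name>'.
        pairs = re.findall(r"([\d.]+)% (\w+)", match.group(2))
        result[match.group(1)] = {name: float(value) for value, name in pairs}
    return result


def average(states, key):
    """Mean of one state (e.g. 'iowait') across all CPUs."""
    return sum(cpu[key] for cpu in states.values()) / len(states)


if __name__ == "__main__":
    states = parse_cpu_states(TOP_SAMPLE)
    for key in ("user", "system", "iowait", "idle"):
        print(f"{key:>7}: {average(states, key):5.1f}%")
```

On this single snapshot the box averages roughly 76% idle and under 6% iowait, with only CPU3 showing noticeable iowait. That is one sample, so it hints that the threads spend most of their time sleeping but does not by itself say what they are waiting on.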

pjedmond

You need to give more information about the type of process concerned. Then start looking at 'bandwidths': a serial cable used in a communication process, for example, will almost certainly be the limiting factor.

I have a file server on a 100 Mbit Ethernet link. Copying files off the server to other servers never gets the CPU above about 20% (1.4GHz AMD Athlon).

To get more throughput out of the server, I added a second Ethernet card and gave it another IP address to increase the available bandwidth (or I could have added a gigabit card if the wiring had been capable of coping with it).
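The bandwidth argument comes down to simple arithmetic. As a rough sketch (the helper name `link_ceiling_mb_per_s` is invented here, and protocol overhead is ignored, so real throughput will be lower):

```python
def link_ceiling_mb_per_s(link_mbit, links=1):
    """Raw upper bound in megabytes per second for `links` links of
    `link_mbit` megabits per second each (8 bits per byte, no overhead)."""
    return link_mbit * links / 8.0


if __name__ == "__main__":
    print("one 100 Mbit link :", link_ceiling_mb_per_s(100), "MB/s")
    print("two 100 Mbit links:", link_ceiling_mb_per_s(100, links=2), "MB/s")
    print("one gigabit link  :", link_ceiling_mb_per_s(1000), "MB/s")
```

A single 100 Mbit link tops out at 12.5 MB/s before overhead, which a CPU of that era can feed without working hard. Note that two cards on separate IP addresses only double the aggregate if the traffic is actually spread across both.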


rstaveley

ASKER

Thanks for the response, pjedmond. The LAN could be the issue.

One of the processes is doing a substantial amount of I/O over NFS. The other processes all do local disk I/O.

The processes essentially do a lot of data extraction and conversion from data files. There is a daemon written in C++ and a Java application running as a daemon.

(1) The C++ daemon accesses NFS. There is fairly lightweight encryption (Blowfish) and heavy-weight compression (BZip2).

(2) The Java daemon doesn't touch NFS, but gets the C++ daemon to fetch data for it. The Java daemon does a lot of analysis on the data and ultimately indexes information from it. [It is a Lucene application.]

The slowness is experienced in the inter-operation between one of the Java daemons and the C++ daemon. I haven't profiled it to find out which one is dragging its feet. I was hoping to get some sense from looking at system information available on the server.

There are two NICs in there. There's no good reason why we shouldn't be using a Gigabit NIC.

Is there something in the /proc which would tell me what NICs I have?

Is there a test I can perform easily to confirm that the NIC is the bottle-neck?
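On Linux, /proc/net/dev lists every network interface with cumulative byte counters, so sampling it twice gives a rough per-interface throughput to compare against the link speed. The sketch below is a minimal illustration: the function names are invented for this example, and the counter values used in the demo are made up, not taken from the server in question.

```python
import time


def parse_net_dev(text):
    """Parse /proc/net/dev content into {interface: (rx_bytes, tx_bytes)}."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        if ":" not in line:
            continue
        # Splitting on ':' also copes with 'eth0:1234' (no space after colon).
        name, data = line.split(":", 1)
        fields = data.split()
        # Field 0 is received bytes, field 8 is transmitted bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def throughput_mbit(before, after, seconds):
    """Return {interface: (rx_mbit_per_s, tx_mbit_per_s)} between two samples.
    Counter wrap-around is not handled in this sketch."""
    rates = {}
    for name, (rx1, tx1) in after.items():
        rx0, tx0 = before.get(name, (rx1, tx1))
        rates[name] = ((rx1 - rx0) * 8 / seconds / 1e6,
                       (tx1 - tx0) * 8 / seconds / 1e6)
    return rates


def sample_live(interval=5, path="/proc/net/dev"):
    """Read the live counters twice, `interval` seconds apart (Linux only)."""
    with open(path) as f:
        first = parse_net_dev(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = parse_net_dev(f.read())
    return throughput_mbit(first, second, interval)


if __name__ == "__main__":
    # Made-up counters: 62,500,000 bytes received in 5 s is 100 Mbit/s.
    before = {"eth0": (5_000_000, 2_000_000)}
    after = {"eth0": (67_500_000, 2_000_000)}
    print(throughput_mbit(before, after, 5))
```

Calling `sample_live()` while the daemons are busy and seeing an interface sit close to its rated speed would point at the NIC; a rate far below it would suggest looking elsewhere.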

rstaveley

ASKER

BTW... I see high numbers in nfsstat, but don't really understand how to interpret them.

--------8<--------
root@gse-mta-10:~# nfsstat -c
Client rpc stats:
calls           retrans         authrefrsh
597186811       697404          0
Client nfs v2:
null            getattr         setattr         root            lookup          readlink
0 0%            118888629 19%   70873 0%        0 0%            79509541 13%    0 0%
read            wrcache         write           create          remove          rename
12252239 2%     0 0%            312135918 52%   25473775 4%     22108377 3%     0 0%
link            symlink         mkdir           rmdir           readdir         fsstat
0 0%            0 0%            3238795 0%      3233045 0%      20275618 3%     1 0%
--------8<--------
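Two figures are easy to pull out of those counters: the share of RPC calls that had to be retransmitted, and which operations dominate. The sketch below just does the arithmetic on the numbers pasted above; the variable and function names are invented for this example, and no threshold for a "bad" retransmission rate is assumed.

```python
# Counters copied from the nfsstat -c output above.
RPC_CALLS = 597186811
RPC_RETRANS = 697404

NFS_OPS = {
    "getattr": 118888629,
    "setattr": 70873,
    "lookup": 79509541,
    "read": 12252239,
    "write": 312135918,
    "create": 25473775,
    "remove": 22108377,
    "mkdir": 3238795,
    "rmdir": 3233045,
    "readdir": 20275618,
    "fsstat": 1,
}


def retrans_ratio(calls, retrans):
    """Fraction of RPC calls that were retransmitted."""
    return retrans / calls


def op_shares(ops):
    """Return [(operation, fraction_of_total)] sorted by descending share."""
    total = sum(ops.values())
    return sorted(((name, count / total) for name, count in ops.items()),
                  key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    print(f"retransmitted: {retrans_ratio(RPC_CALLS, RPC_RETRANS):.3%}")
    for name, share in op_shares(NFS_OPS)[:4]:
        print(f"{name:>8}: {share:.1%}")
```

Retransmissions come to a little over 0.1% of calls, and more than half of all operations are writes, with getattr and lookup next and reads only about 2%. These are totals since the counters were last reset, so they describe the long-run mix rather than what is happening right now.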

ASKER CERTIFIED SOLUTION

pjedmond


rstaveley

ASKER

I see 100 Mbps from dmesg.

> Hopefully some of the above ramblings will be of use?

Yes they are. Especially:

> ...try creating more network traffic
> Does getting the C++ process to access a local NFS speed things up?

I'll set up some experiments.