rstaveley
asked on
Flat out server looks idle to me
I have a couple of multi-threaded processes supposedly going flat out processing data, but top seldom shows anything to indicate that the processes are CPU bound.
Here's a typical display from top:
--------8<--------
11:45:25 up 21 days, 23:09, 3 users, load average: 1.51, 1.65, 1.63
145 processes: 144 sleeping, 1 running, 0 zombie, 0 stopped
CPU0 states: 7.0% user 24.1% system 0.0% nice 0.0% iowait 68.2% idle
CPU1 states: 3.3% user 2.3% system 0.0% nice 1.3% iowait 91.4% idle
CPU2 states: 4.1% user 15.2% system 0.0% nice 1.4% iowait 78.1% idle
CPU3 states: 8.4% user 2.3% system 0.0% nice 20.4% iowait 67.1% idle
Mem: 8308812k av, 8006488k used, 302324k free, 0k shrd, 101100k buff
3672472k active, 4107008k inactive
Swap: 2097136k av, 0k used, 2097136k free 5340736k cached
--------8<--------
I guess that means they are I/O bound. What should I be looking at to confirm?
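The sort of thing I was thinking of trying is along these lines (assuming the sysstat tools are installed), but I'm not sure what would count as conclusive:
--------8<--------
# 'b' column = processes blocked on I/O; a high 'wa' figure means the CPUs are idle waiting on I/O
vmstat 2 5

# per-device view: %util near 100 on a disk suggests that device is the bottleneck
iostat -x 2 5
--------8<--------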
ASKER
Thanks for the response, pjedmond. The LAN could be the issue.
One of the processes is doing a substantial amount of I/O over NFS. The other processes all do local disk I/O.
The processes essentially do a lot of data extraction and conversion from data files. There is a daemon written in C++ and a Java application running as a daemon.
(1) The C++ daemon accesses NFS. There is fairly lightweight encryption (Blowfish) and heavy-weight compression (BZip2).
(2) The Java daemon doesn't touch NFS, but gets the C++ daemon to fetch data for it. The Java daemon does a lot of analysis on the data and ultimately indexes information from it. [It is a Lucene application.]
The slowness is experienced in the inter-operation between one of the Java daemons and the C++ daemon. I haven't profiled it to find out which one is dragging its feet. I was hoping to get some sense from looking at system information available on the server.
There are two NICs in there. There's no good reason why we shouldn't be using a Gigabit NIC.
Is there something in /proc which would tell me what NICs I have?
Is there a test I can perform easily to confirm that the NIC is the bottleneck?
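The only checks I can think of myself are something like this (eth0 is just a guess at which interface carries the NFS traffic):
--------8<--------
# interfaces the kernel knows about, with byte/packet counters
cat /proc/net/dev

# the PCI listing shows the actual NIC models (e.g. whether a Gigabit card is fitted)
lspci | grep -i ethernet

# negotiated link speed and duplex for a given interface
ethtool eth0
--------8<--------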
ASKER
BTW... I see high numbers in nfsstat, but don't really understand how to interpret them.
--------8<--------
root@gse-mta-10:~# nfsstat -c
Client rpc stats:
calls retrans authrefrsh
597186811 697404 0
Client nfs v2:
null getattr setattr root lookup readlink
0 0% 118888629 19% 70873 0% 0 0% 79509541 13% 0 0%
read wrcache write create remove rename
12252239 2% 0 0% 312135918 52% 25473775 4% 22108377 3% 0 0%
link symlink mkdir rmdir readdir fsstat
0 0% 0 0% 3238795 0% 3233045 0% 20275618 3% 1 0%
--------8<--------
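The one figure I can make a rough guess at is the retransmission rate, retrans divided by calls - something like this, using the totals above:
--------8<--------
# retrans / calls from the nfsstat output above
awk 'BEGIN { printf "retrans rate: %.3f%%\n", 697404 / 597186811 * 100 }'
--------8<--------
That works out at roughly 0.12%, which doesn't look alarming, but I don't know what a healthy ratio should be.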
ASKER CERTIFIED SOLUTION
ASKER
I see 100 Mbps from dmesg.
> Hopefully some of the above ramblings will be of use?
Yes they are. Especially:
> ...try creating more network traffic
> Does getting the C++ process to access a local NFS speed things up?
I'll set up some experiments.
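Probably something crude along these lines first (/mnt/nfs is just a placeholder for the real mount point):
--------8<--------
# snapshot the interface counters, push ~1 GB across the NFS mount, snapshot again;
# if the transfer tops out around 12 MB/s, the 100 Mbps link is the ceiling
cat /proc/net/dev > before.txt
dd if=/dev/zero of=/mnt/nfs/testfile bs=1M count=1024
cat /proc/net/dev > after.txt
diff before.txt after.txt
--------8<--------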
I have a file server on a 100 Mbit Ethernet cable - copying files off the server to other servers will never get the CPU above about 20% (1.4 GHz AMD Athlon).
In order to get more processing out of the server, I added a second Ethernet card and gave it another IP to increase the available bandwidth (or I could have added a Gigabit card if the wiring was capable of coping with it).
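Something along these lines is all that's involved in giving the second card its own address (interface name and addresses are just examples):
--------8<--------
# second NIC on its own address, so clients can be pointed at either interface
ifconfig eth1 192.168.1.11 netmask 255.255.255.0 up
--------8<--------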