sminfo
asked on
high load on one AIX 5.3 without CPU or IO used
Hi,
I have a strange, constant high load (between 2 and 5), and I see no CPU or I/O used on that server.
from topas:
CPU     User%  Kern%  Wait%  Idle%  Physc  Entc
ALL       0.1    0.3    0.0   99.6   0.01   0.6

Name      PID      CPU%  PgSp  Owner
topas     1757256   0.1   1.9  root
clstrmgr   331942   0.0   5.0  root
getty      241812   0.0   0.5  root
gil         69666   0.0   0.9  root
hats_dis   389356   0.0   1.8  root
hats_nim   507962   0.0   1.9  root
hatsd      704634   0.0   9.3  root
hats_nim   487456   0.0   1.9  root
hats_nim   450660   0.0   1.9  root

Disk     Busy%  KBPS  TPS  KB-Read  KB-Writ
dac0       0.0   2.0  4.0      1.0      1.0
hdisk2     0.0   1.0  2.0      0.5      0.5
hdisk17    0.0   1.0  2.0      0.5      0.5
dac1utm    0.0   0.0  0.0      0.0      0.0
dac0utm    0.0   0.0  0.0      0.0      0.0

EVENTS/QUEUES          FILE/TTY
Cswitch       312      Readch   7346
Syscall       344      Writech  3152
Reads          19      Rawin       8
Writes         25      Ttyout   1059
Forks           0      Igets       0
Execs           0      Namei      24
Runqueue      1.5      Dirblk      0
Waitqueue     0.0
See attached cpu image.
/# vmstat 1 3
System configuration: lcpu=4 mem=6144MB ent=2.00
kthr memory page faults cpu
----- ----------- ------------------------ ------------ -----------------------
r b avm fre re pi po fr sr cy in sy cs us sy id wa pc ec
2 0 515253 819301 0 0 0 0 0 0 43 459 313 0 0 99 0 0.02 1.1
2 0 515253 819301 0 0 0 0 0 0 28 254 319 0 0 99 0 0.01 0.6
2 0 515254 819300 0 0 0 0 0 0 14 149 242 0 0 99 0 0.01 0.5
# uptime
03:15PM up 99 days, 12:35, 5 users, load average: 2.70, 3.27, 3.08
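The numbers above line up: the vmstat run queue ('r') holds at 2 while the CPU is about 99% idle, which matches the reported load average of 2 to 3. As a quick sanity check, the three sample rows can be averaged with portable awk (the rows are pasted inline here; on the live box you would pipe `vmstat 1 3` through the same filter after skipping its header lines):

```shell
# Average the run-queue ('r', field 1) column over the vmstat rows above
avg=$(printf '%s\n' \
  '2 0 515253 819301 0 0 0 0 0 0 43 459 313 0 0 99 0 0.02 1.1' \
  '2 0 515253 819301 0 0 0 0 0 0 28 254 319 0 0 99 0 0.01 0.6' \
  '2 0 515254 819300 0 0 0 0 0 0 14 149 242 0 0 99 0 0.01 0.5' |
  awk '{sum += $1; n++} END {printf "%.2f", sum / n}')
echo "average run queue: $avg"
```

A run queue of 2 with an idle CPU usually means a couple of threads are runnable (or in very short sleep/wake cycles) without consuming noticeable CPU time.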
Where can I look to find the cause of this issue?
Thanks
cpu.png
ASKER
See:
bsa550q2:/# netstat 1
input (en0) output input (Total) output
packets errs packets errs colls packets errs packets errs colls
1713251640 0 312324300 3 0 2856788509 0 1004924055 13 0
7 0 3 0 0 15 0 9 0 0
4 0 2 0 0 14 0 11 0 0
7 0 4 0 0 13 0 7 0 0
6 0 3 0 0 20 0 13 0 0
3 0 4 0 0 13 0 18 0 0
7 0 3 0 0 18 0 14 0 0
5 0 2 0 0 11 0 8 0 0
3 0 5 0 0 12 0 13 0 0
3 0 2 0 0 11 0 11 0 0
2 0 2 0 0 6 0 5 0 0
3 0 2 0 0 9 0 8 0 0
3 0 4 0 0 12 0 12 0 0
5 0 2 0 0 16 0 12 0 0
12 0 8 0 0 19 0 11 0 0
3 0 4 0 0 13 0 14 0 0
8 0 6 0 0 20 0 15 0 0
6 0 6 0 0 14 0 15 0 0
3 0 3 0 0 8 0 7 0 0
7 0 3 0 0 14 0 9 0 0
4 0 5 0 0 13 0 13 0 0
topas.bmp
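To avoid eyeballing every row for error growth, the 'errs' columns of the per-second samples can be summed (a sketch over two of the rows above; fields 2 and 4 are the en0 input and output errs):

```shell
# Sum en0 input/output error counts (fields 2 and 4) across the samples
errs=$(printf '%s\n' \
  '7 0 3 0 0 15 0 9 0 0' \
  '4 0 2 0 0 14 0 11 0 0' |
  awk '{in_e += $2; out_e += $4} END {print in_e, out_e}')
echo "in/out errs: $errs"
```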
ASKER
... and there's no NFS on that server. This issue is really, really odd.
regards
Israel.
Swapping is happening.
It causes highest-priority I/O.
Setting vmtune -M 1024 -m 1000 fixes it.
A longer-lived fix is taming processes with WLM AND buying extra memory.
ASKER
wmp, sorry: NFS is running on that server, but it's not in use. Look at proctree:
# proctree
131244 /usr/sbin/srcmstr
94382 /usr/sbin/portmap
127206 /usr/sbin/snmpd
1830962 /usr/sbin/tftpd -n
200874 /usr/sbin/tftpd -n
217222 /usr/sbin/syslogd
233604 /usr/es/sbin/cluster/clcomd -d
655478 /usr/sbin/gsclvmd
270460 /usr/sbin/gsclvmd -r 30 -i 300 -t 300 -c 000030910000d6000000011819b2b337 -v 0
307216 /usr/sbin/gsclvmd -r 30 -i 300 -t 300 -c 000030910000d6000000011819b256e0 -v 0
397548 /usr/sbin/gsclvmd -r 30 -i 300 -t 300 -c 000030910000d6000000011819b1fb14 -v 0
438494 /usr/sbin/gsclvmd -r 30 -i 300 -t 300 -c 000030910000d600000001181bdd4d5e -v 0
491538 /usr/sbin/gsclvmd -r 30 -i 300 -t 300 -c 000030910000d6000000011819b30eb5 -v 0
524428 /usr/sbin/gsclvmd -r 30 -i 300 -t 300 -c 000030910000d6000000011819b228fc -v 0
540782 /usr/sbin/gsclvmd -r 30 -i 300 -t 300 -c 000030910000d6000000011819b33c6a -v 0
565468 /usr/sbin/gsclvmd -r 30 -i 300 -t 300 -c 000030910000d6000000011819b19f89 -v 0
577748 /usr/sbin/gsclvmd -r 30 -i 300 -t 300 -c 000030910000d6000000011819b171b2 -v 0
643282 /usr/sbin/gsclvmd -r 30 -i 300 -t 300 -c 000030910000d6000000011819b1cd3d -v 0
700644 /usr/sbin/gsclvmd -r 30 -i 300 -t 300 -c 000030910000d6000000011819b2e0eb -v 0
712866 /usr/sbin/gsclvmd -r 30 -i 300 -t 300 -c 000030910000d6000000011819b142a7 -v 0
766146 /usr/sbin/gsclvmd -r 30 -i 300 -t 300 -c 000030910000d600000001181bdd7d54 -v 0
782558 /usr/sbin/gsclvmd -r 30 -i 300 -t 300 -c 000030910000d6000000011819b28578 -v 0
295058 /usr/sbin/rsct/bin/rmcd -a IBM.LPCommands -r
299156 /usr/java5/bin/java -Xbootclasspath/a:/var/websm/lwi/runtime/core/rcp/eclipse/p
303224 /usr/sbin/rpc.statd -d 0 -t 50
327870 /usr/sbin/rsct/bin/IBM.DRMd
331942 /usr/es/sbin/cluster/clstrmgr
544982 run_rcovcmd
335896 /usr/sbin/snmpmibd
704634 /usr/sbin/rsct/bin/hatsd -n 4 -o deadManSwitch
339970 /usr/sbin/rsct/bin/hats_nim
389356 /usr/sbin/rsct/bin/hats_diskhb_nim
425984 /usr/sbin/rsct/bin/hats_diskhb_nim
450660 /usr/sbin/rsct/bin/hats_nim
487456 /usr/sbin/rsct/bin/hats_nim
507962 /usr/sbin/rsct/bin/hats_nim
348332 /usr/sbin/rsct/bin/vac5/IBM.CSMAgentRMd
368866 /usr/sbin/rpc.lockd -d 0
376990 /usr/es/sbin/cluster/clinfo
413704 /usr/sbin/muxatmd
434220 hagsd grpsvcs
471134 /usr/sbin/nfsd 3891
499800 /usr/sbin/qdaemon
503884 /usr/sbin/biod 6
528568 /usr/sbin/rsct/bin/IBM.HostRMd
548942 /usr/sbin/xntpd
626866 /usr/sbin/writesrv
630870 /usr/sbin/rpc.mountd
634996 haemd HACMP 4 xxcccc_cluster SECNOSUPPORT
639210 /usr/sbin/aixmibd
688292 harmad -t HACMP -n xxcccc_cluster
770228 /usr/sbin/hostmibd
794768 sendmail: accepting connections
893026 /usr/sbin/inetd
1228886 rlogind rlogind
663642 -ksh
1609826 proctree
1786052 telnetd telnetd -a
884840 -ksh
1011938 rlogind rlogind
1712352 -ksh
1347782 telnetd telnetd -a
938142 -ksh
622756 -sh
1261722 /usr/sbin/sshd a
98456 /usr/sbin/cron
106710 AtapeManager
114892 /usr/sbin/syncd 60
123132 random
147532 aioserver
163996 /usr/lib/errdemon
172116 /usr/dt/bin/dtlogin -daemon
184410 /usr/ccs/bin/shlap64
204992 /usr/bin/xmwlm -T -s 300 -R 1 -r 6 -o /etc/perf/daily/ -ypersistent=1 -ystart_t
213120 /usr/sbin/uprintfd
237610 aioserver
241812 /usr/sbin/getty /dev/console
249992 /opt/IBM_DS4000/jre/bin/java -Djava.compiler=NONE -Ddevmgr.datadir=/var/opt/SM
262286 /usr/opt/db2_08_01/bin/db2fmcd
278672 /opt/IBM_DS4000/jre/bin/java -Djava.compiler=NONE -Djava.library.path=/usr/SMag
286908 xmtopas -p3
311376 /usr/tivoli/tsm/server/bin/dsmserv quiet
442412
319672 auditbin
356588 /home/db2as/das/adm/db2dasrrm
372918 /home/db2as/das/bin/db2fmd -i db2as -m /home/db2as/das/lib/libdb2dasgcf.a
381006 aioserver
385248 aioserver
405574 /bin/bsh /usr/lib/sa/sa1 300 12
1089742 /usr/lib/sa/sadc 300 12 /var/adm/sa/sa13
417992 aioserver
462976 aioserver
520290 aioserver
536678 rpc.lockd
569558 aioserver
589832 nfsd
659546 aioserver
827580 aioserver
864432 aioserver
909422 aioserver
925916 aioserver
958694 aioserver
1007866 aioserver
1016050 aioserver
1024244 aioserver
1028342 aioserver
1032440 aioserver
1048630 aioserver
1179808 aioserver
1274040 aioserver
1310796 aioserver
1323236 aioserver
1356016 aioserver
1380552 aioserver
1454288 aioserver
1458388 aioserver
1462482 aioserver
1466582 aioserver
1470680 aioserver
1474786 aioserver
1478872 aioserver
1482978 aioserver
1487076 aioserver
1490970 aioserver
1499258 aioserver
1523822 aioserver
1536020 aioserver
1548480 aioserver
I also ran entstat -d on all Ethernet interfaces but don't see errors.
NOTE: This is the next server to be hardened, so don't scold me. :-) <-- Don't know if "scold" is the right word, I just translated it.
Any other command to run?
Thanks indeed
There is no swapping at all. pi/po is zero, PgspIn/PgspOut as well.
There must be a process waking up very often, but doing very little work.
Maybe you should activate PROC_Create for root in audit/config (for a short time, of course) to see what's going on.
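A minimal sketch of that audit setup, assuming the stock AIX audit framework (the stanza layout and the PROC_Create event are standard, but check your own /etc/security/audit/config before enabling anything):

```
* /etc/security/audit/config (fragment)
* define a class that records process creation, and assign it to root
classes:
        procmon = PROC_Create

users:
        root = procmon
```

Then run `audit start`, let it record for a short while, `audit shutdown`, and read the trail with `auditpr -v < /audit/trail` (the default trail path) to see which processes are being spawned.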
ASKER
Hi hgeist
I don't think there's any swap in use:
monitor@: /home/monitor # lsps -a
Page Space Physical Volume Volume Group Size %Used Active Auto Type
paging00 hdisk0 rootvg 4096MB 1 yes yes lv
hd6 hdisk0 rootvg 4096MB 1 yes yes lv
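Those %Used figures translate to almost nothing in absolute terms; a quick check over the two rows above (portable awk):

```shell
# Convert lsps -a %Used into megabytes (rows copied from the output above)
usage=$(printf '%s\n' \
  'paging00 hdisk0 rootvg 4096MB 1' \
  'hd6 hdisk0 rootvg 4096MB 1' |
  awk '{gsub(/MB/, "", $4); printf "%s: %.0f MB in use\n", $1, $4 * $5 / 100}')
echo "$usage"
```

About 41 MB per paging space, i.e. effectively idle.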
ASKER CERTIFIED SOLUTION
As for swapping: See my comment http:#a33430313 above.
And further this one:
544982 run_rcovcmd
There is something going on with your cluster. Please check hacmp.out.
Some failover or verify/sync action is not complete, and the cluster is complaining about it.
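A way to do that check (the log path varies by HACMP release; /tmp/hacmp.out is the classic default, so adjust it for your version):

```
grep -i event /tmp/hacmp.out | tail -20    # recent cluster event activity
tail -100 /tmp/hacmp.out                   # last actions in detail
```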
ASKER
the message on the cluster is:
Aug 13 00:00:16 local0:info /usr/es/sbin/cluster/godmd [1757392]: Failed operation(1) return status 9.
Aug 13 00:00:16 local0:info /usr/es/sbin/cluster/godmd [1089668]: Failed operation(1) return status 9.
Aug 13 00:00:17 local0:info /usr/es/sbin/cluster/godmd [1228940]: Failed operation(1) return status 9.
The doc says it should be ignored...
So it seems the 14 VGs are causing the high load, no?
ASKER
btw, what's run_rcovcmd? It doesn't have a man page. I'm about to leave, but I'll try to connect from home.
thanks to both of you.
Israel.
Yep,
that message is from nightly auto-verification and can be ignored.
And, sorry, I overlooked that you're obviously running HACMP 5.4.1 or later.
From these releases on, run_rcovcmd is not necessarily related to an error.
I never saw a cluster containing 14 concurrent VGs.
I think it might very well be that this poor "Group Services Concurrent Logical Volume Management Daemon" (gsclvmd) is causing the high load, all the more so because you're probably running more than one or two LVs per VG, am I right?
run_rcovcmd:
In earlier releases it was used to track the duration of an event (it was started along with the corresponding event) and to complain if that duration was considered "too long".
In the newer releases it seems to run permanently, for whatever reason.
ASKER
Well, they say 14 VGs is not a high load for the cluster. And yes, there are a lot of LVs inside the VGs.
Have a nice weekend wmp.
A load average of 2 is not high.
I have seen around 50 on a normally working 10-CPU 32-bit system.
Page Space Physical Volume Volume Group Size %Used Active Auto Type
paging00 hdisk0 rootvg 4096MB 1 yes yes lv
hd6 hdisk0 rootvg 4096MB 1 yes yes lv
Delete paging00 and extend hd6.
Having two paging spaces on the same drive is very bad for performance.
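The consolidation could look roughly like this (standard AIX paging-space commands; the chps size is an example value to adjust, and paging00 must be deactivated first, which `swapoff` handles on AIX 5.1 and later):

```
swapoff /dev/paging00   # deactivate the extra paging space
rmps paging00           # remove it from rootvg
chps -s 32 hd6          # grow hd6 by 32 logical partitions (example value)
lsps -a                 # verify the result
```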
You might need to tune aioservers (smitty aio).
Probably the AIO request queue gets full and the (Oracle) process waits, when AIO should offload this waiting to the kernel.
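Outside of smitty, the legacy AIO device on AIX 5.3 can be inspected and resized along these lines (the attribute names are the standard aio0 ones; the values shown are placeholders to size against your workload):

```
lsattr -El aio0                                  # show minservers/maxservers/maxreqs
chdev -l aio0 -a maxservers=30 -a maxreqs=8192   # example sizes
```

If the device is busy, `chdev -P` records the change for the next reboot instead.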
Now that I'm back from France, let's look at this one.
The only thing that could catch one's eye is the gil process.
It's a kernel process ("Generalized Interrupt Level") which deals with TCP network acknowledgements and, more importantly, retransmissions.
So please examine your network traffic, e.g. the "errs" column of netstat 1 (meaning a 1-second interval), or the "packets" column for high values.
Or check "topas" (left middle).
Is this perhaps an NFS/Samba server with a lot of network traffic?
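Since gil is tied to TCP retransmissions, the TCP statistics are worth a direct look as well (standard netstat options; the grep pattern is just a convenience filter):

```
netstat -p tcp | grep -i retrans    # retransmission counters only
netstat -s | more                   # full per-protocol statistics
```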
wmp