Hello,
I need your help. I'm new to Solaris administration. In our company, we use SiteScope to monitor the servers and network. It is now showing an alert that CPU utilization on a Solaris server (6 CPUs) is 100%, like this: "datamall CPU 100% avg, cpu1 100%, cpu2 100%, cpu3 100%, cpu4 100%, cpu5 100%, cpu6 100%".
To investigate, I logged in to the server and tried the few commands I know to figure out the following:
1. How is the CPU being used, and why is it overloaded?
2. Which process is using the most CPU?
3. What is the ideal load average for a production server?
4. Are there any zombie or hung processes degrading CPU performance? Can I kill all of them?
5. Is it okay for the CPUs to run at 100% like this, or might the server crash?
6. Finally, the basics: What is the processor speed? Are there physically 6 CPUs, or are they virtual?
The commands I tried are given below. Is there a better way to figure out the CPU utilization?
root@datamall:Thu # uptime
4:35pm up 258 day(s), 14:01, 19 users, load average: 17.72, 16.74, 16.47
root@datamall:Thu # psrinfo -v
Status of processor 0 as of: 09/21/06 16:36:29
Processor has been on-line since 01/06/06 01:34:34.
The sparc processor operates at 248 MHz,
and has a sparc floating point processor.
Status of processor 1 as of: 09/21/06 16:36:29
Processor has been on-line since 01/06/06 01:34:38.
The sparc processor operates at 248 MHz,
and has a sparc floating point processor.
Status of processor 4 as of: 09/21/06 16:36:29
Processor has been on-line since 01/06/06 01:34:38.
The sparc processor operates at 248 MHz,
and has a sparc floating point processor.
Status of processor 5 as of: 09/21/06 16:36:29
Processor has been on-line since 01/06/06 01:34:38.
The sparc processor operates at 248 MHz,
and has a sparc floating point processor.
Status of processor 8 as of: 09/21/06 16:36:29
Processor has been on-line since 01/06/06 01:34:38.
The sparc processor operates at 248 MHz,
and has a sparc floating point processor.
Status of processor 9 as of: 09/21/06 16:36:29
Processor has been on-line since 01/06/06 01:34:38.
The sparc processor operates at 248 MHz,
and has a sparc floating point processor.
root@datamall:Thu #
root@datamall:Thu # /usr/ucb/ps -aux | more
USER       PID %CPU %MEM     SZ    RSS TT S    START     TIME COMMAND
ingres    6839 17.7  9.9 495160 375696 ?  O   Sep 07 16457:44 /usr/ingres6/ingre
ingres    6856 15.8  9.8 465016 373680 ?  O   Sep 07 16321:18 /usr/ingres6/ingre
oracle    5534  6.8  9.9 401328 377232 ?  R 15:31:44    25:05 oraclecydw (LOCAL=
oracle    3601  6.7  9.9 401208 376592 ?  R 15:23:14    28:28 oraclecydw (LOCAL=
oracle    7284  6.7 10.0 402288 377832 ?  R 15:40:23    21:16 oraclecydw (LOCAL=
mis      22713  6.4  0.2  11488   7760 ?  O   Sep 13  2588:16 generic_ingresii +
bqserver  8529  5.1  1.8  78280  66976 ?  R 15:46:07    16:51 /RAID/usr10/bqs651
oracle   21187  4.9  9.6 392024 364968 ?  R 09:00:11   204:07 oraclecydw (LOCAL=
oracle   15143  4.6  9.6 391992 364240 ?  R   Sep 17  3353:35 oraclecydw (LOCAL=
oracle   29834  3.9  9.6 392032 364224 ?  O   Sep 13  8368:36 oraclecydw (LOCAL=
oracle   21167  3.7  9.6 392000 364944 ?  R 09:00:10   203:26 oraclecydw (LOCAL=
oracle    4008  3.5  0.1  23616   3320 ?  S 23:55:05   143:11 tg4ingrechits (LOC
ingres    6873  2.4  0.1   5664   3440 ?  R   Sep 07  3513:12 /usr/ingres6/ingre
ingres    6869  2.1  0.1   5680   3368 ?  R   Sep 07  2985:21 /usr/ingres6/ingre
oracle    3997  1.0  9.7 397656 366552 ?  S 23:55:01    48:46 oraclecydw (DESCRI
ingres   19033  1.0  0.2  66856   7640 ?  R 16:32:57     0:01 lockstat
bqserver  1825  0.5  0.2  11696   5720 ?  S 07:30:10    13:10 /RAID8/RAID/usr10/
mis      18365  0.5  0.1   2120   1760 ?  S 16:30:02     0:02 java_client +debug
.
.
.
root@datamall:Thu # vmstat
procs memory page disk faults cpu
r b w swap free re mf pi po fr de sr m1 m1 m1 m2 in sy cs us sy id
1 1 1 264 384 1566 761 4199 1490 3577 0 3756 1 1 1 1 2904 3080 1450 469 535 749
root@datamall:Thu # iostat
tty md10 md11 md12 md20 cpu
tin tout kps tps serv kps tps serv kps tps serv kps tps serv us sy wt id
7 51 10 1 27 7 1 22 7 1 16 10 1 9 27 30 7 35
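To put the load average in context, here is a rough sketch of how I am reading it: the 1-minute load average from the uptime output above divided by the number of processors that psrinfo lists (6). The function name per_cpu_load is just something I made up for illustration, and the values are copied by hand from the output in this post, so please treat it as an assumption about how to interpret the numbers rather than the right method.

```shell
#!/bin/sh
# Rough sketch: load average per processor.
# Values are copied by hand from the uptime and psrinfo output in this post.
load_1min=17.72   # first figure of "load average: 17.72, 16.74, 16.47"
ncpu=6            # psrinfo -v lists processors 0, 1, 4, 5, 8 and 9

# per_cpu_load is a hypothetical helper name, not a Solaris command.
# $1 = load average, $2 = number of processors
per_cpu_load() {
    awk -v load="$1" -v n="$2" 'BEGIN { printf "%.2f\n", load / n }'
}

per_cpu_load "$load_1min" "$ncpu"
```

With these numbers it prints 2.95. As far as I understand, a value well above 1 would suggest that more threads want to run than the processors can serve at once, but I would appreciate confirmation.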
Awaiting your reply....
Thanks,
Ashok