
What's consuming Kern% on topas?

Posted on 2013-12-17
Last Modified: 2013-12-17
Hi,

After a migration to AIX 6.1 on October 14, CPU use increased and we don't know which processes are causing it. In topas, Kern% on this LPAR is high compared with the other LPARs, as you can see below:
CPU  User%  Kern%  Wait%  Idle%  Physc   Entc   Reads     16462  Rawin         0
ALL   41.1   57.5    0.7    0.7   2.93  146.5   Writes     4834  Ttyout     3014



How can I see which processes are using this Kern%?

The nmon kernel view shows:
¦ Kernel ---------------------------------------------------------------------
¦RunQueue=      1.5 | swapIn =      0.0 | Directory Search | Kernel Processes
¦pswitch =   8170.1 | syscall= 287045.4 | iget  =      0.0 | ksched=      0.0
¦fork    =     81.0 | read   = 158813.8 | dirblk=      0.0 | koverf=      0.0
¦exec    =     73.0 | write  =   2776.3 | namei =  21855.8 | kexit =      0.0
¦msg     =    112.0 | readch =    121692012.9              | Load Averages
¦sem     =   3824.8 | writech=      1132269.3              | 1 min =      5.86
¦HW Intrp=    500.5 | R+W(MB/s)=         14.6              | 5 min =      5.53
¦SW Intrp=   3546.3 | Up Time=19.6 days (max=497)          | 15 min=      6.05
¦Processes asleep - waiting for:
¦
¦-----------------------------------------------------------------------------



+-topas_nmon--N=NFS--------------Host=bibmprod-------Refresh=2 secs---12:42.51--------+
¦ CPU-Utilisation-Small-View -----------EntitledCPU=  2.00 UsedCPU=  1.843------------
¦Logical  CPUs              0----------25-----------50----------75----------100
¦CPU User%  Sys% Wait% Idle%|           |            |           |            |
¦  0  28.0  40.0  17.5  14.5|UUUUUUUUUUUUUUssssssssssssssssssssWWWWWWWW>      |
¦  1  30.0  48.0  21.5   0.5|UUUUUUUUUUUUUUUssssssssssssssssssssssssWWWWWWWWWW>
¦  2  11.5  15.0  22.0  51.5|UUUUUsssssssWWWWWWWWWWW                  >       |
¦  3   5.9  14.4   9.9  69.8|UUsssssssWWWW                            >       |
¦EntitleCapacity/VirtualCPU +-----------|------------|-----------|------------+
¦ EC  42.4  48.0   0.8   1.0|UUUUUUUUUUUUUUUUUUUUUssssssssssssssssssssssss----|
¦ VP  21.2  24.0   0.4   0.5|UUUUUUUUUUssssssssssss---------------------------|
¦EC=  92.1%  VP=  46.1%     +--No Cap---|------------|-SMT=1-----100% VP=4 CPU+
¦-----------------------------------------------------------------------------
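For reference, the Entc (topas) and EC (nmon) figures are physical CPU consumed relative to the LPAR's entitlement (ent=2.00 in the lparstat output later in the thread), which is how this uncapped LPAR can show 146.5%. A minimal Python sketch of that arithmetic, assuming nmon's VP% is physc divided by the 4 virtual processors shown as "VP=4"; the function names are illustrative only:

```python
# Sketch: how topas "Entc" and nmon "EC"/"VP" relate to physical CPU used.
# Assumptions: ent = 2.00 entitled CPUs (lparstat header) and 4 virtual
# processors ("VP=4" in the nmon CPU view).
ENTITLED_CPUS = 2.00
VIRTUAL_PROCS = 4


def entc_percent(physc: float, ent: float = ENTITLED_CPUS) -> float:
    """Physical CPU consumed as a percentage of the entitlement."""
    return 100.0 * physc / ent


def vp_percent(physc: float, vps: int = VIRTUAL_PROCS) -> float:
    """Physical CPU consumed as a percentage of all virtual processors."""
    return 100.0 * physc / vps


# topas: Physc 2.93 -> Entc 146.5 (uncapped LPAR running above entitlement)
topas_entc = entc_percent(2.93)
# nmon: UsedCPU 1.843 -> EC 92.1% and VP 46.1%
nmon_ec = entc_percent(1.843)
nmon_vp = vp_percent(1.843)

print(f"topas Entc: {topas_entc:.1f}%")
print(f"nmon EC: {nmon_ec:.1f}%, VP: {nmon_vp:.1f}%")
```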



The CPU increase can be seen in the attached image (lpar2rrd).

We'd like to hear about your experience in tracking down this kind of CPU bottleneck.

Thanks.
[Attached image: cpu.JPG]
Question by:sminfo
 

Accepted Solution

by:woolmilkporc
ID: 39723919
Hi again,

At first sight I assume that the high kernel activity is due to the many pswitches (process context switches). A context switch happens when a process has to wait for an I/O to complete and the kernel gives control to another process.

Although the underlying cause of the switches is I/O wait, the "Wait%" value doesn't increase: there is nearly always another process ready to be scheduled, so the whole system only rarely has to wait.

Further, it seems that this machine is doing quite a lot of filesystem access, and that many different files (note the high "namei" value!) are opened and read per unit of time. The average number of bytes per read call (readch / read) is just ~800, so I assume the affected files are rather small.
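The ~800 bytes per read estimate comes from dividing the nmon byte counters by the matching call counters. A small Python sketch of that arithmetic; the variable names are illustrative and the rates are copied from the nmon kernel panel:

```python
# Worked check of the "~800 bytes per read" estimate.
# All rates are per second, copied from the nmon kernel panel above.
readch = 121_692_012.9   # bytes delivered by read() calls
reads = 158_813.8        # read() calls
writech = 1_132_269.3    # bytes passed to write() calls
writes = 2_776.3         # write() calls
syscalls = 287_045.4     # all system calls


def bytes_per_call(byte_rate: float, call_rate: float) -> float:
    """Average number of bytes moved by one call."""
    return byte_rate / call_rate


avg_read = bytes_per_call(readch, reads)     # roughly 766 bytes
avg_write = bytes_per_call(writech, writes)  # roughly 408 bytes
read_share = reads / syscalls                # fraction of syscalls that are reads

print(f"avg read:  {avg_read:.0f} bytes/call")
print(f"avg write: {avg_write:.0f} bytes/call")
print(f"reads are {read_share:.0%} of all system calls")
```

By this arithmetic reads average roughly 770 bytes and writes roughly 410 bytes per call, and read() accounts for about 55% of all system calls.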
Does "topas" show high PageIn/PageOut?

Please run "topas -P" and sort by the PGFAULTS I/O column. The top processes are the ones with high filesystem I/O.

All this looks as if there is not enough filesystem buffer cache available.

Could it be that you changed the values for maxperm%, minperm%, maxclient%? Or perhaps you changed these values in your old system and you're now running with the defaults?

Or is there high paging activity? Has anything changed in the memory configuration? Less memory? Active Memory Sharing implemented? Or Active Memory Expansion?
 

Author Comment

by:sminfo
ID: 39723963
Hi Wmp,

topas -P output (but I couldn't find any PGIN column):

Topas Monitor for host:    prod       Interval:   2    Tue Dec 17 15:10:53 2013

                                DATA  TEXT  PAGE               PGFAULTS
USER        PID    PPID PRI NI   RES   RES SPACE    TIME CPU%  I/O  OTH COMMAND
db2p    13238404 12910708 102 20 89083    32 89099 8838:31 22.5    3    5 db2sysc    sc
cics    44499178 11403428  60 20  7477    14  7477   13:38  5.3    0  323 cicsas
cics    20775164 11403428 102 20  6414    14  6414    2:18  4.4    0  247 cicsas
root    14549030       1   1 41 52168   806 52168  813:19  2.0    0    0 seosd
emuser  20906122 19595508 144 20  1632   288  1632  890:55  1.7    0    0 ecs.cmsg
cics    55574642 11403428 102 20  7025    14  7025    6:47  1.4    0  130 cicsas
cics    54001892 46661828  60 20  5350    14  5350    1:30  1.4    0  103 cicsas
cics    57213088 11403428 102 20  6306    14  6306    1:55  1.2    0  117 cicsas
cics    41484488 1572986 102 20 59337   515 59337  711:39  0.7    0    0 sfs
root    31916204 4260002  60 20   122   553   203    0:00  0.5    0  547 sshd
root    48562178 3473520  60 20   191    25   191    0:00  0.4    0  312 telnetd
root    24969412 48562178  60 20   174    21   174    0:00  0.4    0  292 tsm
root    8978456       1  60 20   736     0   736 1592:47  0.3    0    0 rpvc_kpr
emuser  8912896 20513012  60 20  3662   146  3662   45:25  0.2    0    0 ecs.main
root    23003350 59768974  60 20  1001   556  1001    0:00  0.2    0    0 topas
root    14745682 1572986  39 20  1546  1838  1546  119:04  0.1    0    0 clstrmgr
cics    54067454 46661828 102 20  5221    14  5221    0:34  0.1    0    6 cicsas
emuser  21889258 19071224 144 20 16519   228 16519   75:22  0.1    0    0 ecs.guis
emuser  49479918       1 144 20  1482  7990  1482    0:00  0.1    0    0 oracle
root    1310760       0  16 41   128     0   128   53:28  0.1    0    0 wlmsched
root    4653206       1  60 20  8263    30  8263   39:27  0.1    0    0 java
cics    16515212 52953200 102 20  2151    10  2151    0:15  0.1    0    0 cicsip
root    1245222       0  37 41   240     0   240   22:22  0.1    0    0 gil
emuser  12320938 10420366 144 20 10470    37 10470   16:29  0.1    0    0 ecs.bims
emuser  7077972 14024854 144 20 36819   197 36819   24:08  0.0    0    0 ecs.gasr
root    11731182 1572986  60 20  3709   317  3709   21:29  0.0    0    0 clinfo
monitor 33095704 43057158 144 20   158    72   158    0:00  0.0    0   68 ksh
root    12648462       1  60 20  1265    67  1265   51:22  0.0    0    0 sarpcd
emuser  16384184 23265332 144 20  4514   342  4514   35:13  0.0    0    0 ecs.cms
mqm     13041720 13959364 123 20  2755     2  2755    5:51  0.0    0    0 amqrmppa
root    25559154       1  60 20  1328     0  1328   17:28  0.0    0    0 nfsd
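If you have saved the topas -P screen as plain text, a few lines of Python can rank the processes by the PGFAULTS columns offline. This is a sketch under the assumption that the rows keep the 13-column layout shown above; only a subset of the rows is embedded, and the Proc and parse names are hypothetical:

```python
# Sketch: rank processes from a saved "topas -P" screen by page-fault columns.
# Assumed layout: USER PID PPID PRI NI DATA-RES TEXT-RES PAGE-SPACE TIME CPU%
#                 PGFAULTS-I/O PGFAULTS-OTH COMMAND
from typing import List, NamedTuple

SNAPSHOT = """\
db2p    13238404 12910708 102 20 89083    32 89099 8838:31 22.5    3    5 db2sysc    sc
cics    44499178 11403428  60 20  7477    14  7477   13:38  5.3    0  323 cicsas
cics    20775164 11403428 102 20  6414    14  6414    2:18  4.4    0  247 cicsas
root    14549030       1   1 41 52168   806 52168  813:19  2.0    0    0 seosd
cics    55574642 11403428 102 20  7025    14  7025    6:47  1.4    0  130 cicsas
root    31916204 4260002  60 20   122   553   203    0:00  0.5    0  547 sshd
root    48562178 3473520  60 20   191    25   191    0:00  0.4    0  312 telnetd
"""


class Proc(NamedTuple):
    user: str
    pid: int
    cpu: float
    pf_io: int      # PGFAULTS I/O: page faults that needed disk I/O
    pf_other: int   # PGFAULTS OTH: other page faults
    command: str


def parse(text: str) -> List[Proc]:
    procs: List[Proc] = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # maxsplit=12 keeps the COMMAND column (which may contain spaces) intact
        f = line.split(None, 12)
        procs.append(Proc(user=f[0], pid=int(f[1]), cpu=float(f[9]),
                          pf_io=int(f[10]), pf_other=int(f[11]),
                          command=f[12].strip()))
    return procs


procs = parse(SNAPSHOT)
# Primary key: faults needing I/O; tie-breaker: other faults
ranked = sorted(procs, key=lambda p: (p.pf_io, p.pf_other), reverse=True)
for p in ranked[:4]:
    print(f"{p.command:<14} pid={p.pid:<9} I/O={p.pf_io:<3} OTH={p.pf_other}")
```

In this snapshot only db2sysc shows any I/O page faults (3), so the I/O column on its own does not single out a culprit; the OTH column is led by sshd, cicsas and telnetd.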



Well, I don't believe I changed the maxperm%, minperm% or maxclient% parameters. vmo -a shows:

(prod):[root] / -> vmo -a
             ame_cpus_per_pool = n/a
               ame_maxfree_mem = n/a
           ame_min_ucpool_size = n/a
               ame_minfree_mem = n/a
               ams_loan_policy = n/a
  enhanced_affinity_affin_time = 1
enhanced_affinity_vmpool_limit = 10
                esid_allocator = 0
           force_relalias_lite = 0
             kernel_heap_psize = 65536
                  lgpg_regions = 0
                     lgpg_size = 0
               low_ps_handling = 1
                       maxfree = 1088
                       maxperm = 10984708
                        maxpin = 10140052
                       maxpin% = 80
                 memory_frames = 12582912
                 memplace_data = 0
          memplace_mapped_file = 0
        memplace_shm_anonymous = 0
            memplace_shm_named = 0
                memplace_stack = 0
                 memplace_text = 0
        memplace_unmapped_file = 0
                       minfree = 960
                       minperm = 366156
                      minperm% = 3
                     nokilluid = 0
                       npskill = 17920
                       npswarn = 71680
           num_locks_per_semid = 1
                     numpsblks = 2293760
               pinnable_frames = 10202120
           relalias_percentage = 0
                         scrub = 0
                      v_pinshm = 0
              vmm_default_pspa = 0
                vmm_klock_mode = 1
            wlm_memlimit_nonpg = 1



(prod):[root] / -> schedo  -F -a
             affinity_lim = 7
            big_tick_size = 1
    ded_cpu_donate_thresh = 80
         fixed_pri_global = 0
                force_grq = 0
                  maxspin = 16384
                 pacefork = 10
          proc_disk_stats = 1
                  sched_D = 16
                  sched_R = 16
            tb_balance_S0 = 2
            tb_balance_S1 = 2
             tb_threshold = 100
                timeslice = 1
          vpm_fold_policy = 1
               vpm_xvcpus = 0
##Restricted tunables
                 %usDelta = 100
          allowMCMmigrate = 0
                allow_vmx = 1
           clk_transition = 12
               fast_locks = 0
          hotlocks_enable = 0
   idle_migration_barrier = 4
            intr_stealing = 0
           jitter_control = 0
       krlock_confer2self = 1
     krlock_conferb4alloc = 1
            krlock_enable = 1
       krlock_spinb4alloc = 1
      krlock_spinb4confer = 1024
       n_idle_loop_vlopri = 100
    search_globalrq_mload = 256
     search_smtrunq_mload = 256
     setnewrq_sidle_mload = 384
      shed_primrunq_mload = 64
       sidle_S1runq_mload = 64
       sidle_S2runq_mload = 134
       sidle_S3runq_mload = 134
       sidle_S4runq_mload = 4294967040
       slock_spinb4confer = 1024
         smt_option_flags = 0
         smt_snooze_delay = 0
smt_tertiary_snooze_delay = 0
        smtrunq_load_diff = 2
    tertiary_barrier_load = 128
                tick_sync = 0
            v_exempt_secs = 2
            v_min_process = 2
              v_repage_hi = 0
            v_repage_proc = 4
               v_sec_wait = 1
       vpm_fold_threshold = 49
            vpm_min_sleep = 500



Paging is very low:
(prod):[root] / -> lsps -a
Page Space      Physical Volume   Volume Group Size %Used Active  Auto  Type Chksum
paging00        hdisk0            rootvg        4480MB     2   yes   yes    lv     0
hd6             hdisk0            rootvg        4480MB     1   yes   yes    lv     0



No changes to memory, Active Memory Sharing or Active Memory Expansion.
(prod):[root] / -> lparstat

System configuration: type=Shared mode=Uncapped smt=Off lcpu=4 mem=49152MB psize=4 ent=2.00

%user  %sys  %wait  %idle physc %entc  lbusy  vcsw phint
----- ----- ------ ------ ----- ----- ------ ----- -----
  2.6   2.3   15.2   79.9  0.10   5.0   38.0 4750446299 366508456
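The very large vcsw and phint values in this single line suggest that lparstat run without an interval reports totals since boot rather than a current sample. Assuming that, and taking the ~19.6 days of uptime from the nmon kernel panel above (captured the same day), a Python sketch converting the totals into average rates:

```python
# Sketch: convert since-boot lparstat counters into average per-second rates.
# Assumptions: (1) lparstat without an interval reports totals since boot,
# (2) uptime is ~19.6 days, taken from the nmon kernel panel above.
SECONDS_PER_DAY = 86_400
uptime_s = 19.6 * SECONDS_PER_DAY

vcsw_total = 4_750_446_299   # virtual context switches since boot
phint_total = 366_508_456    # phantom interrupts since boot

vcsw_rate = vcsw_total / uptime_s
phint_rate = phint_total / uptime_s

print(f"vcsw  ~ {vcsw_rate:,.0f}/s")
print(f"phint ~ {phint_rate:,.0f}/s")
```

On that basis the line describes roughly 2,800 virtual context switches per second averaged over the whole uptime, so its low %sys would be a long-run average and not necessarily the current load; an interval run such as "lparstat 1" shows the present behaviour.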



We just upgraded from AIX 5.3 to AIX 6.1, and we also installed a service pack for TXSeries, which is now at TXSeries 6.2.0.3 s620-L120111. I mention this because I see more cicsas threads running on the system.
 

Assisted Solution

by:woolmilkporc
ID: 39723997
The column is "PGFAULTS I/O". Sorry for the confusion.

Run

vmo -F -a

to see maxperm%. On AIX 6.1 it is a restricted tunable, so plain "vmo -a" does not list it.

Your last lparstat sample shows a small %sys value. It seems only a few processes are running right now (hence the high %wait), or did the problem go away?

In any case, your first samples showed high filesystem I/O. Under such conditions elevated kernel load is quite normal and, all in all, not a reason to worry: the system is doing what it is supposed to do.

Author Comment

by:sminfo
ID: 39724010
No problem....;)

(prod):[root] / ->  vmo -F -a
             ame_cpus_per_pool = n/a
               ame_maxfree_mem = n/a
           ame_min_ucpool_size = n/a
               ame_minfree_mem = n/a
               ams_loan_policy = n/a
  enhanced_affinity_affin_time = 1
enhanced_affinity_vmpool_limit = 10
                esid_allocator = 0
           force_relalias_lite = 0
             kernel_heap_psize = 65536
                  lgpg_regions = 0
                     lgpg_size = 0
               low_ps_handling = 1
                       maxfree = 1088
                       maxperm = 10984708
                        maxpin = 10140042
                       maxpin% = 80
                 memory_frames = 12582912
                 memplace_data = 0
          memplace_mapped_file = 0
        memplace_shm_anonymous = 0
            memplace_shm_named = 0
                memplace_stack = 0
                 memplace_text = 0
        memplace_unmapped_file = 0
                       minfree = 960
                       minperm = 366156
                      minperm% = 3
                     nokilluid = 0
                       npskill = 17920
                       npswarn = 71680
           num_locks_per_semid = 1
                     numpsblks = 2293760
               pinnable_frames = 10201440
           relalias_percentage = 0
                         scrub = 0
                      v_pinshm = 0
              vmm_default_pspa = 0
                vmm_klock_mode = 1
            wlm_memlimit_nonpg = 1
##Restricted tunables
                  ame_hw_accel = n/a
               ame_sys_memview = n/a
                     batch_tlb = 1
                cpu_scale_memp = 8
         data_stagger_interval = 161
                         defps = 1
enhanced_affinity_attach_limit = 100
     enhanced_affinity_balance = 100
     enhanced_affinity_private = 40
      enhanced_memory_affinity = 1
                     framesets = 2
                     htabscale = n/a
                  kernel_psize = 65536
          large_page_heap_size = 0
               lru_file_repage = 0
             lru_poll_interval = 10
                     lrubucket = 131072
                    maxclient% = 90
                      maxperm% = 90
               mbuf_heap_psize = 65536
               memory_affinity = 1
          multiple_semid_lists = 0
                 munmap_npages = 16384
                     npsrpgmax = 143360
                     npsrpgmin = 107520
                   npsscrubmax = 143360
                   npsscrubmin = 107520
            num_sem_undo_lists = 0
             num_sems_per_lock = 1
              num_spec_dataseg = 0
                numperm_global = 1
             page_steal_method = 1
          psm_timeout_interval = 20000
             relalias_lockmode = 1
               relalias_nlocks = 128
                      rpgclean = 0
                    rpgcontrol = 2
                    scrubclean = 0
                shm_1tb_shared = 12
           shm_1tb_unsh_enable = 0
              shm_1tb_unshared = 256
         soft_min_lgpgs_vmpool = 0
              spec_dataseg_int = 512
              strict_maxclient = 1
                strict_maxperm = 0
                   sync_npages = 0
                 thrpgio_inval = 1024
                thrpgio_npages = 1024
               vm_mmap_areload = 0
          vm_modlist_threshold = -1
              vm_pvlist_dohard = 0
              vm_pvlist_szpcnt = 0
               vmm_fork_policy = 1
            vmm_mpsize_support = 2
               vmm_vmap_policy = 0
                  vtiol_avg_ms = 200
                  vtiol_minreq = 25
            vtiol_minth_active = 1
                    vtiol_mode = 0
               vtiol_pgin_mode = 2
              vtiol_pgout_mode = 2
               vtiol_q_cpu_pct = 2500
          vtiol_thread_cpu_pct = 5000
               wlm_rmem_filter = 0
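To put these vmo numbers in context, the page counts can be converted into sizes and percentages. A minimal Python sketch, assuming vmo reports memory_frames, maxperm and minperm in 4 KB pages (12582912 × 4 KB = 48 GB, which matches mem=49152MB in lparstat); the interpretation of the percentage base is also an assumption, flagged in the comments:

```python
# Sketch: turn the vmo page counts above into sizes and percentages.
# Assumption: memory_frames, maxperm and minperm are counted in 4 KB pages.
PAGE_BYTES = 4096
GIB = 2**30

memory_frames = 12_582_912
maxperm = 10_984_708      # pages
minperm = 366_156         # pages
maxperm_pct = 90          # maxperm% (restricted tunable)
minperm_pct = 3           # minperm%


def pages_to_gib(pages: int) -> float:
    return pages * PAGE_BYTES / GIB


real_gib = pages_to_gib(memory_frames)           # 48 GiB, i.e. mem=49152MB
maxperm_gib = pages_to_gib(maxperm)              # about 41.9 GiB of file cache allowed
maxperm_of_real = 100 * maxperm / memory_frames  # about 87.3%, not 90%

# Both percentages point to the same base of ~12.2 million pages, smaller than
# memory_frames. Assumption: the base is the pageable (LRU-managed) part of
# memory rather than all of real memory.
base_from_max = maxperm * 100 / maxperm_pct
base_from_min = minperm * 100 / minperm_pct

print(f"real memory {real_gib:.1f} GiB, maxperm {maxperm_gib:.1f} GiB "
      f"({maxperm_of_real:.1f}% of real memory)")
print(f"base pages: {base_from_max:,.0f} (maxperm%) vs {base_from_min:,.0f} (minperm%)")
```

The topas screen later in the thread shows % Noncomp 56, comfortably below maxperm% = 90, which suggests (though does not prove) that the file cache is not being capped by these settings.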



No, the issue is still there. See lparstat with a 1-second interval:

(prod):[root] / -> lparstat 1

System configuration: type=Shared mode=Uncapped smt=Off lcpu=4 mem=49152MB psize=4 ent=2.00

%user  %sys  %wait  %idle physc %entc  lbusy  vcsw phint
----- ----- ------ ------ ----- ----- ------ ----- -----
 32.8  64.7    0.1    2.4  1.97  98.5   49.0  3636    38
 52.2  26.4    2.3   19.0  1.62  80.8   39.2  8597    72
 56.7  13.0   17.6   12.8  1.42  71.1   37.2  5379   129
 39.2  41.6    4.6   14.6  1.71  85.5   45.5 19104   236
  4.7   4.1    0.2   90.9  0.19   9.3    4.8  2052     2
 23.3  27.0    2.2   47.5  1.04  51.9   26.8  7570    16
 21.4  27.0    2.5   49.1  0.99  49.5   25.0  3949    30
 16.9  18.1    0.4   64.6  0.72  36.1   17.2  4712    20
 31.6  46.2    2.8   19.4  1.58  79.2   42.2  6259    92
 37.3  59.6    0.7    2.4  2.33 116.7   62.9 15770   219
 23.8  35.0    5.8   35.4  1.22  61.0   33.5  9346    28
 30.9  32.2    4.2   32.7  1.30  65.1   35.5 10226    41
 21.2  18.8    6.2   53.8  0.83  41.6   21.0  6017    28



The reason I'm worried is that the system is slow and batch jobs are running hours longer than expected. The lpar2rrd graph I sent you confirms this, and this is our production environment... ;-0
 

Author Comment

by:sminfo
ID: 39724020
wmp, is it normal that ONLY PageOut is shown? See topas:

Topas Monitor for host:    prod             EVENTS/QUEUES    FILE/TTY
Tue Dec 17 15:43:55 2013   Interval:  2         Cswitch    8054  Readch  5443.9K
                                                Syscall  204.2K  Writech  874.4K
CPU  User%  Kern%  Wait%  Idle%  Physc          Reads      5449  Rawin         0
1     49.2   49.5    0.4    0.9   0.50          Writes     2908  Ttyout      611
0     46.1   51.2    1.1    1.7   0.37          Forks         6  Igets         0
2     40.7   40.5    1.1   17.7   0.10          Execs         6  Namei      9731
3     24.2   52.3    0.6   22.8   0.08          Runqueue    1.5  Dirblk        0
                                                Waitqueue   0.0
Network  KBPS   I-Pack  O-Pack   KB-In  KB-Out                   MEMORY
en1     883.0    361.9   881.2    29.2   853.8  PAGING           Real,MB   49152
en0      31.5     40.1    45.5     5.1    26.4  Faults     2547  % Comp     42
lo0       3.7     12.9    12.9     1.9     1.9  Steals        0  % Noncomp  56
                                                PgspIn        0  % Client   56
Disk    Busy%     KBPS     TPS KB-Read KB-Writ  PgspOut       0
hdisk11  40.0   569.0    142.0   69.0     0.0   PageIn        0  PAGING SPACE
hdisk2    0.0   394.5     86.0    0.0   394.5   PageOut     139  Size,MB    8960
hdisk9   12.0   356.9     68.0   72.5   184.4   Sios        139  % Used      1
hdisk4    0.0   120.9     20.0    0.0   120.9                    % Free     99
hdisk1    0.0    87.2     20.0    0.0    87.2   NFS (calls/sec)
hdisk0    0.0     4.0      0.0    0.0     4.0   SerV2         0  WPAR Activ     0
hdiskhea  0.0     1.2      2.0    0.7     0.5   CliV2         0  WPAR Total     0
                                                SerV3         3  Press: "h"-help
FileSystem        KBPS     TPS KB-Read KB-Writ  CliV3         0         "q"-quit
Total              4.4K    3.1K   4.1K 283.8

WLM-Class (Active)     CPU%    Mem%  Blk-I/O%
grupocics                32       4         0
grupodb2                  5      16         2

Name            PID CPU% PgSp Class
cicsas     43188278 14.7 26.0 grupocics
cicsas     47579336  7.9 28.2 grupocics
db2sysc    13238404  4.1 350.2 grupodb2
oracle     21495878  4.1  6.4 grupooracle
cicsas     20775164  2.9 25.7 grupocics
p_ctmtr    23199748  2.7 12.2 grupoctm
seosd      14549030  2.7 204.2 System
cicsas     33489048  1.4 29.5 grupocics
ecs.cmsg   20906122  1.3  6.4 grupooracle
rpvc_kpr    8978456  0.8  2.9 System
sfs        41484488  0.5 231.8 grupocics
aioserve   61079760  0.3  0.4 System
ecs.cms    16384184  0.3 17.6 grupooracle
oracle     19136734  0.3  5.9 grupooracle
ecs.guis   21889258  0.3 64.5 grupooracle
topas      56623356  0.1  4.0 System
ecs.main    8912896  0.1 14.3 grupooracle


 

Assisted Solution

by:woolmilkporc
ID: 39724069
PageIn = 0 over a significant amount of time is indeed strange. This would imply that there are filesystem writes to disk but no filesystem reads from disk (everything is fetched from cache).

The other values seem to confirm this, and the cswitch and namei values are still high. Your system seems to write to tons of files, just a few bytes at a time.
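The topas counters above give a rough size for each call. A Python sketch of the division, assuming topas "K" means 1024 bytes (with K = 1000 the results drop by about 2.3%):

```python
# Sketch: average bytes per read()/write() call in the topas snapshot above.
# Assumption: topas "K" means 1024 bytes.
K = 1024
readch = 5443.9 * K    # Readch  5443.9K (bytes per second)
writech = 874.4 * K    # Writech  874.4K (bytes per second)
reads = 5449           # Reads (calls per second)
writes = 2908          # Writes (calls per second)

avg_read = readch / reads      # about 1 KB per read call
avg_write = writech / writes   # about 300 bytes per write call

print(f"avg read:  {avg_read:.0f} bytes/call")
print(f"avg write: {avg_write:.0f} bytes/call")
```

Roughly 300 bytes per write() call fits the picture of many small writes.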

Could it be that TXSeries behaves differently in that respect since the service pack was applied?

I'm not familiar with TX (only with the classic Mainframe CICS), so I can't help here, sorry.
 

Author Closing Comment

by:sminfo
ID: 39724216
OK wmp, thanks for your help. I'll check the TXSeries service pack and also read the DB2 best practices for AIX 6.1 to see if there's some tuning to do here.
