Solved

Response time for an application

Posted on 2006-04-25
Last Modified: 2008-03-10
Hi guys

We currently support a client which is having serious performance issues with one application. As the application is 3rd party I'd prefer not to name them, but to me, it seems the client only has had an issue since introducing a newer version. I'll try and give as much information as I can here but unfortunately I'm a little out of my depth in understanding if there are issues with the application, the AS400 or both?

Point 1
====
It takes somewhere between 20 and 30 seconds for a single screen to return. I'd call this a serious problem with the iSeries, except that all other applications running on it are running very well.

Point 2
=====
The client has anywhere from 20 to 60 interactive users.

Point 3
=====
Here is a screenshot of WRKSYSSTS for the pools

% CPU used . . . . . . . :       36.3    Auxiliary storage:                  
% DB capability  . . . . :       21.1      System ASP . . . . . . :    111.6 G
Elapsed time . . . . . . :   01:12:14      % system ASP used  . . :    81.5845
Jobs in system . . . . . :      11410      Total  . . . . . . . . :    111.6 G
% perm addresses . . . . :       .028      Current unprotect used :     2731 M
% temp addresses . . . . :       .323      Maximum unprotect  . . :     2810 M
                                                                             
Type changes (if allowed), press Enter.                                      
                                                                             
System    Pool    Reserved    Max   Paging                                    
 Pool   Size (M)  Size (M)  Active  Option                                    
   1      143.49     78.40   +++++  *FIXED                                    
   2      738.30      1.07      91  *FIXED                                    
   3       45.33       .00       6  *FIXED                                    
   4      531.01      <.01      34  *FIXED                                    
   5      507.64       .01       6  *CALC      
-and-                                                                
 System    Pool    Reserved    Max                                      
  Pool   Size (M)  Size (M)  Active  Pool        Subsystem   Library    
    1      147.55     78.41   +++++  *MACHINE                          
    2      993.87       .95      91  *BASE                              
    3       78.91       .00       6  *SPOOL                            
    4      365.55       .13      34  *INTERACT                          
    5      340.50       .10       6  *SHRPOOL1        
    6       20.16       .00       6  *SHRPOOL3
    7       69.47       .00       7  *SHRPOOL2
                                                                   
Point 4
=====                                                      
When the user runs this job (interactively) I notice that the number of Non-DB faults jumps to 11-13. I'm not sure if this is a high number. There were no Wait->Inel or Active->Inel transitions at the same time.

System    Pool    Reserved    Max   -----DB-----  ---Non-DB---  
 Pool   Size (M)  Size (M)  Active  Fault  Pages  Fault  Pages  
   1      143.49     78.40   +++++     .0     .0    2.6    3.1  
   2      738.30      1.07      91     .5   21.0    3.1    9.9  
   3       45.33       .00       6     .0     .0     .2     .5  
   4      531.01      <.01      34    5.3   43.6    2.8    6.3  
   5      507.64       .01       6    2.4   32.9    1.6    2.6                                

Point 5
====
The subsystem is called LANSA. It runs in Pool 2 (*BASE).

Possible answers?
============
Should I change to *CALC for Pool2?
I noticed on WRKACTJOB that the AuxIO for the actual job (I think it is using the Apache server) was 31. I'm not sure if this means anything. It may be a coincidence, but it only seems to be high for the same user.
                                               --------Elapsed---------  
 Opt  Subsystem/Job  Type  Pool  Pty      CPU  Int    Rsp  AuxIO  CPU %  
      LANSA          SBS     2    0        .1                  0     .0  
        LWEB_JOB     BCH     2   19       3.6                  8     .0  
        LWEB_JOB     BCH     2   19      12.4                  0     .0  
        LWEB_JOB     BCH     2   19       7.4                  0     .0  
        LWEB_JOB     BCH     2   19       4.4                 31     .0  
        LWEB_JOB     BCH     2   19       2.9                 35     .0  
        LWEB_JOB     BCH     2   19       2.5                  0     .0  
        LWEB_JOB     BCH     2   19        .6                  0     .0  
        LWEB_JOB     BCH     2   19       5.0                  0     .0  
As I said, it is a 3rd party application, so I can't give you details on too much there. I'd just like to confirm with our client that the AS400 is running ok, or not, as the case may be. Apparently the client's internal IT have sat on this for 12 months so now it has got to an urgent status - which is where we come in. I'm giving max points to get an answer because it is difficult and urgent in a sense.

Thanks
Question by:jdwan
13 Comments
 
LVL 13

Accepted Solution

by:
_b_h earned 1000 total points
ID: 16545398
Please post screen shots of the following commands; it is _important_ that the elapsed time be about five minutes long with the problem occurring during that time. You can use command keys F10 to restart, and F5 to refresh.
Work with System Status WRKSYSSTS
Work with Active Jobs WRKACTJOB
Work with Disk Status WRKDSKSTS
So issue the commands, get elapsed time to 0 with F10, get the user to create the problem, then F5 to refresh.

The jobs are running at priority 19 which is higher than both interactive users and batch jobs, so CPU should not be the issue. The faulting above does not look like a problem either, but output of the above commands will clarify that.
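To compare pools at a glance, the fault columns from a pasted WRKSYSSTS screen can be summed per pool. The following is only a rough sketch: the function name `pool_fault_totals` is made up for illustration, the sample rows are the ones posted in the question, and it assumes the eight-column layout shown there. It does not decide what counts as "too high"; it just adds DB and Non-DB faults together.

```python
def pool_fault_totals(screen_text):
    """Map system pool number -> DB fault + Non-DB fault from WRKSYSSTS rows."""
    totals = {}
    for line in screen_text.splitlines():
        fields = line.split()
        # Data rows have 8 columns and start with a numeric pool number;
        # header and separator lines are skipped.
        if len(fields) != 8 or not fields[0].isdigit():
            continue
        db_fault = float(fields[4])       # -----DB----- Fault column
        non_db_fault = float(fields[6])   # ---Non-DB--- Fault column
        totals[int(fields[0])] = round(db_fault + non_db_fault, 1)
    return totals


sample = """\
System    Pool    Reserved    Max   -----DB-----  ---Non-DB---
 Pool   Size (M)  Size (M)  Active  Fault  Pages  Fault  Pages
   1      143.49     78.40   +++++     .0     .0    2.6    3.1
   2      738.30      1.07      91     .5   21.0    3.1    9.9
   3       45.33       .00       6     .0     .0     .2     .5
   4      531.01      <.01      34    5.3   43.6    2.8    6.3
   5      507.64       .01       6    2.4   32.9    1.6    2.6
"""

for pool, total in sorted(pool_fault_totals(sample).items()):
    print(f"pool {pool}: {total} total faults")
```

Running it against two snapshots (before and during the slowdown) makes it easier to see whether any one pool changes noticeably.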

What is the model of the AS/400?

The jobs are running in the base pool; you might want to move them into a separate shared pool to get better control of them.

If this is still urgent, send me an email at the address in my profile.

Barry
 
LVL 27

Expert Comment

by:tliotta
ID: 16546595
jdwan:

Barry's given a good start.

His advice to move jobs out of *BASE is good. If you have LANSA jobs in *BASE, there are probably a bunch of IBM jobs also running there. With multiple jobs in *BASE, and especially with *SHARE pools also running, it's practically impossible to determine bottlenecks since jobs will be competing for the same memory pages. You can't tell which job is doing what at any point in time.

Also, the various *SHARE pools will be trying to pull memory out of the *BASE pool. (That's what *BASE is -- the base memory pool that other pools pull memory from.) Get LANSA _and_ IBM jobs into different pools and you can then see them act separately.

The WRKSYSACT command might be of some use. If you don't have the WRKSYSACT command, then Performance Tools isn't installed. You might need to view performance info through iSeries Navigator (the administrative GUI) instead.

His advice to run in 5-minute intervals (no more than 15 minutes) is so cumulative effects don't cloud the issue. You'll see only stuff that is accounted for by the activity at the time.

Nothing really stands out in the data you've supplied, but it's high-level stuff. You mentioned that other apps are running reasonably well, so it's reasonable that a high-level view would show things okay.

One very minor oddity... "% temp addresses . . . . :   .323" -- seems a _little_ higher than I'd normally expect, but you might be running various journal processes since this looks to be a decently busy system. It makes me wonder a bit more about the model of the system and the version of OS/400. Only other thing I'd wonder would be the DB2 group PTF level.

Heavy journalling could be SQL related. That can be influenced by DB2 PTFs for performance.

Barry seems on the way, so I don't have anything to add.

Tom
 

Author Comment

by:jdwan
ID: 16549639
Hi Barry and Tom

Thanks very much for responding so quickly. There is much wisdom in here that I need to digest so I'll feed the stats and take a looksy at your points so we can keep things rolling (hopefully). I've had to wait until the girls got in to run the app. Here are some screen shots. These stats were run for a couple of mins before they ran the programs that come to a grinding halt.

WRKACTJOB
=============================================

                             Work with Active Jobs                     NAMOI    
                                                             27/04/06  11:58:37
 CPU %:   39.1     Elapsed time:   00:06:41     Active jobs:   382              
                                                                               
 Type options, press Enter.                                                    
   2=Change   3=Hold   4=End   5=Work with   6=Release   7=Display message      
   8=Work with spooled files   13=Disconnect ...                                
                                               --------Elapsed---------        
 Opt  Subsystem/Job  Type  Pool  Pty      CPU  Int    Rsp  AuxIO  CPU %        
      LANSA          SBS     2    0        .1                  0     .0        
        LWEB_JOB     BCH     2   19       2.9                  0     .0        
        LWEB_JOB     BCH     2   19       6.0                  0     .0        
        LWEB_JOB     BCH     2   19       3.8                  0     .0        
        LWEB_JOB     BCH     2   19       9.9                  0     .0        
        LWEB_JOB     BCH     2   19       3.4                  0     .0        
        LWEB_JOB     BCH     2   19      37.7                  0     .0        
        LWEB_JOB     BCH     2   19       3.9                 17     .0        
        LWEB_JOB     BCH     2   19       4.9                128     .0        
                                                                        More...
*** See that last job, that's the one that grinds to a standstill. Below is the alternative view ***
                                                                 
 Opt  Subsystem/Job  User        Number  Type  CPU %  Threads      
      LANSA          QSYS        522432  SBS      .0        1      
        LWEB_JOB     PRECEDAWEB  522443  BCH      .0        1      
        LWEB_JOB     PRECEDAWEB  522444  BCH      .0        1      
        LWEB_JOB     PRECEDAWEB  522445  BCH      .0        1      
        LWEB_JOB     PRECEDAWEB  522446  BCH      .0        1      
        LWEB_JOB     PRECEDAWEB  522447  BCH      .0        1      
        LWEB_JOB     PRECEDAWEB  522448  BCH      .0        1      
        LWEB_JOB     PRECEDAWEB  522449  BCH      .0        1      
        LWEB_JOB     PRECEDAWEB  522450  BCH      .0        1      

WRKSYSSTS
===================================================
                            Work with System Status                    
                                                             27/04/06  11:59:57
 % CPU used . . . . . . . :       46.9    Auxiliary storage:                    
 % DB capability  . . . . :       15.9      System ASP . . . . . . :    111.6 G
 Elapsed time . . . . . . :   00:06:10      % system ASP used  . . :    82.0867
 Jobs in system . . . . . :      12155      Total  . . . . . . . . :    111.6 G
 % perm addresses . . . . :       .028      Current unprotect used :     2700 M
 % temp addresses . . . . :       .323      Maximum unprotect  . . :     2712 M
                                                                               
 Type changes (if allowed), press Enter.                                        
                                                                               
 System    Pool    Reserved    Max   -----DB-----  ---Non-DB---
  Pool   Size (M)  Size (M)  Active  Fault  Pages  Fault  Pages
    1      184.75     78.39   +++++     .0     .0    2.6    3.0
    2     1042.79       .96      91     .6   48.5    7.5   27.3
    3       48.24       .00       6     .0     .1     .0     .0
    4      383.42      <.01      34     .8    2.4    1.3    3.0
    5      206.29      <.01       6    2.4    6.8    4.0    7.6
    6      101.59       .00       6     .0     .0    7.1    8.2
    7       48.92       .00       7     .0     .2     .0     .0
 
 System    Pool    Reserved    Max   Active->  Wait->  Active->  
  Pool   Size (M)  Size (M)  Active    Wait     Inel     Inel    
    1      184.75     78.39   +++++    1819       .0       .0    
    2     1042.79       .96      91    2560       .0       .0    
    3       48.24       .00       6     5.1       .0       .0    
    4      383.42      <.01      34    92.5       .0       .0    
    5      206.29      <.01       6      .1       .0       .0    
    6      101.59       .00       6      .3       .0       .0
    7       48.92       .00       7      .1       .0       .0

WRKDSKSTS
==========================================
                             Work with Disk Status
                                                             27/04/06  12:05:19
 Elapsed time:   00:00:00                                                      
                                                                               
              Size    %     I/O   Request   Read  Write   Read  Write    %      
 Unit  Type    (M)  Used    Rqs  Size (K)    Rqs   Rqs     (K)   (K)   Busy    
    1  6713   7516  82.1     .0       .0      .0     .0     .0     .0     0    
    2  6713   7516  82.1     .0       .0      .0     .0     .0     .0     0    
    3  6713   7516  82.1     .0       .0      .0     .0     .0     .0     0    
    4  6713   8589  82.1     .0       .0      .0     .0     .0     .0     0    
    5  6713   7516  82.1     .0       .0      .0     .0     .0     .0     0    
    6  6713   6442  82.1     .0       .0      .0     .0     .0     .0     0    
    7  6713   7516  82.1     .0       .0      .0     .0     .0     .0     0    
    8  6713   7516  82.1     .0       .0      .0     .0     .0     .0     0    
    9  6713   7516  82.1     .0       .0      .0     .0     .0     .0     0    
   10  6713   8589  82.1     .0       .0      .0     .0     .0     .0     0    
   11  6713   8589  82.1     .0       .0      .0     .0     .0     .0     0    
   12  6713   6442  82.1     .0       .0      .0     .0     .0     .0     0    
   13  6713   6442  82.1     .0       .0      .0     .0     .0     .0     0    
   14  6713   7516  82.1     .0       .0      .0     .0     .0     .0     0  
   15  6713   6442  82.2     .0       .0      .0     .0     .0     .0     0

The system model is a 9406-720, with V5R2 as the O/S.

Thanks again guys. You've given me some hope!!

regards
Jon
                                                                                                                                                         

 
LVL 13

Expert Comment

by:_b_h
ID: 16549698
Hi, Jon

For the WRKDSKSTS, the elapsed time is 00:00, so that needs another look. We want to check the arm busy levels to see if one disk arm is really high, like 40%. While they are having the problem, refresh WRKACTJOB and look at the status of that problem job; we are looking for anything unusual, like a status of LCKW.
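The arm-busy check can be done by eye, but with many units a small script over the pasted screen is less error-prone. This is a minimal sketch, not a supported tool: `busy_units` is a hypothetical helper, the 40% default simply mirrors the figure mentioned above, and the sample rows are taken from the WRKDSKSTS output posted in this thread. It assumes the 11-column layout of that screen.

```python
def busy_units(screen_text, threshold=40):
    """Return (unit, pct_busy) pairs whose % Busy is at or above threshold."""
    flagged = []
    for line in screen_text.splitlines():
        fields = line.split()
        # Data rows have 11 columns and start with a numeric unit number.
        if len(fields) != 11 or not fields[0].isdigit():
            continue
        unit = int(fields[0])
        pct_busy = int(fields[-1])   # last column is % Busy
        if pct_busy >= threshold:
            flagged.append((unit, pct_busy))
    return flagged


sample = """\
 Unit  Type    (M)  Used    Rqs  Size (K)    Rqs   Rqs     (K)   (K)   Busy
    1  6713   7516  82.1   18.4      5.9     2.5   15.8    8.1    5.5     6
    2  6713   7516  82.1    7.0      5.3     1.4    5.5    8.0    4.6     3
   10  6713   8589  82.2    7.8      5.7      .9    6.9   10.4    5.1     1
"""

print(busy_units(sample))               # nothing reaches 40% busy
print(busy_units(sample, threshold=5))  # lower threshold, just to show the output shape
```

An empty list at the 40% threshold is consistent with the arms not being the bottleneck, provided the elapsed time covers the period when the problem occurs.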

Try the WRKSYSACT command to see if you have performance tools installed.

Use the WRKPTFGRP command and post the results as well, so we can see your PTF levels.

Barry

 

Author Comment

by:jdwan
ID: 16549808
Yes, sorry Barry. I ran this snapshot with the girls running the problem and nothing out of the ordinary happened on the arms, so they appear to be humming ok (although I would like their administrator to get the % used down - another story). I also checked the active job and the job status never changed the whole time (from DEQW). Not a bad suggestion though.

Looks as though they have loaded performance tools as WRKSYSACT works.

                             Work with Disk Status                        
                                                             27/04/06  12:46:59
 Elapsed time:   00:05:36                                                      
                                                                               
              Size    %     I/O   Request   Read  Write   Read  Write    %      
 Unit  Type    (M)  Used    Rqs  Size (K)    Rqs   Rqs     (K)   (K)   Busy    
    1  6713   7516  82.1   18.4      5.9     2.5   15.8    8.1    5.5     6    
    2  6713   7516  82.1    7.0      5.3     1.4    5.5    8.0    4.6     3    
    3  6713   7516  82.1    5.3      6.1     1.4    3.9    9.4    4.9     3    
    4  6713   8589  82.1    9.8      7.4     2.2    7.6    6.9    7.6     3    
    5  6713   7516  82.1    6.2      6.3     1.6    4.6   10.5    4.8     3    
    6  6713   6442  82.2    5.2      6.3     1.4    3.8    7.1    6.0     1    
    7  6713   7516  82.1    7.0      6.0     1.1    5.9    9.2    5.4     2    
    8  6713   7516  82.2    6.9      6.0     1.3    5.5    9.2    5.2     3    
    9  6713   7516  82.1    7.1      6.8     1.7    5.4    8.4    6.3     3    
   10  6713   8589  82.2    7.8      5.7      .9    6.9   10.4    5.1     1    
   11  6713   8589  82.2    7.0      5.8     1.2    5.8    9.9    4.9     3    
   12  6713   6442  82.2   10.3      5.4     1.1    9.2    8.8    4.9     3    
   13  6713   6442  82.1    7.4      6.1     1.1    6.2   10.1    5.4     4    
etc
===============================
Here is the PTF group display.

                              Work with PTF Groups                            
 Type options, press Enter.                                                    
   4=Delete   5=Display   6=Print   9=Display related PTF groups              
                                                                               
 Opt  PTF Group             Level  Status                                      
      SF99519                 149  Installed                                  
      SF99502                  21  Installed                                  

Thanx again for quick response
Jon                                                                              
 
LVL 13

Expert Comment

by:_b_h
ID: 16549953
Hi, Jon

PTFs look reasonably current. DB2 group SF99502 is available at level 22. Hiper group SF99519 is available at level 160.
You should use DSPPTF to confirm that the cume is installed, and that there are no PTFs with an IPL Action.
You may consider ordering the latest cume/hiper/db2 package. If you have IBM support, you can ask them for performance related PTFs, especially for your problem application.

Since Performance Tools are installed, use GO PERFORM, then option 2 for Collect performance data, and you can see if the system is collecting performance data. If it is, you can use DSPPFRDTA to view this data or print reports from option 3 (start with the component report).

Barry
 

Author Comment

by:jdwan
ID: 16550027
Thanks Barry. Can you explain what the levels mean in this screen? This business does have IBM support so I'll ask their IT to order the PTFs for Lansa.

Also, will collecting performance data have any impact on the performance of the machine, i.e. is it a catch-22? I don't think that is the case here; I'm just asking out of curiosity.
 
LVL 13

Expert Comment

by:_b_h
ID: 16550091
The levels just indicate how current the PTFs are. Tell them to order the latest cumulative package, which will come with hipers and db2 groups included. The latest cumulative package for Version 5 Release 2.0 is C6080520, where 6080 is sort of julian date of 2006/03/21.
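The "sort of julian date" in the package ID is a year digit followed by the day of the year. A small sketch of the decoding, with the assumptions stated up front: the helper name `decode_cume_id` is invented, the single year digit is assumed to mean 200x, and the trailing `520` is assumed to stand for V5R2M0.

```python
from datetime import date, timedelta


def decode_cume_id(package_id):
    """Split an ID such as 'C6080520' into (release date, release code)."""
    if len(package_id) != 8 or package_id[0] != "C" or not package_id[1:].isdigit():
        raise ValueError(f"unexpected package ID: {package_id!r}")
    year = 2000 + int(package_id[1])     # assumption: single digit is a year in the 2000s
    day_of_year = int(package_id[2:5])   # ordinal day within that year
    release = package_id[5:]             # e.g. '520', assumed to mean V5R2M0
    released = date(year, 1, 1) + timedelta(days=day_of_year - 1)
    return released, release


released, release = decode_cume_id("C6080520")
print(released.isoformat(), release)   # 2006-03-21 520
```

Day 080 of 2006 works out to 21 March, which matches the date given above.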

Collecting performance data is pretty safe on this system. It has enough capacity to handle this. The performance impact occurs when the data is dumped at the end of an interval, a small price to pay if it gives you the clue to the existing problem.

Performance data can be collected with trace data included. This uses more resources, but supplies more detail about what is happening on the system. If you use the trace option, start the monitor for a short time such as 30 minutes, with 10 minutes per interval, and re-create the problem during the middle interval.

Another method of collecting detailed info is to use Start Service Job (STRSRVJOB) and then Trace Job (TRCJOB). Be sure to turn off the trace when you are done.

If you suspect an SQL problem, Operations Navigator in Client Access allows SQL monitors to be started against jobs. It collects detailed information on access plans, etc.  Expand Databases, then <database-name>, and select SQL Performance monitor.

Barry
 

Author Comment

by:jdwan
ID: 16550284
Thanks Barry. I don't think this application uses SQL but I can't guarantee that. I do know they use Java which seems to be missing the latest cume package so I'll get onto them for this and get back to you when this has been applied.

You and Tom indicated we should change the LANSA subsystem out of *BASE. Can you suggest where I put it? eg one of the sharepools and what should the config be for those if I've understood this correctly.

BTW, there is a lot of journaling going on on this machine, so you are correct there, Tom. I'll see if I can get them to remove some of those journal receivers, but this is only an aside.
 
LVL 27

Assisted Solution

by:tliotta
tliotta earned 1000 total points
ID: 16558607
Jon:

Generally, the DB2 Group PTF affects SQL, but it can do much more since the underlying database engine drives all DB access. IBM started putting these into a DB2 Group a few years ago because a lot of shops didn't use SQL. By moving them out of the cume package, all the shops without SQL didn't have to apply monster cumes -- the cumes only had what was common on all systems.

But SQL became the DB standard and IBM started using it in their own code. (If you run some ManagementCentral APIs, for example, you'll see SQL being used by messages in the joblog.) It seems that IBM has simply started putting a bigger set of PTFs in the DB2 Group. Lots of performance PTFs for database are in there regardless of what interface is used. Systems can be improved whether SQL is used or not. And some fixes are important because SQL gets used by internal code.

As for where to put LANSA... I'd put it in *SHRPOOL4. That doesn't seem to be in use, so that would isolate it for later review. That means that (1) some memory should be assigned to *SHRPOOL4 via WRKSHRPOOL and (2) it needs to be associated with whatever subsystem the LANSA jobs route through, which seems to be named LANSA.

But note that that may be only part of the problem. Use the WRKSBS command to see overall how various system pools are distributed for work in your subsystems. The list of subsystems probably shows system pool #2 (*BASE) as the first pool for almost every subsystem. What you'd like to see is system pool #2 as the first subsystem pool _plus_ a second system pool for all of your subsystems  except QCTL perhaps.

The second pool for each is the one you would probably want to route work to. The routing is done by "routing entries" for each subsystem and that's where it gets detailed. IBM has a whole bunch of routing entries for their TCP/IP and host server jobs in QSERVER, QSYSWRK and QUSRWRK, and all of them route into whatever pool is defined first for each subsystem. They also have a bunch of pre-start job entries that also do a kind of routing.

By default, that implies that IBM is trying to make *BASE a useless pool for any performance adjustment since *BASE is the default configuration for most subsystems they ship out. (Then again, they're happy to sell you either performance tuning services or more memory/faster processors; why would they want to ship systems that were already tuned?)

By adding a second pool to subsystems, you get to route your work to pools you can tune; but you also allow true system stuff such as the subsystem monitor jobs to remain in *BASE since they can't be routed. (You could just change the first pool to be *SHRPOOLx instead of *BASE, but then true system jobs would be competing with application jobs and things would get more complicated. Having a single pool causes the subsystem monitor to run in that pool, i.e., in the first one listed for a subsystem.)

In short, don't run LANSA in a pool that's already in use. Use a new one.

Various TCP/IP and host server jobs can be handled as a separate step later.

If nothing else, by getting some heavy work out of *BASE, you're beginning to let the automatic performance tuning do its work. Auto-tuning _requires_ *BASE to have spare memory. *BASE is where the tuner moves memory from when it assigns additional memory to another shared pool. Jobs running in *BASE interfere with that. (Hmmm... do you have auto-tuning turned on?)

As far as journal receivers go, no need to delete them. An operation such as Receive Journal Entry will use one temporary address for each operation. You've only used 0.3% of the number available, so it's not like you'll run out soon. There are a _LOT_ of temp addresses available. It was just that it was possible that you'd been running an older version of OS/400 for a long time and I wanted to discard that possibility.

Tom
 

Author Comment

by:jdwan
ID: 16558723
Thanks Tom. Makes a lot of sense and I'll set up the LANSA sbs to use SHRPOOL4, and see how we go. Once again, the girls are away until Tue/Wed next week so I'll need to sit tight until then but will post the outcomes from that point. By Auto Tuning, I assume you are referring to the QPFRADJ system value - it is currently set to 2.

Jon

 

Author Comment

by:jdwan
ID: 16602528
The problem turned out to be that the software provider had mapped a path to a logo which no longer existed. It looks to us like the application was trying to render a page but didn't know the size of the logo, hence the long delays. It looks like a Java program.
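For anyone chasing a similar symptom, a quick way to rule this cause in or out is to check that every static resource the pages reference actually exists under the web root. The sketch below is purely illustrative: `missing_resources`, the directory layout and the file names are all hypothetical and have nothing to do with the vendor's real configuration.

```python
import tempfile
from pathlib import Path


def missing_resources(root, relative_paths):
    """Return the relative paths that do not resolve to a file under root."""
    root = Path(root)
    return [p for p in relative_paths if not (root / p).is_file()]


# Demonstration with a throwaway directory standing in for the web root.
with tempfile.TemporaryDirectory() as web_root:
    images = Path(web_root) / "images"
    images.mkdir()
    (images / "banner.gif").write_bytes(b"GIF89a")

    referenced = ["images/banner.gif", "images/logo.gif"]  # hypothetical references
    print(missing_resources(web_root, referenced))         # ['images/logo.gif']
```

The list of referenced paths would have to come from the application's own page templates or configuration, which this sketch does not attempt to parse.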

Anyway, I learnt a lot about pools thanks to Barry and Tom. Thanks guys, and I'll split the points given that none of this made one iota of difference.

(I also changed the subsystem to point to the shared pool, and it didn't make much difference either way performance-wise.)
 
LVL 27

Expert Comment

by:tliotta
ID: 16608566
Jon:

Yeah, it wasn't so much to make a difference as to get configured to find problems. Thanks for the update -- now I'm going to look at how Java functions show up in tools such as WRKSYSACT because you highlighted it. It hasn't been a focus before because we all "know" Java can be a performance issue anyway. Nice to have a clue from you on possible things to explore.

Tom