RVIT
asked on
Performance Tuning 9406 720 206A running V5R3M0 - ran fine on V5R1M0, but after a scratch install of V5R3M0 it's running like a lame horse
We recently upgraded our 720 from V5R1 to V5R3, but there were problems with the upgrade and we had to do a scratch install. Since installing V5R3 its performance has been abysmal. A job that takes about 4 hours on the 820 has been running for 3 weeks on the 720 and is still going. I have little knowledge of performance tuning and have been running the system with the Performance Adjustment system value (QPFRADJ) set to 2 (adjustment at IPL and automatic adjustment).
This afternoon I've tried changing the pools from *FIXED to *CALC, but it's not made any difference.
WRKSHRPOOL shows this:
Defined Max Allocated Pool -Paging Option--
Pool Size (M) Active Size (M) ID Defined Current
*MACHINE 104.09 +++++ 104.09 1 *FIXED *FIXED
*BASE 133.32 45 133.32 2 *CALC *CALC
*INTERACT 12.79 5 12.79 3 *CALC *CALC
*SPOOL 2.55 5 2.55 4 *FIXED *FIXED
*SHRPOOL1 .00 0 *FIXED
*SHRPOOL2 3.23 1 3.23 5 *CALC *CALC
Our main application (Island Pacific) has multiple job queues and subsystems, all of which run through the *BASE pool. As this is the test system, I am the only user on it, so it's not interactive work that is eating the performance. I am at a complete loss to explain why this is happening.
Here is the WRKSYSSTS Screen:
System Pool Reserved Max -----DB----- ---Non-DB---
Pool Size (M) Size (M) Active Fault Pages Fault Pages
1 104.41 52.45 +++++ .0 .0 45.5 47.7
2 133.66 .83 45 1.1 8.5 74.6 94.7
3 12.79 .00 5 .0 .0 .5 1.3
4 2.55 .00 5 .0 .0 .3 .9
5 2.55 .00 1 .0 .0 .0 .0
ASKER
Thanks for the quick response.
There's only 1 user on this machine as it's the test box (me).
Here's the WRKSYSACT:
Job or CPU Sync Async CPU
Task User Number Thread Pty Util I/O I/O Util
CFINT01 0 5.2 0 0 .0
QPMHDWRC QSYS 013548 00000021 1 4.7 302 2 .0
QDBSRVXR2 QSYS 012545 00000001 0 1.1 47 71 .0
SMPOL001 99 1.0 0 992 .0
IOSTATSTAS 0 .7 3 0 .0
IP200921 SW01 013526 00000005 50 .6 284 0 .0
The job I'm trying to run is IP200921.
RVIT:
First principle -- Don't run jobs in *BASE. Simple.
There is little chance that the performance adjuster can do useful adjusting if jobs are actively using the *BASE memory pool. This includes server jobs along with everything else.
As far as memory goes, the purpose of the adjuster is to move memory out of *BASE into pools that need it and to move memory out of pools that don't need it back into *BASE. If the memory is being used _while_ it's in *BASE, then all you're doing is increasing CPU utilization by running the adjuster in addition to everything else.
Although memory movement still occurs, the result is pretty much nothing but shifting it back and forth between *BASE and another pool while paging alternative jobs in and out. You'd be better off with no adjusting.
*BASE is supposed to be the leftover memory that isn't needed. By running jobs in that space, you are making it far more difficult to determine whether any given memory is sufficient.
I suspect that this isn't highly publicized by IBM because they can make money by (1) selling you memory and (2) selling you performance tuning services.
In order to do any serious performance tuning, the system must be able to make useful performance measurements for you. To get that started, you'll need to get jobs out of *BASE. You already have a couple additional shared pools created; you might need a couple more.
Run through each subsystem and review the pool assignments. All of them need pools other than *BASE; *BASE can be subsystem pool #1, but don't route anything to it. If *BASE is associated with a subsystem, the subsystem monitor job will run there.
To avoid running other jobs there, review each subsystem's prestart job and routing entries. NONE of those in ANY subsystem should refer to a subsystem pool that points to *BASE.
I generally set one shared pool for TCP/IP server jobs and a second one for the host server jobs. I create one or two others for my batch jobs, giving at least three shared pools. *INTERACT and *SPOOL are already created by default, so those can be looked at much later.
Once the pools exist, I start changing the prestart job and routing entries to start jobs in pools that can be watched (and adjusted!) This can take a couple days, especially if the system is fully active at the time.
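For example, the kinds of commands involved look like this (the subsystem, library, pool number, and sizes here are purely illustrative -- substitute your own):

```
/* Add a second subsystem pool backed by shared pool 3               */
CHGSBSD SBSD(MYLIB/MYBATCH) POOLS((1 *BASE) (2 *SHRPOOL3))

/* Point the catch-all routing entry at subsystem pool 2, not pool 1 */
CHGRTGE SBSD(MYLIB/MYBATCH) SEQNBR(9999) POOLID(2)

/* Seed the shared pool with some memory and an activity level       */
CHGSHRPOOL POOL(*SHRPOOL3) SIZE(40960) ACTLVL(5)
```

Prestart job entries are changed the same way, with CHGPJE and its POOLID parameter.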
Once jobs have settled into their new pools, you can then start watching for any adjustments that occur as well as for hot spots. If a bunch of stuff is in *BASE, it's impossible for you to know where any issues are -- everything is competing for the same memory, even jobs in other pools want to steal that memory.
Once you're in that state, you can do some real tuning and your adjuster might actually help.
However, that doesn't help you today.
For today, the first thing you need to do is make sure your group PTFs are up to date, especially the DB2 group. I'm not sure what the current level is for V5R3, but it's at least level 3.
With Island Pacific (which I _know_ could use some software tuning), I'd also want to be sure that my cume PTF level was current as well as my HI/PER level. Then, I'd go searching for additional performance related PTFs that aren't included in any cume or group package.
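To see where you stand, something like this should do (WRKPTFGRP exists at V5R3; the meaning of the output is from memory, so verify against the PSP info):

```
WRKPTFGRP                 /* lists installed group PTF levels, e.g. the DB2 group */
DSPPTF LICPGM(5722SS1)    /* the TCxxxxx/TLxxxxx entries show the cume level      */
```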
Once my system was at a premium level and performance was an issue, for Island Pacific, I'd check to see if any exit programs are registered against... hmmm... I think they use either the data queue or distributed program call/remote command host servers to a very high degree. If an exit program is registered against either of those, I'd remove it to see any results.
If that makes no difference, then it's time for serious investigation.
Tom
ASKER
Thanks Tom for your in-depth reply - I will work through this over the next couple of days and let you know. It's a DEV box with only me on it, so as far as changing things goes I've got no restrictions...
RVIT:
Just stay aware that there are numerous performance-related PTFs that will never be in a cume PTF package nor in a group PTF package. PTFs might only affect customers with software packages such as Island Pacific -- IBM won't include those in PTF packages that would go to all of IBM's customers.
Oh, also, I wouldn't call my reply "in-depth" yet. So far, it's only been an overview of how to get to a point where it's possible to track performance issues. Actually doing something with the info hasn't even begun yet, heh.
Good luck.
Tom
ASKER
I didn't know that - I thought a cume package was every fix - oh well, off to investigate how I find the missing PTFs then!
RVIT:
There are large numbers of PTFs that aren't in cume or group packages. Some may be specific to particular hardware -- one model IOP might need a PTF that would be disastrous for a different model. Some may be specific to licensed program products -- a SQL PTF might be specific to an interface to the SQL compiler preprocessor and only be valid if the SQL Dev Kit is installed. (I made up those examples; they might not make sense. Just to illustrate.)
IBM has commonly built the cume packages from PTFs that have wide application across the customer base. If you review the PSP report of PTF summaries, take note of the large number of PTFs that are always listed as being in cume package '1000' -- that indicates a PTF is not in any cume package (yet).
Some of those are later chosen to become part of a package. I think it's partly based on how many customers report problems that match that PTF's symptom string among other things.
Keeping the size of a cumulative package down is one goal. Keeping the complexity of the install down is another. Avoiding unintended consequences of interactions between PTFs is another. Probably other reasons.
Tom
ASKER
Hi,
Have loaded all the service packs I can find and created the pools as suggested.
The batch job in question that seems to be running really slowly is only using about 1-2% of the CPU in WRKACTJOB, yet there is nothing else running. I don't understand why it is not making use of the full CPU power.
Here is my pool status now (the Island Pacific job is running through system pool 4 (*SHRPOOL1)):
System Pool Reserved Max -----DB----- ---Non-DB---
Pool Size (M) Size (M) Active Fault Pages Fault Pages
1 78.65 50.56 +++++ .0 .0 81.6 84.0
2 121.98 .83 55 1.2 1.6 8.6 25.9
3 12.79 .00 5 .0 .0 .5 .8
4 40.00 .00 10 .0 .0 66.3 66.3
5 2.55 .00 5 .0 .0 .0 .0
As you can see, the non-DB faults are quite high.
Help!
What is the status of the job under WRKACTJOB?
e.g. IDX-MYDBF
Dave
PS
Do a WRKSBSD and check:
what storage pools are allocated to it (option 2)
what the routing entries and associated class are (option 7, then option 5)
Dave
ASKER
IPTS QSYS SBS .0 DEQW
IPMSGQ IPTS ASJ .0 PGM-IPMSGQ MSGW
IP200921 SW01 BCH 1.0 PGM-IP009CP RUN
It's the IP200921 job.
Subsystem description: IPTS
Pool Storage Activity
ID Size (K) Level
1 *SHRPOOL1
Opt Seq Nbr Program Library Compare Value
9999 QCMD QSYS *ANY
Routing entry sequence number . . . . . . . : 9999
Program . . . . . . . . . . . . . . . . . . : QCMD
Library . . . . . . . . . . . . . . . . . : QSYS
Class . . . . . . . . . . . . . . . . . . . : IPTS
Library . . . . . . . . . . . . . . . . . : IPTSPGM
Maximum active routing steps . . . . . . . : *NOMAX
Pool identifier . . . . . . . . . . . . . . : 1
Compare value . . . . . . . . . . . . . . . : *ANY
Compare start position . . . . . . . . . . :
Thread resources affinity:
Group . . . . . . . . . . . . . . . . . . : *SYSVAL
Level . . . . . . . . . . . . . . . . . . :
Resources affinity group . . . . . . . . . : *NO
Hope this helps - thanks, Dave!
Hi
The class is a non-standard class, and you have no memory in *SHRPOOL1.
First, let's get some memory into the subsystem. Do a:
CHGSBSD SBSD(IPTS) POOLS((2 *BASE))
then see what difference that makes.
Dave
ASKER
Done that:
IPTS QSYS SBS .0 DEQW
IP200921 SW01 BCH 1.7 PGM-IP009CP RUN
Hi
What does the subsystem description say now?
Also do a:
DSPCLS IPTSPGM/IPTS
ASKER
Subsystem description: IPTS
Pool Storage Activity
ID Size (K) Level
1 *SHRPOOL1
2 *BASE
Class . . . . . . . . . . . . . . . . . . . . . . : IPTS
Library . . . . . . . . . . . . . . . . . . . . : IPTSPGM
Run priority . . . . . . . . . . . . . . . . . . : 50
Time slice in milliseconds . . . . . . . . . . . : 10000
Eligible for purge . . . . . . . . . . . . . . . : *NO
Default wait time in seconds . . . . . . . . . . : 600
Maximum CPU time in milliseconds . . . . . . . . : *NOMAX
Maximum temporary storage in megabytes . . . . . : *NOMAX
Maximum threads . . . . . . . . . . . . . . . . . : *NOMAX
Text . . . . . . . . . . . . . . . . . . . . . . : CLS for Island Pacific jo
Cheers Dave!
Hi
Can you end the subsystem, then enter:
CHGSBSD SBSD(IPTS) POOLS((1 *RMV))
then restart it and try to run the job again.
Dave
ASKER
Message ID . . . . . . : CPD1509
Date sent . . . . . . : 11/04/05 Time sent . . . . . . : 15:02:18
Message . . . . : Pool definition 1 was not removed.
Cause . . . . . : Pool definition 1 cannot be removed because it is
specified in one or more subsystem description entries.
Recovery . . . : Do one of the following and try the request again:
-- Remove the subsystem description entries using the Remove Prestart Job
Entries (RMVPJE) command or the Remove Routing Entry (RMVRTGE) command that
specifies the pool definition.
-- Change the pool definition that is specified in the subsystem
description entries (POOLID parameter).
However,
I have removed pool 2 and changed pool 1 to *BASE:
Pool Storage Activity
ID Size (K) Level
1 *BASE
Is this what you actually wanted?
(No difference from looking at it - isn't this what we had when we started?)
Hi
I don't think so. I can't see any reference to changing the subsystem!
Just looking through the thread I can see that you have some shared pools.
Defined Max Allocated Pool -Paging Option--
Pool Size (M) Active Size (M) ID Defined Current
*MACHINE 104.09 +++++ 104.09 1 *FIXED *FIXED
*BASE 133.32 45 133.32 2 *CALC *CALC
*INTERACT 12.79 5 12.79 3 *CALC *CALC
*SPOOL 2.55 5 2.55 4 *FIXED *FIXED
*SHRPOOL1 .00 0 *FIXED <<==============
*SHRPOOL2 3.23 1 3.23 5 *CALC *CALC
If we look at your subsystem description before we made the changes, we have:
Subsystem description: IPTS
Pool Storage Activity
ID Size (K) Level
1 *SHRPOOL1 <===============
As you can see, there is no memory allocated to the shared pool, and hence no memory allocated to the subsystem.
I have made some quick-and-dirty changes to get *BASE memory into the subsystem so at least the OS has some memory to play with. If we start to get some performance improvement, then we can play with memory allocation later.
I would expect to see a bit more CPU utilisation on the new config.
Dave
ASKER
OK thanks Dave - will switch on QPFRADJ and see what happens :)
Note... if memory is needed in a shared pool, use CHGSHRPOOL or WRKSHRPOOL:
==> chgshrpool *shrpool1 size( 4096 )
...would shift 4 MB from the *BASE pool into shared pool 1. (The size is specified in kilobytes.) Also, not only is memory needed, but activity levels may also be needed. E.g.:
==> chgshrpool *shrpool1 size( 4096 ) +
actlvl( 3 )
Hard to tell what a decent activity level is yet since we don't know what functions will be running in the pool. Also hard to know how much memory to add.
By turning performance adjuster on, we can check after 15 min have gone by to see what adjustments have been made.
WRKSHRPOOL provides access to tuning parameters by pressing <F11=Display tuning data>.
Don't expect to tune properly immediately. Initial settings are pure guesswork. Only after watching how interactions with other jobs change things will you start getting better.
And be careful putting *BASE back in as a subsystem memory pool. That sends you right back where you started.
Tom
Hi Tom
I was just trying a few things out - just to see if there was a memory issue. The CPU was very low.
Once we can get some CPU utilisation, then we can play with the pools - but since this is a single-user box, running from *BASE should not have any real implications.
Dave
ASKER
Hi,
Not made any difference. The job is now running in *BASE, which is where it was in the first place. Not sure if I've missed something here, as I had auto-tuning switched on originally.
Basically, the subsystem IPTS is now running in *BASE with QPFRADJ set to 3.
Please help!
ASKER
I think it's also something to do with this particular job, which is deleting records over about 6 large files.
Should I be looking at load balancing on the disks as well?
Hi
I do not think this is an AS/400 performance issue - can we look at the following:
1) If you do a WRKSYSSTS, what is the DB utilisation?
2) Can you do a DSPJOB and check if there are any record locks?
3) Do a STRSRVJOB on the job, then a STRDBG. Then look at the job log.
4) What are the attributes of the program, i.e. RPG, SQLRPG?
Dave
ASKER
% CPU used . . . . . . . : 15.7 Auxiliary storage:
% DB capability . . . . : 1.8 System ASP . . . . . . : 132.6 G
Elapsed time . . . . . . : 00:00:04 % system ASP used . . : 82.4943
Jobs in system . . . . . : 856 Total . . . . . . . . : 132.6 G
% perm addresses . . . . : .009 Current unprotect used : 1666 M
% temp addresses . . . . : .010 Maximum unprotect . . : 1676 M
Type changes (if allowed), press Enter.
System Pool Reserved Max -----DB----- ---Non-DB---
Pool Size (M) Size (M) Active Fault Pages Fault Pages
1 130.56 52.00 +++++ .0 .0 31.4 32.1
2 110.08 .94 49 7.2 8.1 211.8 822.3
3 12.79 .00 5 .0 .0 10.0 20.9
5 2.55 .00 5 .0 .0 .0 .0
There are member locks, but as this is the only job running it should not be a problem.
Done the STRSRVJOB, but not sure how you want me to do the STRDBG?
Thanks for being patient.
ASKER
Unfortunately the CL program calls lots of RPG programs (RPGLE?), so I can't really say what the attributes are.
Hi
Just enter STRDBG on the interactive session.
This will debug the job in batch and give a lot more detail in the job log.
You can then do a dspjob to check if anything looks strange.
Dave
PS: we are not debugging a program, but debugging the job.
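In full, the sequence looks like this (substitute the batch job's own number/user/name; UPDPROD(*YES) is only needed if the job updates files in production libraries):

```
STRSRVJOB JOB(nnnnnn/SW01/IP200921)  /* attach your session to the batch job      */
STRDBG UPDPROD(*YES)                 /* debug mode: job log gets much more detail */
/* ...let the job run for a bit, then:                                            */
DSPJOB                               /* option 10 = display job log               */
ENDDBG                               /* when finished                             */
ENDSRVJOB
```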
ASKER
So after running the STRSRVJOB JOB(015501/SW01/IP200921) command, I just type STRDBG, right?
Then DSPJOBLOG?
Not seeing anything.
Hi
You do a DSPJOB on the batch job.
dave
Minor note... there is no reason to have the performance adjuster turned on if *BASE is the memory pool in use.
Also, agreed that as long as this is the only job running, then *BASE is reasonable. But, I'd be surprised if this is the only job, especially if Island Pacific is involved. E.g., the TCP/IP data queue server and/or the distributed program call/remote command host server is also probably running (as well as TCP/IP itself, etc.)
The memory pool for the actual batch job should be a different pool from the various server jobs, and all of them should be out of *BASE -- this implies *BASE plus two shared pools minimum.
We apparently want to know if serving to Island Pacific via the TCP/IP servers is part of the performance issue. There are two basic potential areas to watch: (1) the batch job that seems to be using minimal CPU and (2) the server jobs.
Or maybe I've misunderstood. It's seemed that the basic batch job isn't actually doing much, so it seemed any performance issue had to be somewhere else.
Tom
ASKER CERTIFIED SOLUTION
Yeah, it's hard to tell what <RUN> means as a status without knowing the source statements. And we only know that that was the "status" at the instant it was collected.
I have a program that talks to the EDRS server, either local or remote. While it's 'waiting' for a response, the status is <RUN> even though I know it's waiting, apparently because I've called one of the Qxda... APIs and that translates to 'running'. No CPU is used during that time.
Hmmm... I ought to try the same with SQL CLI.
Anyway, the status of <RUN> only means that no technical WAIT has been requested, AFAIK.
Since there may be many external CALLs to RPG programs and deletes are going on over large files, I'm not too surprised at the higher non-DB faulting/paging rates. Lots of programs starting/stopping and lots of files opening/closing is gonna equal lots of faulting/paging.
We also have access to WRKSYSACT. But it didn't show any real surprises -- except that there was nothing chewing up CPU. The top task was CFINT01 and it wasn't doing much. Nor were any tasks below it. CPU doesn't seem constrained in the slightest, so working on CPU seems not indicated.
Maybe there's no CPU available that _can_ be used because thrashing is keeping processes from running effectively.
So far, I don't see that we have a solid base from which we can make educated guesses. I'd still say that jobs need to be separated into appropriate pools.
Obviously details such as assigning some memory to the pools is a pretty good idea, heh. Once memory is assigned and jobs are using the pools and performance adjuster has 5-10 minutes to run, it's time to take a snapshot from DSPSYSSTS. (Use <F21=Select assistance level> to set 3=Advanced.) Then wait 10-15 minutes and take a second DSPSYSSTS snapshot.
The first snapshot should be 5-10 minutes after DSPSYSSTS is first displayed so it's had some running time to gather statistics. Get the suspect job running before using DSPSYSSTS.
If DSPSYSSTS is already running, then press <F10=Restart> after the suspect job starts. Then wait the 5-10 minutes for the first snapshot.
What we'll look for between the two snapshots is a trend. Maybe memory will be shifted; if so, where from and where to? Maybe activity levels will be changed. Maybe faulting/paging will change significantly.
Or maybe there will be no bumps in any stats at all.
Tom
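The two-snapshot routine Tom describes boils down to this sequence (the ASTLVL parameter is an assumption for getting straight to the advanced view; F21 from the displayed screen does the same thing):

```
WRKACTJOB                     /* confirm the suspect batch job is active   */
DSPSYSSTS ASTLVL(*ADVANCED)   /* advanced view shows the fault/page        */
                              /* columns per pool                          */
/* Wait 5-10 minutes so the statistics accumulate, then record the pool   */
/* fault/page rates and pool sizes (snapshot 1). Press F10=Restart,       */
/* wait another 10-15 minutes, and record snapshot 2.                     */
/* Compare: did memory shift between pools? Did activity levels or        */
/* faulting change significantly, or not move at all?                     */
```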
ASKER
Hi,
sorry got caught up with something else will look at this today.
ASKER
Hi,
I've closed the question as I've given up on this machine. As the original install was problematic and it is my dev box, I'm going to completely wipe it and reinstall the OS.
Hopefully that and the latest IBM patches will sort out my problem.
First of all, is the job a batch job or an interactive job?
How many users do you have?
When running the job, do a WRKSYSACT to check what the system is doing; look for the task CFINT01 and check what it is doing.
The *INTERACT pool looks a bit low.
Basically, the more detail you can give us, the more advice you will get back.
dave