
AS400 i520 Memory pool non db page faults - Domino memory pools

Hi

Below is an issue with our i520 V5R3

I'm getting consistently high readings for non-DB faults/pages in the pools allocated to our Domino servers. One pool runs the core system apps; the other runs a Sametime-only server.

System    Pool    Reserved   Max    -----DB-----  ---Non-DB---
 Pool   Size (M)  Size (M)  Active  Fault  Pages  Fault  Pages
     1    943.73    197.40   +++++     .0     .0    3.0    3.5
     2    805.60      5.33     241     .0     .1     .2     .5
     3    383.97       .00      10     .0     .0    1.0    1.2
     4     76.79       .00       5     .0     .0     .0     .0
     5   5250.00      1.05    1000     .7     .9  137.4  528.2
     6    219.46      1.24     127     .0     .0   31.0  146.3

Pool 5 (core domino server)
Pool 6 (sametime)

Having read around this subject (forgive my novice status) it seems that this is usually caused by either

1. Not enough memory allocated to the pool, or
2. Too high a max active thread setting

I can't really believe that more memory is required, but I'm very uncertain what effect reducing max active threads will have. We have roughly 100 users, and do very often have performance "blips", especially when indexing the core db.

Any suggestions would be welcome; in-house AS400 expertise is limited!

thanks

Colin

Sjef Bosman

It seems that allocating 5G for Domino isn't the right thing to do. Here's a document with some guidelines for memory parametrization:

http://www-10.lotus.com/ldd/sandbox.nsf/af0f1b4673fc2715852570ff006cd3ff/cb533a62695b1f08852571370053fa43?OpenDocument

By the way my AS400 expertise is not limited, it is non-existent...

Barry Harper

Colin,
1) Is the automatic performance adjuster turned on?
Check using Work with System Value command:
WRKSYSVAL QPFRADJ
and report back the value. If it is 2 or 3, the system will attempt to adjust the memory pool size and activity level.

2) What was the elapsed time for the system status stats? Are they typical, or just during a performance problem?
A sample with a 2 minute and a 60 minute interval would be interesting to look at.
Press F11 to see if pool 5 and 6 were set up as shared pools (*SHRPOOLn). If so, tuning can be tweaked by using the Work with Shared Pools (WRKSHRPOOL) command

Barry

Barry Harper

Also, can you go into the Domino console and use the show stat command.
What is the value for Database.Database.BufferPool.PercentReadsInBuffer?
This number shows what per cent of reads were in the NSF buffer, i.e. not waiting on disk. The guideline is 95+%

Barry

ASKER CERTIFIED SOLUTION

Member_2_276102


daviesgroup

ASKER

thanks for the pointers gents, I'll get some stats ASAP. I actually had looked into 5 minute elapsed stats after posting the original question, sincere apologies for the lack of info!

daviesgroup

ASKER

Here are the stats as requested:

System    Pool    Reserved   Max
 Pool   Size (M)  Size (M)  Active  Pool        Subsystem   Library
     1    873.41    197.48   +++++  *MACHINE
     2    953.55      6.57     241  *BASE
     3    383.97       .00      10  *INTERACT
     4     76.79       .00       5  *SPOOL
     5   5250.00      1.05    1000  6           DOMINO04    QUSRNOTES
     6    141.83      1.24     136  *SHRPOOL1

System    Pool    Reserved   Max    Paging
 Pool   Size (M)  Size (M)  Active  Option
     1    873.41    197.48   +++++  *FIXED
     2    953.55      6.57     241  *CALC
     3    383.97       .00      10  *CALC
     4     76.79       .00       5  *CALC
     5   5250.00      1.05    1000  *FIXED
     6    141.83      1.24     136  *FIXED

subsystems:

Total -----------Subsystem Pools------------
Opt Subsystem Storage (M) 1 2 3 4 5 6 7 8 9 10
DOMINO02 .00 6
DOMINO04 5250.00 2 5 (core domino applications)
QASE5 .00 2
QBATCH .00 2
QCMN .00 2
QCTL .00 2
QHTTPSVR .00 2
QINTER .00 2 3
QSERVER .00 2
QSPL .00 2 4
QSYSWRK .00 2
QUSRWRK .00 2

2minutes:

System    Pool    Reserved   Max    -----DB-----  ---Non-DB---
 Pool   Size (M)  Size (M)  Active  Fault  Pages  Fault  Pages
     1    962.93    197.47   +++++     .0     .0    9.2   10.1
     2    871.12      6.39     241     .0     .0     .1     .5
     3    383.97       .00      10     .0     .0     .1     .2
     4     76.79       .00       5     .0     .0     .0     .0
     5   5250.00      1.06    1000     .0     .2  123.1  363.5
     6    134.73      1.24     130     .0     .0   26.2   87.5

45 minutes:

System    Pool    Reserved   Max    -----DB-----  ---Non-DB---
 Pool   Size (M)  Size (M)  Active  Fault  Pages  Fault  Pages
     1   1058.66    197.47   +++++     .0     .0    5.3    5.8
     2    767.95      5.33     241     .0     .0     .3     .8
     3    383.97       .00      10     .0     .0     .1     .4
     4     76.79       .00       5     .0     .0     .0     .0
     5   5250.00      1.05    1000     .7    1.0  127.6  401.8

Domino Sh stat:

Database.Database.BufferPool.PerCentReadsInBuffer = 99.83

More questions:

1. Are you suggesting that we put everything in to 3 shared pools to monitor the tasks better?
2. How do you work with a defined memory pool/what is the command?
3. If we make a change to the pools, etc, are they immediate, i.e, does it require an IPL?

Many thanks again.

SOLUTION

Barry Harper


daviesgroup

ASKER

Hi Barry

thanks for your reply

The value of QPFRADJ is 3.

I believe that pool five was "taken off" automatic adjustment because memory was being reallocated to the Sametime pool and causing performance to suffer.

Our main concern was if the high amount of memory page/faults was an issue.

thanks

Colin

Member_2_276102

Colin:

> Are you suggesting that we put everything in to 3 shared pools to monitor the tasks better?

No, I was suggesting adding three new shared pools that would allow management of (1) TCP/IP and its server apps such as ftp, telnet, etc., (2) host servers such as the database server (iSeries Access servers), and (3) any of your company's batch jobs such as what might be submitted to QBATCH *jobq.

By routing each of those into separated pools, you both remove clutter from *BASE memory usage and gain visibility into performance elements of those areas. As it is, there's no chance of seeing how well (or poorly) various system tasks are cooperating with others.

Also, once active work is removed from *BASE, you can use the amount of memory that remains free in *BASE to judge whether or not your system is in fact memory constrained. *BASE is intended as a holding area for memory that has been surrendered by active work. It is the pool from which subsystems may request memory. If you route your workloads into non-*BASE pools and run performance adjustments, and *BASE regularly has little or no memory in it, you have an immediate indication of a memory constraint. OTOH, if there's regularly excess memory in *BASE (when no work is running in *BASE and performance adjusts are running), you know that any performance issues are not caused by a memory constraint. It's as simple a way to make that determination as it gets.
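As a rough illustration of that rule of thumb, here is a small Python sketch (Python only because it runs anywhere; it does not talk to the system). The sample figures, the function name and the 10% headroom threshold are all made up for the example and are not IBM guidelines. The only idea taken from the paragraph above is comparing what is left in *BASE against its minimum once all work has been routed elsewhere.

```python
def base_pool_verdict(base_sizes_mb, base_min_mb, headroom_pct=10.0):
    """Classify *BASE size samples (MB) taken while no work runs in *BASE.

    base_min_mb stands in for the minimum *BASE size; headroom_pct is an
    arbitrary illustrative threshold, not an IBM recommendation.
    """
    if not base_sizes_mb:
        raise ValueError("need at least one sample")
    limit = base_min_mb * (1 + headroom_pct / 100.0)
    # Count the samples where *BASE has been drained to (nearly) its minimum.
    drained = sum(1 for size in base_sizes_mb if size <= limit)
    if drained == len(base_sizes_mb):
        return "memory constrained"
    if drained == 0:
        return "memory available"
    return "mixed"


# Hypothetical samples in MB, with a hypothetical 200 MB minimum.
print(base_pool_verdict([205.0, 210.0, 201.5], 200.0))  # every sample near the minimum
print(base_pool_verdict([950.0, 870.0, 770.0], 200.0))  # plenty left over in *BASE
```

A "mixed" result would just mean the samples disagree, in which case a longer sampling period is probably the next step.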

Further, with separated pools, when a memory issue surfaces, you then have visibility into what kind of work is suffering. I.e., if your host database server jobs are in shared pool X and are showing excess paging/faulting, you have a clear clue where to start looking and what to look for.

> How do you work with a defined memory pool/what is the command?

I think of three kinds of pools: System, private and shared.

System pools are *MACHINE and *BASE. Although *BASE is something of a "shared" pool, it has characteristics that show it to be a type of system pool. Also, *INTERACT and *SPOOL should probably be in the 'system' group even though they _seem_ to be just IBM names for shared pools #61 & #62. *BASE might be considered shared pool #63, while *MACHINE would be #0. I think the architecture covers 64 shared pools which might number 0-63 internally.

*BASE, *INTERACT and *SPOOL can be associated with any subsystems you want in the same way that normal shared pools can be.

Private pools are simply specific amounts of memory that you specify should be dedicated to a subsystem. You might assign 100MB to one subsystem and 500MB to another. They're "private" because they have no name. With no name, there's no mechanism for referring to them outside of the subsystem. The memory is then "private" to the associated subsystem and cannot be referenced elsewhere. Even the performance adjuster can't see that memory, so it remains locked into the associated subsystem whether it's used or not. And if it isn't enough, too bad. The only adjustment is through manual change or by putting the commands into your own program that then manages it.

Ideally, a private pool would be created to hold stuff that you wanted pinned in physical memory. You might have a file that every process accesses and needs to access quickly. Usually most of such a file is likely to be in memory anyway, but this gives a way to guarantee it. Private pools kind of imply that you know why and how such allocated memory should be used.

A shared pool is a predefined memory "container" with a name. There are a couple reasons to use shared pools -- they're the pools that performance adjust can manipulate by increasing/decreasing the size of the container and they can be used for multiple subsystems (or multiple times in a single subsystem though not much reason to do so) concurrently. By reusing shared pools, different subsystems that handle the same kind of work can have a single, shared point of management.

The two main System pool sizes are managed through system values QMCHPOOL and QBASPOOL. It's technically possible to "change" the size of *BASE through the WRKSHRPOOL command, but that's only because reducing another shared pool always releases its memory into *BASE and increasing another shared pool always obtains memory from *BASE. Note that system value QBASPOOL restricts the _minimum_ size of *BASE. You can't increase another shared pool if it would reduce *BASE below its minimum in QBASPOOL; only as much as is available will be reassigned.
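To put numbers on that last point, here is a tiny Python sketch of the arithmetic only. The function name and all of the sizes are hypothetical; the one rule it encodes is the one described above, namely that an increase to a shared pool comes out of *BASE and *BASE may not be pushed below its minimum.

```python
def grow_shared_pool(base_mb, base_min_mb, pool_mb, requested_increase_mb):
    """Return (new_base_mb, new_pool_mb, granted_mb) for a requested increase.

    Memory for the increase can only come out of *BASE, and *BASE may not
    drop below base_min_mb (the role QBASPOOL plays in the text above).
    """
    available = max(base_mb - base_min_mb, 0.0)
    granted = min(requested_increase_mb, available)
    return base_mb - granted, pool_mb + granted, granted


# Hypothetical: *BASE holds 800 MB with a 500 MB minimum; ask for 400 MB more.
print(grow_shared_pool(800.0, 500.0, 140.0, 400.0))
```

With those made-up figures only 300 MB is free above the minimum, so the pool grows by 300 MB rather than the 400 MB requested.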

While it might be useful to review values that are currently set for QMCHPOOL and QBASPOOL, you won't want to change them until later, if ever. These also work with QMAXACTLVL and QBASACTLVL to adjust the overall system and *BASE "activity levels". Activity levels can also wait for later.

Other shared pool sizes are "maintained" with the WRKSHRPOOL command or more directly with CHGSHRPOOL. CHGSHRPOOL might be used in a program that does some predetermined memory shifting to prepare for work that's going to start or to do your own management if you don't use performance adjust.

Private pools are maintained directly by changing the associated subsystem description itself. You simply execute CHGSBSD and specify values for the POOLS() parameter. You supply the amount of memory you want for each pool identifier that you name. An identifier is one of the numbers 1-10 to tell which of the 10 possible subsystem memory pools you're working with. The amount of memory is supplied either as an actual amount of memory, in which case it's a private pool, or as the name of one of the shared pools, e.g., *SHRPOOL1, *INTERACT, *BASE, etc. Because the amount of memory for a shared pool is externally maintained, you can only supply the name of the pool here.

Your comment about pool 5 being set as a private pool means that performance adjust is probably doing almost as much harm as good. Since most memory is marked as off-limits and all servers, etc., are running in *BASE, it probably accomplishes very little while still incurring the overhead of adjustments. Hard to tell if it's better or worse to leave PfrAdj turned on.

> If we make a change to the pools, etc, are they immediate, i.e, does it require an IPL?

For your purposes, it's immediate. Not exactly true, but you don't need to end/start subsystems to make these changes.

Tom

Member_2_276102

Colin:

The previous comment was about pools themselves. This comment is about getting actual work to run in particular pools.

Here's a trivial example --

Assume the QSERVER subsystem is the subsystem being handled. Assume that *SHRPOOL2 & *SHRPOOL3 have been given some memory via WRKSHRPOOL.

1. CHGSBSD QSYS/QSERVER POOLS(( 1 *BASE ) ( 2 *SHRPOOL2 ) ( 3 *SHRPOOL3 ))

That lets QSERVER use *SHRPOOL2 as one of its memory pools. *SHRPOOL2 can now be referred to as subsystem pool 2 because it's the second one for this subsystem. NOTE: The _subsystem_ pool identifier is "2". That has _nothing_ to do with how *SHRPOOL2 is listed as a _system_ pool. In your case, it's reasonable that *SHRPOOL2 might be found as SystemPool #7 because you have 6 other pools in use. However, it is SubsystemPool #2 when it's mentioned anywhere inside of the subsystem.

*SHRPOOL3 is also made available. Maybe you'll never assign work to it, but it doesn't hurt to tell QSERVER about it and get a local PoolId assigned.

2. CHGRTGE QSYS/QSERVER SEQNBR( 200 ) POOLID( 2 )

When you list subsystems as you did above with WRKSBS, you can type option 5 next to QSERVER to see a menu of choices. Option 7 on that menu lists the "routing entries" for QSERVER. If you list them, you'll probably see that one of the first ones is sequence number 200 and it names program QSYS/QPWFSERVSD. If you display that entry, you can look down its attributes to see that it routes work into _subsystem_ pool #1 -- that's shown as 'Pool identifier'. That's what tells what subsystem pool that this kind of work should run in.

By default, the only pool you have today for QSERVER is *BASE. If you ran the command in step 1, then this command can validly reference PoolId 2. Once done, any work coming into QSERVER that gets routed by this entry will start running in *SHRPOOL2.

Then, repeat the command for each routing entry, changing them all to POOLID( 2 ).

3. CHGPJE QSYS/QSERVER PGM( QSYS/QPWFSERVSO ) POOLID( 2 )

Option 10 on the WRKSBS menu lists the Prestart job entries. One of the first on the list should name program QSYS/QPWFSERVSO. If you display its details, you should see it's associated with Pool identifier 1, much like the routing entries were.

The command in step 3 tells the named prestarts to start in _subsystem_ pool #2, which is *SHRPOOL2 according to the sample commands so far. The next time one of those prestarts starts, it'll start in *SHRPOOL2.

Repeat step 3 for each prestart entry in QSERVER.
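Since steps 2 and 3 are the same command repeated with a different sequence number or program each time, the command text can be generated rather than typed. The Python sketch below is only an illustration: it builds the command strings and nothing more. It does not retrieve the entries from the subsystem description and does not run anything on the system. The helper names are made up, and the only real entries in it are sequence 200 and QSYS/QPWFSERVSO from the steps above; the full list would have to come from the displays described in those steps.

```python
def routing_commands(sbsd, seq_numbers, pool_id):
    """Build one CHGRTGE command string per routing entry sequence number."""
    return [
        f"CHGRTGE SBSD({sbsd}) SEQNBR({seq}) POOLID({pool_id})"
        for seq in seq_numbers
    ]


def prestart_commands(sbsd, programs, pool_id):
    """Build one CHGPJE command string per prestart job program."""
    return [
        f"CHGPJE SBSD({sbsd}) PGM({pgm}) POOLID({pool_id})"
        for pgm in programs
    ]


# Only sequence 200 and QSYS/QPWFSERVSO come from the steps above;
# a real list would be copied from the routing and prestart entry displays.
for cmd in routing_commands("QSYS/QSERVER", [200], 2):
    print(cmd)
for cmd in prestart_commands("QSYS/QSERVER", ["QSYS/QPWFSERVSO"], 2):
    print(cmd)
```

The output is the same pair of commands as in steps 2 and 3, written with the SBSD keyword spelled out.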

When you reach this point, QSERVER is done. But there are other subsystems and a couple of them have a ship-load of routing and/or prestart entries.

The sheer number is enough to convince many (most?) not to attempt actual performance configuration. I can think of two general approaches other than manual.

1. Write a program (proc) that accepts a subsystem name and one, two or even three shared pool names, and that does everything automatically for that subsystem. The proc can retrieve all routing and prestart entries and zip through them quickly. The proc can be called for every subsystem or only specific ones.

2. CHGSBSD QSYS/QSERVER POOLS(( 1 *SHRPOOL2 ) ( 2 *SHRPOOL3 ) ( 3 *BASE ) )

Notice that this variation supplies the same three shared pools, but it assigns different _subsystem_ pool numbers to them. Subsystem pool # 1 used to be *Base; but now it's *SHRPOOL2. The existing routing and prestart entries don't have to be changed to route into *SHRPOOL2; they'll route there because that's where PoolId 1 is associated. There's no major need for all of the CHGRTGE and CHGPJE commands.

By thinking about how this single command replaces the series of commands mentioned above, you should get a decent picture of how it all fits together.
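One way to picture it is as a simple lookup from _subsystem_ pool number to pool name. The Python sketch below is a toy model of that lookup only, with made-up entry labels; it says nothing about how the system actually schedules work. It compares the extended approach (leave *BASE as pool 1 and repoint every entry to pool 2) with the shortcut (leave every entry at pool 1 and renumber the pools).

```python
def effective_pool(pools, entry_pool_id):
    """Resolve an entry's subsystem pool id to the pool it names."""
    return pools[entry_pool_id]


# Extended approach: *BASE stays pool 1, each entry is changed to POOLID(2).
pools_extended = {1: "*BASE", 2: "*SHRPOOL2", 3: "*SHRPOOL3"}
entries_extended = {"routing 200": 2, "prestart QPWFSERVSO": 2}

# Shortcut: entries keep POOLID(1), but pool 1 now names *SHRPOOL2.
pools_shortcut = {1: "*SHRPOOL2", 2: "*SHRPOOL3", 3: "*BASE"}
entries_shortcut = {"routing 200": 1, "prestart QPWFSERVSO": 1}

for name in entries_extended:
    via_entries = effective_pool(pools_extended, entries_extended[name])
    via_renumber = effective_pool(pools_shortcut, entries_shortcut[name])
    print(name, via_entries, via_renumber)
```

In both cases the routed work resolves to *SHRPOOL2; the difference is whether the entries or the pool numbering had to change to get there.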

While this is technically possible, it needs care -- especially on older systems, but even on newer. Somewhere a while back, IBM changed how "subsystem monitor" jobs ran. A "subsystem monitor" is the actual operating system function that runs and controls what all goes on within a subsystem. Each subsystem has one. This is the job that shows up for the subsystem itself on a WRKACTJOB display. These jobs have a Type of "SBS". If you display a SBS job and look at its Job Status Attributes, you should see a blank for the associated Subsystem and Subsystem pool ID.

It _used_ to be that a SBS job would run either in _subsystem_ pool #1 or in *BASE if there was no pool #1 associated. Nowadays, documentation states that they run in the "first subsystem pool". And that's about as clear as it gets.

The _implication_ is that the associated SBS job would run in *SHRPOOL2 if we simply associated that with _subsystem_ pool #1, and even if we simply didn't assign anything to #1 but set #2 to be *SHRPOOL2. The touchy aspect of that is that this is the kind of job that we don't want to be competing for memory with in our working pools. We'd be perfectly happy with it running in *BASE, away from our normal work.

So, while I believe you _can_ run the shortcut command above, I've never done it and can't predict how it might make any difference.

Also, it _probably_ should be run when the subsystem has ended. The amount of system work that might be initiated by suddenly switching _every_ job including SBS (which probably would ignore it as long as it was active anyway) might introduce significant delays. Active jobs are often the ones requesting memory; I'm not at all clear what memory would go where for a job that had the whole environment switched while active. I prefer the extended approach and strongly recommend it.

And that's about all I can think of for covering the fundamentals.

Tom

daviesgroup

ASKER

thanks again for the advice, we're continuing to analyse the issues and will update as and when

Computer101

Forced accept.

Computer101
EE Admin