AS400 i520 Memory pool non db page faults -  Domino memory pools

Posted on 2007-07-30
Last Modified: 2013-12-06
Hi

Below is an issue with our i520 V5R3

I'm getting consistently high readings in the non-DB fault/pages columns for the pools allocated to our Domino servers. One pool runs the core system apps; the other is a Sametime-only server.

System    Pool    Reserved    Max   -----DB-----  ---Non-DB---
 Pool   Size (M)  Size (M)  Active  Fault  Pages  Fault  Pages
   1      943.73    197.40   +++++     .0     .0    3.0    3.5
   2      805.60      5.33     241     .0     .1     .2     .5
   3      383.97       .00      10     .0     .0    1.0    1.2
   4       76.79       .00       5     .0     .0     .0     .0
   5     5250.00      1.05    1000     .7     .9  137.4  528.2
   6      219.46      1.24     127     .0     .0   31.0  146.3

Pool 5 (core domino server)
Pool 6 (sametime)

Having read around this subject (forgive my novice status) it seems that this is usually caused by either

1. Not enough memory allocated to the pool, or
2. Too high a max active thread setting

I can't really believe that more memory is required, but I am very uncertain what effect reducing max active threads would have. We have roughly 100 users, and very often have performance "blips", especially when indexing the core db.

Any suggestions would be welcome; in-house AS/400 expertise is limited!

thanks

Colin

Question by:daviesgroup
12 Comments
 
LVL 46

Expert Comment

by:Sjef Bosman
ID: 19593248
It seems that allocating 5G for Domino isn't the right thing to do. Here's a document with some guidelines for memory parametrization:

http://www-10.lotus.com/ldd/sandbox.nsf/af0f1b4673fc2715852570ff006cd3ff/cb533a62695b1f08852571370053fa43?OpenDocument

By the way my AS400 expertise is not limited, it is non-existent...
 
LVL 13

Expert Comment

by:_b_h
ID: 19593785
Colin,
1) Is the automatic performance adjuster turned on?
Check using Work with System Value command:
WRKSYSVAL QPFRADJ
and report back the value. If it is 2 or 3, the system will attempt to adjust the memory pool size and activity level.

2) What was the elapsed time for the system status stats? Are they typical, or just during a performance problem?
A sample with a 2 minute and a 60 minute interval would be interesting to look at.
Press F11 to see if pool 5 and 6 were set up as shared pools (*SHRPOOLn). If so, tuning can be tweaked by using the Work with Shared Pools (WRKSHRPOOL) command

Barry
 
LVL 13

Expert Comment

by:_b_h
ID: 19594504
Also, can you go into the Domino console and use the show stat command?
What is the value for Database.Database.BufferPool.PercentReadsInBuffer?
This number shows what per cent of reads were in the NSF buffer, i.e. not waiting on disk. The guideline is 95+%

Barry

 
LVL 27

Accepted Solution

by:tliotta
tliotta earned 1000 total points
ID: 19595959
Colin:

As Barry alluded to, there isn't enough info to make a guess. We don't know anything about what is being reported -- in particular for faults shown, we don't know anything about the time period. It might be for a few random seconds or it might be for an extended period. The random seconds might be at a high or a low paging period.

Beyond that, we know nothing about what else is going on, not even what _should_ be going on. Is this system dedicated to Domino apps or are there other critical functions supported? Overall, the stats shown look very good, almost too good. But that's assuming stats that I might gather myself in a structured way.

We see something by noting that there are 6 system pools. We don't know what the other pools are, though they're most likely *MACHINE, *BASE, *INTERACT and *SPOOL. From that, I'd feel confident guessing that the system apps (TCP/IP servers, host servers, etc.) are _all_ running in *BASE. They have not been given their own pool(s) to run in.

I.e., if we looked at all non-Domino subsystems, we'd find that every one has *BASE as the only available pool and that every routing entry and every pre-start job goes straight to *BASE. The net result is that *BASE _always_ has many, many jobs running in it. And the result of that is that there is (practically) _never_ any clearly available memory from which the performance adjuster can allocate memory to pools that need it. The performance adjuster _always_ moves memory out of *BASE to the shared pool that requests it and _always_ moves memory into *BASE when a shared pool surrenders it. It _never_ moves memory directly between shared pools.

If there is a large number of jobs running in *BASE, they have a good chance of grabbing any memory pages that might be needed elsewhere. That doesn't stop them from moving; it simply adds an additional layer of (almost) invisible paging. With so many jobs running in *BASE, it becomes extremely difficult to interpret stats.

All of the common system jobs step all over the stats that link to *BASE. There is no way to determine if a Domino task that uses TCP/IP services is getting the system support that it needs. If a system bottle-neck appears in the system jobs, there's no way to separate it out. Even customer batch jobs commonly run in *BASE, thereby competing with system apps which are already interfering with performance adjust and stats interpretation.

IBM ships systems with a default config that has _all_ of the system apps running in *BASE. Few sites ever change that. Most often they simply end up buying more memory and never think that the default config should even be subject to change.

When I investigate performance, my first steps are to ensure that three shared pools are created. The shared pools are defined with the WRKSHRPOOL command. One handles all customer batch jobs. One handles all TCP/IP server jobs. The third handles host server jobs. The three are then associated with the various subsystems with the CHGSBSD command; *BASE is left as the first pool and whichever of the three new pools apply are assigned as additional pools available to those subsystems. In general, there's no reason _not_ to make any given shared pool available; the actual usage doesn't happen until routing and/or pre-start entries are changed.
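As a rough illustration of that first step, the three pools can be written down as a small plan and turned into CHGSHRPOOL command text to review before running anything. This is only a sketch: the pool names, sizes and activity levels are placeholders I made up, not recommendations, and it assumes CHGSHRPOOL takes SIZE in kilobytes.

```python
# Hypothetical plan: (shared pool, kind of work, size in MB, activity level).
# All values are placeholders for illustration, not tuning advice.
POOL_PLAN = [
    ("*SHRPOOL2", "customer batch jobs", 256, 10),
    ("*SHRPOOL3", "TCP/IP server jobs", 256, 20),
    ("*SHRPOOL4", "host server jobs", 256, 20),
]

def chgshrpool_commands(plan):
    """Return {kind of work: CHGSHRPOOL command text} for each planned pool."""
    commands = {}
    for pool, purpose, size_mb, actlvl in plan:
        size_kb = size_mb * 1024  # assumption: SIZE is given in kilobytes
        commands[purpose] = f"CHGSHRPOOL POOL({pool}) SIZE({size_kb}) ACTLVL({actlvl})"
    return commands

if __name__ == "__main__":
    for purpose, cmd in chgshrpool_commands(POOL_PLAN).items():
        print(f"{purpose}: {cmd}")
```

The script only prints text; nothing changes on the system until the commands are pasted in (or the same values are typed into WRKSHRPOOL).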

Then I change every single routing/pre-start entry in every subsystem that relates to TCP/IP or host servers plus any customer batch entries. That's the big complication because (1) there's a lot of them and (2) it can take time to figure out which system app goes which way and they can change between releases.

Once that's all finished, THAT's when taking stats starts to give meaningful results. "Starts" giving -- it probably still won't be reliable until at least coming back up from restricted state because many active system jobs will continue running where they started. As pre-starts close down and start new instances, and as new jobs route to the new pools, stats trend toward becoming rational. Restricted state (or an IPL or ending/starting various subsystems or whatever technique fits) ensures that all starts fresh.

In summary, performance issues are hard to pin down from a single snap-shot. And performance stats can be unreliable in an environment that's uncontrolled.

Tom
 

Author Comment

by:daviesgroup
ID: 19598934
thanks for the pointers gents, I'll get some stats ASAP. I actually had looked into 5 minute elapsed stats after posting the original question, sincere apologies for the lack of info!
 

Author Comment

by:daviesgroup
ID: 19599446
Here are the stats as requested:

System    Pool    Reserved    Max                                    
 Pool   Size (M)  Size (M)  Active  Pool        Subsystem   Library  
   1      873.41    197.48   +++++  *MACHINE                        
   2      953.55      6.57     241  *BASE                            
   3      383.97       .00      10  *INTERACT                        
   4       76.79       .00       5  *SPOOL                          
   5     5250.00      1.05    1000   6          DOMINO04    QUSRNOTES
   6      141.83      1.24     136  *SHRPOOL1                      

System    Pool    Reserved    Max   Paging
 Pool   Size (M)  Size (M)  Active  Option
   1      873.41    197.48   +++++  *FIXED
   2      953.55      6.57     241  *CALC
   3      383.97       .00      10  *CALC
   4       76.79       .00       5  *CALC
   5     5250.00      1.05    1000  *FIXED
   6      141.83      1.24     136  *FIXED

subsystems:

                      Total     -----------Subsystem Pools------------
Opt  Subsystem     Storage (M)   1   2   3   4   5   6   7   8   9  10
     DOMINO02              .00   6                                    
     DOMINO04          5250.00   2                   5                  (core domino applications)
     QASE5                 .00   2                                    
     QBATCH                .00   2                                    
     QCMN                  .00   2                                    
     QCTL                  .00   2                                    
     QHTTPSVR              .00   2                                    
     QINTER                .00   2   3                                
     QSERVER               .00   2                                    
     QSPL                  .00   2   4                                
     QSYSWRK               .00   2                                    
     QUSRWRK               .00   2                                    

2minutes:

System    Pool    Reserved    Max   -----DB-----  ---Non-DB---
 Pool   Size (M)  Size (M)  Active  Fault  Pages  Fault  Pages
   1      962.93    197.47   +++++     .0     .0    9.2   10.1
   2      871.12      6.39     241     .0     .0     .1     .5
   3      383.97       .00      10     .0     .0     .1     .2
   4       76.79       .00       5     .0     .0     .0     .0
   5     5250.00      1.06    1000     .0     .2  123.1  363.5
   6      134.73      1.24     130     .0     .0   26.2   87.5

45 minutes:

System    Pool    Reserved    Max   -----DB-----  ---Non-DB---
 Pool   Size (M)  Size (M)  Active  Fault  Pages  Fault  Pages
   1     1058.66    197.47   +++++     .0     .0    5.3    5.8
   2      767.95      5.33     241     .0     .0     .3     .8
   3      383.97       .00      10     .0     .0     .1     .4
   4       76.79       .00       5     .0     .0     .0     .0
   5     5250.00      1.05    1000     .7    1.0  127.6  401.8

Domino Sh stat:

  Database.Database.BufferPool.PerCentReadsInBuffer = 99.83

More questions:

1. Are you suggesting that we put everything in to 3 shared pools to monitor the tasks better?
2. How do you work with a defined memory pool/what is the command?
3. If we make a change to the pools, etc, are they immediate, i.e, does it require an IPL?

Many thanks again.
 
LVL 13

Assisted Solution

by:_b_h
_b_h earned 1000 total points
ID: 19602327
Colin,
Please list the QPFRADJ system value using WRKSYSVAL so we can see if the performance adjuster is turned on.

FYI, pool 5 is not managed by the performance adjuster (assuming it is on). We can create a shared pool for it to allow automatic adjustment.

The read hit of 99.83% is very good, so I would expect performance is not affected by the memory available to Domino core apps.

IMHO, the memory is behaving well during the given intervals. I seem to recall about 100 faults per processor as a rough guideline. Is the performance problem intermittent?
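For anyone who wants to compare several snapshots against that rough guideline, here is a small sketch that parses pasted WRKSYSSTS rows and totals the fault rates. Two things are assumptions on my part: that the guideline means total faults per second across all pools, and the processor count, which is a parameter because it was never stated in this thread.

```python
import re

# One data row of the WRKSYSSTS display: pool, size, reserved size, max active,
# DB fault, DB pages, non-DB fault, non-DB pages. Max active shows "+++++" for pool 1.
ROW = re.compile(
    r"^\s*(\d+)\s+([\d.]+)\s+([\d.]+)\s+(\S+)"
    r"\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s*$"
)

def parse_pools(screen_text):
    """Return {system pool number: fault/page figures}; header lines are skipped."""
    pools = {}
    for line in screen_text.splitlines():
        m = ROW.match(line)
        if m is None:
            continue
        pools[int(m.group(1))] = {
            "size_mb": float(m.group(2)),
            "db_fault": float(m.group(5)),
            "db_pages": float(m.group(6)),
            "nondb_fault": float(m.group(7)),
            "nondb_pages": float(m.group(8)),
        }
    return pools

def total_fault_rate(pools):
    """Sum of DB and non-DB faults over all pools."""
    return sum(p["db_fault"] + p["nondb_fault"] for p in pools.values())

def exceeds_guideline(pools, processors, per_processor=100.0):
    """True if total faults exceed the rough per-processor guideline."""
    return total_fault_rate(pools) > processors * per_processor

# The 2-minute snapshot posted earlier in this thread.
SNAPSHOT_2MIN = """\
   1      962.93    197.47   +++++     .0     .0    9.2   10.1
   2      871.12      6.39     241     .0     .0     .1     .5
   3      383.97       .00      10     .0     .0     .1     .2
   4       76.79       .00       5     .0     .0     .0     .0
   5     5250.00      1.06    1000     .0     .2  123.1  363.5
   6      134.73      1.24     130     .0     .0   26.2   87.5
"""

if __name__ == "__main__":
    pools = parse_pools(SNAPSHOT_2MIN)
    print(round(total_fault_rate(pools), 1))
```

Whether the result counts as "over" depends entirely on the processor count passed in, so treat the answer as a prompt for a closer look rather than a verdict.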

Reply:
1) A separate shared pool for pool 5 is a good idea if you want automatic adjustment.
2) Subsystems have memory pools defined in them. When a subsystem starts, it pulls memory out of the *BASE pool if available. The automatic adjuster moves memory from/to *BASE to/from other pools. You control that using Work with Shared Pools WRKSHRPOOL command.
3) You can manually move memory around from WRKSYSSTS screens above or by changing the subsystems using CHGSBSD.

Barry
 

Author Comment

by:daviesgroup
ID: 19608947
Hi Barry

thanks for your reply

The value of QPFRADJ is 3.

I believe that pool five was "taken off" automatic adjustment as memory was being reallocated to the sametime pool and causing performance to suffer.

Our main concern was if the high amount of memory page/faults was an issue.

thanks

Colin

 
LVL 27

Expert Comment

by:tliotta
ID: 19613298
Colin:

> Are you suggesting that we put everything in to 3 shared pools to monitor the tasks better?

No, I was suggesting adding three new shared pools that would allow management of (1) TCP/IP and its server apps such as ftp, telnet, etc., (2) host servers such as the database server (iSeries Access servers), and (3) any of your company's batch jobs such as what might be submitted to QBATCH *jobq.

By routing each of those into separated pools, you both remove clutter from *BASE memory usage and gain visibility into performance elements of those areas. As it is, there's no chance of seeing how well (or poorly) various system tasks are cooperating with others.

Also, once active work is removed from *BASE, you can use the amount of memory that remains free in *BASE to judge whether or not your system is in fact memory constrained. *BASE is intended as a holding area for memory that has been surrendered by active work. It is the pool from which subsystems may request memory. If you route your workloads into non-*BASE pools and run performance adjustments, and *BASE regularly has little or no memory in it, you have an immediate indication of a memory constraint. OTOH, if there's regularly excess memory in *BASE (when no work is running in *BASE and performance adjusts are running), you know that performance issues have no memory constraints. It's as simple a way to make that determination as it gets.
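A minimal sketch of that test, assuming you have noted the size of *BASE at intervals (for example from WRKSYSSTS) after the work has been routed elsewhere. The 50 MB slack and the 80% cut-off for "regularly" are arbitrary values chosen for illustration; nothing in the system defines them.

```python
def base_pool_verdict(base_sizes_mb, qbaspool_min_mb, slack_mb=50.0, share=0.8):
    """Classify samples of *BASE size taken while no work runs in *BASE.

    A sample is "tight" when *BASE holds no more than slack_mb above its
    QBASPOOL minimum. slack_mb and share are illustrative assumptions.
    """
    if not base_sizes_mb:
        raise ValueError("need at least one sample")
    tight = sum(1 for size in base_sizes_mb if size - qbaspool_min_mb <= slack_mb)
    roomy = len(base_sizes_mb) - tight
    if tight / len(base_sizes_mb) >= share:
        return "memory constrained"
    if roomy / len(base_sizes_mb) >= share:
        return "memory not the constraint"
    return "inconclusive"

if __name__ == "__main__":
    # Hypothetical samples in MB, with a hypothetical QBASPOOL minimum of 100 MB.
    print(base_pool_verdict([120, 110, 130, 115], qbaspool_min_mb=100))
    print(base_pool_verdict([900, 850, 950, 800], qbaspool_min_mb=100))
```

The comparison is made against the QBASPOOL minimum rather than zero because *BASE cannot be drained below that value.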

Further, with separated pools, when a memory issue surfaces, you then have visibility into what kind of work is suffering. I.e., if your host database server jobs are in shared pool X and are showing excess paging/faulting, you have a clear clue where to start looking and what to look for.

> How do you work with a defined memory pool/what is the command?

I think of three kinds of pools: System, private and shared.

System pools are *MACHINE and *BASE. Although *BASE is something of a "shared" pool, it has characteristics that show it to be a type of system pool. Also, *INTERACT and *SPOOL should probably be in the 'system' group even though they _seem_ to be just IBM names for shared pools #61 & #62. *BASE might be considered shared pool #63, while *MACHINE would be #0. I think the architecture covers 64 shared pools which might number 0-63 internally.

*BASE, *INTERACT and *SPOOL can be associated with any subsystems you want in the same way that normal shared pools can be.

Private pools are simply specific amounts of memory that you specify should be dedicated to a subsystem. You might assign 100MB to one subsystem and 500MB to another. They're "private" because they have no name. With no name, there's no mechanism for referring to them outside of the subsystem. The memory is then "private" to the associated subsystem and cannot be referenced elsewhere. Even the performance adjuster can't see that memory, so it remains locked into the associated subsystem whether it's used or not. And if it isn't enough, too bad. The only adjustment is through manual change or by putting the commands into your own program that then manages it.

Ideally, a private pool would be created to hold stuff that you wanted pinned in physical memory. You might have a file that every process accesses and needs to access quickly. Usually most of such a file is likely to be in memory anyway, but this gives a way to guarantee it. Private pools kind of imply that you know why and how such allocated memory should be used.

A shared pool is a predefined memory "container" with a name. There are a couple reasons to use shared pools -- they're the pools that performance adjust can manipulate by increasing/decreasing the size of the container and they can be used for multiple subsystems (or multiple times in a single subsystem though not much reason to do so) concurrently. By reusing shared pools, different subsystems that handle the same kind of work can have a single, shared point of management.

The two main System pool sizes are managed through system values QMCHPOOL and QBASPOOL. It's technically possible to "change" the size of *BASE through the WRKSHRPOOL command, but that's only because reducing another shared pool always releases its memory into *BASE and increasing another shared pool always obtains memory from *BASE. Note that system value QBASPOOL restricts the _minimum_ size of *BASE. You can't increase another shared pool if it would reduce *BASE below its minimum in QBASPOOL; only as much as is available will be reassigned.

While it might be useful to review values that are currently set for QMCHPOOL and QBASPOOL, you won't want to change them until later, if ever. These also work with QMAXACTLVL and QBASACTLVL to adjust the overall system and *BASE "activity levels". Activity levels can also wait for later.

Other shared pool sizes are "maintained" with the WRKSHRPOOL command or more directly with CHGSHRPOOL. CHGSHRPOOL might be used in a program that does some predetermined memory shifting to prepare for work that's going to start or to do your own management if you don't use performance adjust.

Private pools are maintained directly by changing the associated subsystem description itself. You simply execute CHGSBSD and specify values for the POOLS() parameter. You supply the amount of memory you want in the pool identifier that you supply. An identifier is one of the numbers 1-10 to tell which of the 10 possible subsystem memory pools you're working with. The amount of memory is supplied as an actual amount of memory, in which case it's a private pool, or as the name of one of the shared pools, e.g., *SHRPOOL1, *INTERACT, *BASE, etc. Because the amount of memory for a shared pool is externally maintained, you can only supply the name of the pool here.
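To make the shape of the POOLS() parameter concrete, here is a small sketch that builds the CHGSBSD command text from a mapping of subsystem pool identifiers. It rests on my understanding that a private pool is given as a size in kilobytes followed by an activity level, which the description above does not spell out, so treat that part as an assumption. The subsystem name and sizes in the usage lines are placeholders.

```python
def pools_parameter(pools):
    """Build POOLS(...) text.

    pools maps a subsystem pool identifier (1-10) to either a shared pool
    name such as "*BASE" or "*SHRPOOL2", or a (size_kb, activity_level)
    tuple for a private pool (assumed element order).
    """
    parts = []
    for pool_id in sorted(pools):
        if not 1 <= pool_id <= 10:
            raise ValueError(f"subsystem pool identifier out of range: {pool_id}")
        spec = pools[pool_id]
        if isinstance(spec, str):
            parts.append(f"({pool_id} {spec})")       # shared pool: name only
        else:
            size_kb, actlvl = spec
            parts.append(f"({pool_id} {size_kb} {actlvl})")  # private pool
    return "POOLS(" + " ".join(parts) + ")"

def chgsbsd(sbsd, pools):
    """Return the full CHGSBSD command text for a subsystem description."""
    return f"CHGSBSD SBSD({sbsd}) {pools_parameter(pools)}"

if __name__ == "__main__":
    # Shared pools only.
    print(chgsbsd("QSYS/QSERVER", {1: "*BASE", 2: "*SHRPOOL2", 3: "*SHRPOOL3"}))
    # A hypothetical private pool of 512000 KB with activity level 10.
    print(chgsbsd("MYLIB/MYSBS", {1: "*BASE", 2: (512000, 10)}))
```

Note how a shared pool entry carries no size at all, which matches the point that its memory is maintained externally.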

Your comment about pool 5 being set as a private pool means that performance adjust is probably doing almost as much harm as good. Since most memory is marked as off-limits and all servers, etc., are running in *BASE, it probably accomplishes very little while still incurring the overhead of adjustments. Hard to tell if it's better or worse to leave PfrAdj turned on.

> If we make a change to the pools, etc, are they immediate, i.e, does it require an IPL?

For your purposes, it's immediate. Not exactly true, but you don't need to end/start subsystems to make these changes.

Tom
 
LVL 27

Expert Comment

by:tliotta
ID: 19613802
Colin:

The previous comment was about pools themselves. This comment is about getting actual work to run in particular pools.

Here's a trivial example --

Assume the QSERVER subsystem is the subsystem being handled. Assume that *SHRPOOL2 & *SHRPOOL3 have been given some memory via WRKSHRPOOL.

1. CHGSBSD  QSYS/QSERVER POOLS(( 1 *BASE ) ( 2 *SHRPOOL2 ) ( 3 *SHRPOOL3 ))

That lets QSERVER use *SHRPOOL2 as one of its memory pools. *SHRPOOL2 can now be referred to as subsystem pool 2 because it's the second one for this subsystem. NOTE: The _subsystem_ pool identifier is "2". That has _nothing_ to do with how *SHRPOOL2 is listed as a _system_ pool. In your case, it's reasonable that *SHRPOOL2 might be found as SystemPool #7 because you have 6 other pools in use. However, it is SubsystemPool #2 when it's mentioned anywhere inside of the subsystem.

*SHRPOOL3 is also made available. Maybe you'll never assign work to it, but it doesn't hurt to tell QSERVER about it and get a local PoolId assigned.

2. CHGRTGE  QSYS/QSERVER  SEQNBR( 200 ) POOLID( 2 )

When you list subsystems as you did above with WRKSBS, you can type option 5 next to QSERVER to see a menu of choices. Option 7 on that menu lists the "routing entries" for QSERVER. If you list them, you'll probably see that one of the first ones is sequence number 200 and it names program QSYS/QPWFSERVSD. If you display that entry, you can look down its attributes to see that it routes work into _subsystem_ pool #1 -- that's shown as 'Pool identifier'. That's what tells what subsystem pool that this kind of work should run in.

By default, the only pool you have today for QSERVER is *BASE. If you ran the command in step 1, then this command can validly reference PoolId 2. Once done, any work coming into QSERVER that gets routed by this entry will start running in *SHRPOOL2.

Then, repeat the command for each routing entry, changing them all to POOLID( 2 ).

3. CHGPJE  QSYS/QSERVER  PGM( QSYS/QPWFSERVSO ) POOLID( 2 )

Option 10 on the WRKSBS menu lists the Prestart job entries. One of the first on the list should name program QSYS/QPWFSERVSO. If you display its details, you should see it's associated with Pool identifier 1, much like the routing entries were.

The command in step 3 tells the named prestarts to start in _subsystem_ pool #2, which is *SHRPOOL2 according to the sample commands so far. The next time one of those prestarts starts, it'll start in *SHRPOOL2.

Repeat step 3 for each prestart entry in QSERVER.

When you reach this point, QSERVER is done. But there are other subsystems and a couple of them have a ship-load of routing and/or prestart entries.
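Since steps 2 and 3 are the same command repeated per entry, the repetition can be sketched as a small generator of command text. The lists of routing sequence numbers and prestart programs have to be read off the subsystem description by hand; this sketch does not retrieve them, and the values in the usage lines are just the two entries named above. The commands are written in keyword form, which is equivalent to the positional form used in the steps.

```python
def repoint_commands(sbsd, pool_id, routing_seqnbrs, prestart_pgms):
    """Return CHGRTGE/CHGPJE command text that points every listed
    routing entry and prestart job entry at one subsystem pool identifier."""
    commands = [
        f"CHGRTGE SBSD({sbsd}) SEQNBR({seqnbr}) POOLID({pool_id})"
        for seqnbr in routing_seqnbrs
    ]
    commands += [
        f"CHGPJE SBSD({sbsd}) PGM({pgm}) POOLID({pool_id})"
        for pgm in prestart_pgms
    ]
    return commands

if __name__ == "__main__":
    # Only the entries mentioned in the steps above; a real subsystem has more.
    for cmd in repoint_commands(
        "QSYS/QSERVER",
        pool_id=2,
        routing_seqnbrs=[200],
        prestart_pgms=["QSYS/QPWFSERVSO"],
    ):
        print(cmd)
```

Printing the commands first gives a chance to review them before anything is changed on the system.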

The sheer number is enough to convince many (most?) not to attempt actual performance configuration. I can think of two general approaches other than manual.

1. Write a program (proc) that accepts a subsystem name and one, two or even three shared pool names, and that does everything automatically for that subsystem. The proc can retrieve all routing and prestart entries and zip through them quickly. The proc can be called for every subsystem or only specific ones.

2. CHGSBSD  QSYS/QSERVER POOLS(( 1 *SHRPOOL2 ) ( 2 *SHRPOOL3 ) ( 3 *BASE ) )

Notice that this variation supplies the same three shared pools, but it assigns different _subsystem_ pool numbers to them. Subsystem pool # 1 used to be *Base; but now it's *SHRPOOL2. The existing routing and prestart entries don't have to be changed to route into *SHRPOOL2; they'll route there because that's where PoolId 1 is associated. There's no major need for all of the CHGRTGE and CHGPJE commands.

By thinking about how this single command replaces the series of commands mentioned above, you should get a decent picture of how it all fits together.

While this is technically possible, it needs care -- especially on older systems, but even on newer. Somewhere a while back, IBM changed how "subsystem monitor" jobs ran. A "subsystem monitor" is the actual operating system function that runs and controls what all goes on within a subsystem. Each subsystem has one. This is the job that shows up for the subsystem itself on a WRKACTJOB display. These jobs have a Type of "SBS". If you display a SBS job and look at its Job Status Attributes, you should see a blank for the associated Subsystem and Subsystem pool ID.

It _used_ to be that a SBS job would run either in _subsystem_ pool #1 or in *BASE if there was no pool #1 associated. Nowadays, documentation states that they run in the "first subsystem pool". And that's about as clear as it gets.

The _implication_ is that the associated SBS job would run in *SHRPOOL2 if we simply associated that with _subsystem_ pool #1, and even if we simply didn't assign anything to #1 but set #2 to be *SHRPOOL2. The touchy aspect of that is that this is the kind of job that we don't want to be competing for memory with in our working pools. We'd be perfectly happy with it running in *BASE, away from our normal work.

So, while I believe you _can_ run the shortcut command above, I've never done it and can't predict how it might make any difference.

Also, it _probably_ should be run when the subsystem has ended. The amount of system work that might be initiated by suddenly switching _every_ job including SBS (which probably would ignore it as long as it was active anyway) might introduce significant delays. Active jobs are often the ones requesting memory; I'm not at all clear what memory would go where for a job that had the whole environment switched while active. I prefer the extended approach and strongly recommend it.

And that's about all I can think of for covering the fundamentals.

Tom
 

Author Comment

by:daviesgroup
ID: 19638226
thanks again for the advice, we're continuing to analyse the issues and will update as and when

 
LVL 1

Expert Comment

by:Computer101
ID: 20106478
Forced accept.

Computer101
EE Admin
