Link to home
Start Free TrialLog in
Avatar of joe_echavarria
joe_echavarriaFlag for Dominican Republic

asked on

DB2 error.

Hi,

  Well, this is an Oracle DBA helping an application user that uses DB2.
 
  I am having the below error register in the event viewer of the database server.

  Also when i execute the command    db2pd -agents   i see a lot of "Inst-Active" and just a few Pooled.

 Attached is a file with the output of the db2pd -agents > agents.txt

 Below event viewer error.

Event Type:      Error
Event Source:      DB2
Event Category:      None
Event ID:      4
Date:            04/26/2016
Time:            1:17:38 AM
User:            N/A
Computer:      DOHADB2001
Description:

2016-04-26-01.17.38.898001   Instance:DB2   Node:000
PID:1252(db2syscs.exe)   TID:1612   Appid:none
common communication  sqlcctcpconnmgr_child Probe:125

ADM7009E  An error was encountered in the "TCPIP" protocol support.  A possible
cause is that the maximum number of agents has been exceeded.




Any tips or help.

Regards,
agents.txt
Avatar of joe_echavarria
joe_echavarria
Flag of Dominican Republic image

ASKER

When you take a look to the agents.txt file you will see that there are many connections from the same user.
Avatar of Kent Olsen
Hi Joe,

That's a very old DB2 issue.  What version of DB2 are you using?

Here's a link to an IBM page that describes the error and how to work around it.  The process will vary a bit depending on your version of DB2.

  http://www-01.ibm.com/support/docview.wss?uid=swg21202489



Kent
I executed the below command.

What  should i do ?

Is a production environment, i do not know if the commands in the article will cause problems.

Is an Oracle DBA with no DB2 experience.

db2 "force application (7)"  ?? - no problem executing this one ?

C:\Documents and Settings\db2admin>db2pd -v

Instance DB2 uses 32 bits and DB2 code release SQL08025
with level identifier 03060106
Informational tokens are DB2 v8.1.12.99, s060429, WR21368, FixPak 12.
Be careful when running "Force Application".  It kills the task identified by the parameter.  "Force Application (7)" kills task 7 even if tasks 7 isn't the one you want to kill.

db2pd -agents

Database Partition 0 -- Active -- Up 0 days 00:00:32

Agents:
Current agents: 1
Idle agents: 0
Active agents: 1
Coordinator agents: 1

Address AppHandl [nod-index] AgentPid Priority Type State ClientPid Userid ClientNm Rowsread Rowswrtn LkTmOt DBName
0x0780000000125A20 7 [000-00007] 2576432 0 Coord Active 1288 TESTER db2bp.ex 0 0 0 n/a

The columns don't line up well in the example, but the second column is the AppHandle.  The handle in the example is 7.  You'll need to run the db2pd command and identify which Application Handles to kill.

Can you run the command and post the output here?
I executed the below command.

db2pd -agents > agents_04_26_2016.txt

The output is in the attached file
agents_04_26_2016.txt
It's impossible for me to know how your application interacts with the database.  However, typical usage suggests that the application isn't properly disconnecting from the database when it's done.

Look under the Userid column for duplicates.  (357883, 977141, 984717, etc.)  Note the each of the agents reads the same number of rows per UserId and updates no rows.  Every one of those rows represents a task that is a candidate for deletion (kill).

What's the host operating system?
The application is an old version of  http://www.poms.com/

Is a Windows 2000 .

For example Userid 357883 appears 8 times.     So it means that all are inactive sessions   ?   - So can i kill them all ?

How to make that the application properly disconnect from the database ?
I think you can kill them all.  It may interrupt a user session, but since the task has not updated the database there is no risk to DB2.

  for /F "tokens=2,9,12" %a in (agents_04_26_2016.txt) do ( if [%b]==[357883] if [%c]==[0] echo db2 "force application (%a)" )

Run that in the same folder that you placed the output from db2pd.  (you might want to refresh the file, too.)

The output will be:

db2 "force application (1079)"
db2 "force application (1530)"
db2 "force application (1757)"
db2 "force application (1776)"

You copy/paste that into a command prompt from the DB2 clp window.

Then change the userid in the statement and run it again.  Repeat as necessary.

Kent
Like that i can kill sessions that i do not want., sessions that are active.  I have tested that that command will also present sessions that are active.

Is there any setting in DB2  some kind of "Time Out" parameter that sessions are automatically killed after "X" Minutes ?
I'm not aware of any such parameter in DB2, particularly a version that old.  There is a network setting in the Windows Registry, but that affects a lot more than just these agents.  I'd be hard pressed to change the Windows timeout without the Domain Admin and the Network guys being involved in the decision.

All of the session in the report show as "Inst-Active".  The short script that I offered only selects rows where the write-count is zero to minimize the risk of killing something that was or is updating the database.
Hi Joe,

It just occurred to me that you might want to increase the number of agents.  That has other potential performance impacts, but it might get you past this until a DB2 DBA looks at it.

  db2 get dbm cfg

What are the values of "Agent pool size" and "Max number of coordinating agents"?
Hi,

  There is no DB2 DBA around here... Just me an Oracle DBA helping them.

  Below the ouput of the command.


C:\Documents and Settings\db2admin>db2 get dbm cfg

          Database Manager Configuration

     Node type = Database Server with local and remote clients

 Database manager configuration release level            = 0x0a00

 Maximum total of files open               (MAXTOTFILOP) = 16000
 CPU speed (millisec/instruction)             (CPUSPEED) = 3.581944e-007

 Max number of concurrently active databases     (NUMDB) = 8
 Data Links support                          (DATALINKS) = NO
 Federated Database System Support           (FEDERATED) = NO
 Transaction processor monitor name        (TP_MON_NAME) =

 Default charge-back account           (DFT_ACCOUNT_STR) =

 Java Development Kit installation path       (JDK_PATH) = C:\IBM\SQLLIB\java\jd
k

 Diagnostic error capture level              (DIAGLEVEL) = 3
 Notify Level                              (NOTIFYLEVEL) = 3
 Diagnostic data directory path               (DIAGPATH) =

 Default database monitor switches
   Buffer pool                         (DFT_MON_BUFPOOL) = OFF
   Lock                                   (DFT_MON_LOCK) = OFF
   Sort                                   (DFT_MON_SORT) = OFF
   Statement                              (DFT_MON_STMT) = OFF
   Table                                 (DFT_MON_TABLE) = OFF
   Timestamp                         (DFT_MON_TIMESTAMP) = ON
   Unit of work                            (DFT_MON_UOW) = OFF
 Monitor health of instance and databases   (HEALTH_MON) = OFF

 SYSADM group name                        (SYSADM_GROUP) =
 SYSCTRL group name                      (SYSCTRL_GROUP) =
 SYSMAINT group name                    (SYSMAINT_GROUP) =
 SYSMON group name                        (SYSMON_GROUP) =

 Client Userid-Password Plugin          (CLNT_PW_PLUGIN) =
 Client Kerberos Plugin                (CLNT_KRB_PLUGIN) = IBMkrb5
 Group Plugin                             (GROUP_PLUGIN) =
 GSS Plugin for Local Authorization    (LOCAL_GSSPLUGIN) =
 Server Plugin Mode                    (SRV_PLUGIN_MODE) = UNFENCED
 Server List of GSS Plugins      (SRVCON_GSSPLUGIN_LIST) =
 Server Userid-Password Plugin        (SRVCON_PW_PLUGIN) =
 Server Connection Authentication          (SRVCON_AUTH) = NOT_SPECIFIED
 Database manager authentication        (AUTHENTICATION) = SERVER
 Cataloging allowed without authority   (CATALOG_NOAUTH) = NO
 Trust all clients                      (TRUST_ALLCLNTS) = YES
 Trusted client authentication          (TRUST_CLNTAUTH) = CLIENT
 Bypass federated authentication            (FED_NOAUTH) = NO

 Default database path                       (DFTDBPATH) = C:

 Database monitor heap size (4KB)          (MON_HEAP_SZ) = 66
 Java Virtual Machine heap size (4KB)     (JAVA_HEAP_SZ) = 512
 Audit buffer size (4KB)                  (AUDIT_BUF_SZ) = 0
 Size of instance shared memory (4KB)  (INSTANCE_MEMORY) = AUTOMATIC
 Backup buffer default size (4KB)            (BACKBUFSZ) = 1024
 Restore buffer default size (4KB)           (RESTBUFSZ) = 1024

 Agent stack size                       (AGENT_STACK_SZ) = 16
 Minimum committed private memory (4KB)   (MIN_PRIV_MEM) = 32
 Private memory threshold (4KB)        (PRIV_MEM_THRESH) = 32767

 Sort heap threshold (4KB)                  (SHEAPTHRES) = 29811

 Directory cache support                     (DIR_CACHE) = YES

 Application support layer heap size (4KB)   (ASLHEAPSZ) = 15
 Max requester I/O block size (bytes)         (RQRIOBLK) = 32767
 DOS requester I/O block size (bytes)     (DOS_RQRIOBLK) = 4096
 Query heap size (4KB)                   (QUERY_HEAP_SZ) = 1000

 Workload impact by throttled utilities(UTIL_IMPACT_LIM) = 100

 Priority of agents                           (AGENTPRI) = SYSTEM
 Max number of existing agents               (MAXAGENTS) = 800
 Agent pool size                        (NUM_POOLAGENTS) = 84
 Initial number of agents in pool       (NUM_INITAGENTS) = 0
 Max number of coordinating agents     (MAX_COORDAGENTS) = MAXAGENTS
 Max no. of concurrent coordinating agents  (MAXCAGENTS) = MAX_COORDAGENTS
 Max number of client connections      (MAX_CONNECTIONS) = MAX_COORDAGENTS

 Keep fenced process                        (KEEPFENCED) = YES
 Number of pooled fenced processes         (FENCED_POOL) = MAX_COORDAGENTS
 Initial number of fenced processes     (NUM_INITFENCED) = 0

 Index re-creation time and redo index build  (INDEXREC) = ACCESS

 Transaction manager database name         (TM_DATABASE) = 1ST_CONN
 Transaction resync interval (sec)     (RESYNC_INTERVAL) = 180

 SPM name                                     (SPM_NAME) = DOHADB20
 SPM log size                          (SPM_LOG_FILE_SZ) = 256
 SPM resync agent limit                 (SPM_MAX_RESYNC) = 20
 SPM log path                             (SPM_LOG_PATH) =

 NetBIOS Workstation name                        (NNAME) =

 TCP/IP Service name                          (SVCENAME) = db2c_DB2
 Discovery mode                               (DISCOVER) = SEARCH
 Discover server instance                (DISCOVER_INST) = ENABLE

 Maximum query degree of parallelism   (MAX_QUERYDEGREE) = 2
 Enable intra-partition parallelism     (INTRA_PARALLEL) = NO

 No. of int. communication buffers(4KB)(FCM_NUM_BUFFERS) = 1024
 Number of FCM request blocks              (FCM_NUM_RQB) = 512
 Number of FCM connection entries      (FCM_NUM_CONNECT) = (FCM_NUM_RQB * 0.75)
 Number of FCM message anchors         (FCM_NUM_ANCHORS) = (FCM_NUM_RQB * 0.75)



C:\Documents and Settings\db2admin>
Ah, yes.  The wording has certainly change over the years.  Thanks for posting the entire output.

The line:

  Max number of existing agents               (MAXAGENTS) = 800

shows the limit that you're bumping into.  I don't think that the number of agents is the real problem, but a temporary fix would be to bump that number up a bit.  Not knowing your server configuration or load, I suggest a small bump for now.

  DB2 update db cfg using MAXAGENTS 850 IMMEDIATE

That will increase the number of simultaneous agents by 50.  I don't remember if DB2 will have to be stopped and started for it to take effect though.  You might want to make the change now and if the number of agents doesn't increase, bounce DB2 during after-business hours.

Kent
But there is no way  that DB2 with some sort of parameter or configuration, automatically kills sessions or connections not active after some amount of not activity ?
Can you please take a look to this http://db2commerce.com/2012/02/01/how-to-use-the-db2-governor-to-force-off-idle-connections/

And please tell me how to adapt it to my situation if possible.
Hi Joe,

You can't get DB2 to automatically kill those sessions, but as you've discovered db2governor can.  I've never used it and only peripherally aware that it even exists.

But setting it up for your use should be quick and painless.

Change "sample" to your database name and save the text below as db2gov.{dbname}.cfg

I've got is set to run every 3 minutes and kill tasks idle for more than an hour, but it's easy to change that.

{ Wake up once every three minutes, database name is sample }
interval 180; dbname sample;

desc "Force off ops applications idle for more than 1 hour"
applname ops,ops.exe
setlimit idle 3600
action force;

Open in new window


Then start the governor

  db2gov start {dbname} dbpartitionnum 0 /fully/qualified/path/db2gov.(dbname}.cfg db2gov.sample.log

That should do it.


Kent
I am getting the below error.

C:\Documents and Settings\db2admin>db2gov start dohadb2001 dbpartitionnum 0 db2g
ov.dohadb2001.cfg db2gov.dohadb2001.log

db2govd: GOV1001N  Database name "dohadb200C:\IBM\SQLL " is too long.
Post the configuration file, please.
{ Wake up once every three minutes, database name is sample }
interval 180; dbname pw_data;

desc "Force off ops applications idle for more than 1 hour"
applname ops,ops.exe
setlimit idle 3600
action force;
It looks like there's a mismatch in the configuration file and the db2gov command line parameters.

And let's drop the partition parameter, at least for now.  :)

The command line identifies the database as "dohadb2001", but the configuration file uses "pw_data".  Assuming it's dohadb2001, the command line should be:

db2gov start dohadb2001 db2gov.dohadb2001.cfg db2gov.dohadb2001.log

The configuration file should be:

{ Wake up once every three minutes, database name is sample }
interval 180; dbname dohadb2001 ;

desc "Force off ops applications idle for more than 1 hour"
applname ops,ops.exe
setlimit idle 3600
action force;

Open in new window


Try those settings.  They should be pretty close.
My mistake.  I was using a wrong value for the database name.  Below the command.

How do i know if it did actually something ?

It those not create the file db2gov.pw_data.log

C:\Documents and Settings\db2admin>db2gov start pw_data  dbpartitionnum 0 db2gov
.pw_data.cfg db2gov.pw_data.log

C:\Documents and Settings\db2admin>
Ahha.  Very nice.

You should be able to see the db2gov in the Windows task manager.  I'm not sure what username the task uses, so you may want to show tasks for all users.
Look now.

C:\Documents and Settings\db2admin>db2gov start pw_data  db2gov.pw_data.cfg db2g
ov.pw_data.log

db2govd: GOV1007N  Governor already flagged as running. If it is not running, us
e 'db2gov stop' to clean up.


C:\Documents and Settings\db2admin>db2gov stop pw_data

C:\Documents and Settings\db2admin>db2gov start pw_data  db2gov.pw_data.cfg db2g
ov.pw_data.log

db2govd: GOV1014N  Unable to open log file "C:\IBM\SQLLIB\DB2\log\db2gov.pw_data
.log". RC = "-2029060074".


C:\Documents and Settings\db2admin>


Below that is inside of the db2pov.pw_data.log file

2016-04-27-11.33.29    0 START      Database = PW_DATA

2016-04-27-11.33.29    0 READCFG    Config = C:\Documents and Settings\db2admin\db2gov.pw_data.cfg
RC = "-2029060074".  That's a file sharing violation.  It suggests that db2gov is already running.

You can query the log file with db2govlg

  db2govlg db2gov.pw_data.log
So it means it is running now ?

How would i know if it is really working ?  ,

And if it has killed any session already ? ,
 
Or which session it has killed ?

When i check on agents with db2pd -agents > agents05.txt  it still shows the same USER_ID many times.


C:\Documents and Settings\db2admin>db2govlg db2gov.pw_data.log
2016-04-27-11.33.29    0 START      Database = PW_DATA

2016-04-27-11.33.29    0 READCFG    Config = C:\Documents and Settings\db2admin\
db2gov.pw_data.cfg
Try running db2govlg again.  

The governor has been running for an hour and should have done some cleanup.
This is the output.

C:\Documents and Settings\db2admin>db2govlg db2gov.pw_data.log
2016-04-27-11.33.29    0 START      Database = PW_DATA

2016-04-27-11.33.29    0 READCFG    Config = C:\Documents and Settings\db2admin\
db2gov.pw_data.cfg
Hmmm....  We don't see any indication that tasks have been killed.  Does the agent list look any different than it did this morning?
The agent list looks kind of the same.
That suggests that the OPS.EXE application has some kind of regular communication with DB2.  It might even be a keep-alive handshake -- it's impossible to know without knowing more about the application.

And the user IDs having multiple agents with the same I/O signature makes me believe that the application actually opens multiple connections.

I think that the next step is to see if the application can be tuned to adjust the keep-alive time (or the maximum number of connections it will create), or to add agent threads as I described above.


Kent
Which column in the agent list shows or indicates that the session is idle ?

Address    AppHandl [nod-index] AgentTid   Priority   Type     State       ClientPid  Userid   ClientNm Rowsread   Rowswrtn   LkTmOt DBName  


The State column shows "Inst-Active" value and for Rowsread   Rowswrtn columns both can have values of "0" or one with a value and the other value "0".    Please check the file data.

Can we kill all the session ? - can we say all the session that appears there are idle state ?
We can always kill sessions.  :)

But with that there is always some risk of corrupting the application.  If a single execution of the application has multiple connections to the database and one of them is killed, the application may not recover gracefully.  In that case, I don't know what will happen.  

It's likely that the only effect will be that the application exits.  There's a chance that the open connections don't clean up (drop) properly, but that's a risk we might need to take.

I suggest that we start by killing the sessions for a single user that have done nothing against the database.  They are indicated by rowsread and rowswritten having a value of 0.  Ideally, you would know who that user is and can monitor them as they try to continue working.


Kent
Thanks Kent for your support & time.
Glad to help.  Is the problem better?
No, the error still appears in the event viewer.

I am waiting for the application vendor support.
ASKER CERTIFIED SOLUTION
Avatar of Kent Olsen
Kent Olsen
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial