Db2 LUW diag log reports "Resource temporarily unavailable" on an AIX machine

Prardhan N
Hi All

I have noticed an OS error in my diag log, and I also noticed that HADR was not in sync during this period.

From the error message, I am not completely clear which resource is unavailable.

I suspect that the network was not good during that time.
Could any other resource be the cause?

How can I dig into this further?

It's an AIX machine.

Below is the relevant piece of the diag log.


2017-12-23-07.11.34.627255-360 E196685A513        LEVEL: Error (OS)
PID     : 57018777              TID  : 200         PROC : db2sysc 0
INSTANCE: db2inst1             NODE : 000
EDUID   : 258                  EDUNAME: db2sysc 0
FUNCTION: DB2 UDB, oper system services, sqlorqueInternal, probe:9
MESSAGE : ZRC=0x870F0041=-2029060031=SQLO_QUE_NOT_SENT "Message Not Sent"
          DIA8557C No message was sent using the message queue.
CALLED  : OS, -, select
OSERR   : EAGAIN (11) "Resource temporarily unavailable"
Hi sridhar,

Don't you just love IBM error messages?

Anyway....

The critical portions of the message are these lines:

FUNCTION: DB2 UDB, oper system services, sqlorqueInternal, probe:9
MESSAGE : ZRC=0x870F0041=-2029060031=SQLO_QUE_NOT_SENT "Message Not Sent"
          DIA8557C No message was sent using the message queue.

The first line indicates that DB2 sent a send_message request.  Messages are a normal mechanism for sending data, semaphores, etc.  (The line isn't written to the log for successful O/S calls.)  The second and third lines are the DB2 acknowledgement that an O/S request was made and an error returned.
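The MESSAGE line packs the same return code three ways: as hex, as a signed 32-bit decimal, and as a symbolic name. A minimal Python sketch, assuming the line layout shown in the log excerpt above, that splits those parts and checks that the hex and decimal forms agree (the helper name `parse_zrc` is hypothetical, not part of any Db2 tooling):

```python
import re

# Matches the "ZRC=<hex>=<signed decimal>=<symbol>" portion of a MESSAGE line.
ZRC_PATTERN = re.compile(r"ZRC=(0x[0-9A-Fa-f]{8})=(-?\d+)=(\w+)")


def parse_zrc(message_line):
    """Return the parts of a ZRC code, or None if the line has no ZRC."""
    match = ZRC_PATTERN.search(message_line)
    if match is None:
        return None
    hex_code, decimal_text, symbol = match.groups()
    unsigned = int(hex_code, 16)
    # Reinterpret the 32-bit value as signed (two's complement).
    signed = unsigned - (1 << 32) if unsigned >= (1 << 31) else unsigned
    return {
        "hex": hex_code,
        "signed": signed,
        "symbol": symbol,
        "consistent": signed == int(decimal_text),
    }


line = 'MESSAGE : ZRC=0x870F0041=-2029060031=SQLO_QUE_NOT_SENT "Message Not Sent"'
result = parse_zrc(line)
print(result)
```

On a machine with Db2 installed, `db2diag -rc 0x870F0041` should give the product's own description of the same code.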

The resource that is unavailable is the Message Queue.  It's full and cannot be extended due to the limits of the tuning parameter(s).

Are you running 32 or 64 bit AIX?

Kent

Author

Commented:
getconf KERNEL_BITMODE

64
This is the first I've heard of the error occurring on a 64 bit system.  The 32-bit systems could have the message queue fill if the size of the queue exceeded the tuning parameter, but I thought that the 64-bit systems weren't subject to the same limitation.

Here's a link to some IBM documentation that describes some critical tuning parameters for DB2.  MSGMAX (the maximum size of a single message) should be at least 65K.  Though I suspect that no value is guaranteed large enough if, as you suspect, there is a network issue at the time the error is detected.

  https://www.ibm.com/support/knowledgecenter/en/SSZJPZ_11.5.0/com.ibm.swg.im.iis.productization.iisinfsv.install.doc/topics/wsisinst_kernel_parameters_linux_unix.html


Kent

Author

Commented:
Per the IBM link, the default kernel values are sufficient on AIX. Below are my ulimit values:

time(seconds)        unlimited
file(blocks)         unlimited
data(kbytes)         unlimited
stack(kbytes)        unlimited
memory(kbytes)       unlimited
coredump(blocks)     unlimited
nofiles(descriptors) unlimited
threads(per process) unlimited
processes(per user)  unlimited

Will that error message appear if there is a network fluctuation?
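The ulimit listing above reflects the shell it was run in, which may not match the limits the Db2 processes actually inherited. A small sketch, assuming Python is available on the host, that reads the limits of the current process through the standard `resource` module. It would have to be run as the instance owner to be comparable, and the helper names `fmt` and `current_limits` are hypothetical:

```python
import resource

# Names of the rlimit constants to report; some do not exist on every platform.
LIMIT_NAMES = [
    "RLIMIT_CPU", "RLIMIT_FSIZE", "RLIMIT_DATA", "RLIMIT_STACK",
    "RLIMIT_RSS", "RLIMIT_CORE", "RLIMIT_NOFILE", "RLIMIT_NPROC",
]


def fmt(value):
    """Render a limit the way ulimit does for the unlimited case."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits():
    """Return {name: (soft, hard)} for every limit this platform defines."""
    limits = {}
    for name in LIMIT_NAMES:
        constant = getattr(resource, name, None)
        if constant is None:
            continue  # limit not defined on this platform
        soft, hard = resource.getrlimit(constant)
        limits[name] = (fmt(soft), fmt(hard))
    return limits


# getrlimit reports raw values (bytes for sizes), not ulimit's kbytes/blocks.
for name, (soft, hard) in current_limits().items():
    print(f"{name:<14} soft={soft:<12} hard={hard}")
```

Note that none of these per-process limits describe IPC message queues, so "unlimited" everywhere here does not rule out the queue problem discussed in this thread.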
Fractional CTO
Distinguished Expert 2018
Commented:
This is fairly common when you try using default IPC settings... which will never support even the slightest production load.

https://www-304.ibm.com/support/docview.wss?uid=swg21438228 provides the starting point of how to fix this.

The above link only gives an overview of the problem, then provides other links which actually take you through the process of tuning IPC for your runtime environment.

The OSERR   : EAGAIN (11) "Resource temporarily unavailable" message means IPC queue memory or linked list (management memory) has been exhausted. This is why this evil message is intermittent. As soon as memory becomes available, then IPC will work for a while, then start failing again.
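The intermittent behavior is easy to reproduce outside Db2. The sketch below is only an analogy: it uses a non-blocking pipe rather than Db2's message queues, but the kernel reports the same EAGAIN condition when the buffer is full and accepts writes again once a reader drains it. The function name `fill_until_eagain` is hypothetical.

```python
import errno
import os


def fill_until_eagain(write_fd, chunk=b"x" * 4096):
    """Write until the kernel refuses; return (bytes_written, errno)."""
    total = 0
    while True:
        try:
            total += os.write(write_fd, chunk)
        except BlockingIOError as exc:
            # The pipe buffer is full: "Resource temporarily unavailable".
            return total, exc.errno


read_fd, write_fd = os.pipe()
os.set_blocking(write_fd, False)  # fail with EAGAIN instead of waiting

written, err = fill_until_eagain(write_fd)
print(f"buffer full after {written} bytes: {errno.errorcode[err]}")

# Drain everything that was written, as a consumer eventually would.
remaining = written
while remaining > 0:
    remaining -= len(os.read(read_fd, min(remaining, 65536)))

# With space available again, the same write now succeeds.
retry = os.write(write_fd, b"ok")
print(f"retry wrote {retry} bytes")

os.close(read_fd)
os.close(write_fd)
```

If the consumer is slow or stalled (for example, waiting on a network peer), the producer keeps hitting EAGAIN until the backlog clears, which matches the on-and-off pattern described above.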

Just go through the IBM IPC Tuning directions + you'll be good.

Author

Commented:
Thanks!!! will check it out.
David Favor
Fractional CTO
Distinguished Expert 2018

Commented:
You're welcome!

Author

Commented:
Thanks for your inputs.
