Db2 LUW diag log reports "Resource temporarily unavailable" on an AIX machine

Hi All

I noticed an OS error in my diag log, and HADR was out of sync during the same period.

From the error message, it is not clear to me which resource is unavailable.

I suspect the network was unreliable at that time.
Could any other resource be the cause?

How can I dig into this further?

It's an AIX machine.

Below is the relevant piece of the diag log.


2017-12-23-07.11.34.627255-360 E196685A513        LEVEL: Error (OS)
PID     : 57018777              TID  : 200         PROC : db2sysc 0
INSTANCE: db2inst1             NODE : 000
EDUID   : 258                  EDUNAME: db2sysc 0
FUNCTION: DB2 UDB, oper system services, sqlorqueInternal, probe:9
MESSAGE : ZRC=0x870F0041=-2029060031=SQLO_QUE_NOT_SENT "Message Not Sent"
          DIA8557C No message was sent using the message queue.
CALLED  : OS, -, select
OSERR   : EAGAIN (11) "Resource temporarily unavailable"
Asked by Prardhan N

Kent Olsen (DBA) commented:
Hi sridhar,

Don't you just love IBM error messages?

Anyway....

The critical portions of the message are these lines:

FUNCTION: DB2 UDB, oper system services, sqlorqueInternal, probe:9
MESSAGE : ZRC=0x870F0041=-2029060031=SQLO_QUE_NOT_SENT "Message Not Sent"
          DIA8557C No message was sent using the message queue.

The first line indicates that DB2 sent a send_message request.  Messages are a normal mechanism for sending data, semaphores, etc.  (The line isn't written to the log for successful O/S calls.)  The second and third lines are the DB2 acknowledgement that an O/S request was made and an error returned.

The resource that is unavailable is the Message Queue.  It's full and cannot be extended due to the limits of the tuning parameter(s).
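When digging through many such records, it can help to pull out the same fields mechanically. The three numbers on the MESSAGE line are one code written three ways: the hex ZRC, reinterpreted as a signed 32-bit integer, gives the decimal value shown next to it. Below is a minimal Python sketch, assuming the record layout shown in the log excerpt above; `to_signed32` and `parse_record` are hypothetical helper names, not part of any Db2 tooling.

```python
import re

# Excerpt of the record from the question (layout assumed from that sample).
RECORD = '''\
FUNCTION: DB2 UDB, oper system services, sqlorqueInternal, probe:9
MESSAGE : ZRC=0x870F0041=-2029060031=SQLO_QUE_NOT_SENT "Message Not Sent"
          DIA8557C No message was sent using the message queue.
CALLED  : OS, -, select
OSERR   : EAGAIN (11) "Resource temporarily unavailable"
'''


def to_signed32(value):
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    return value - (1 << 32) if value & 0x80000000 else value


def parse_record(text):
    """Extract the fields worth reading first in an OS-level diag record."""
    fields = {}
    m = re.search(r"^FUNCTION:\s*(.+)$", text, re.M)
    if m:
        fields["function"] = m.group(1).strip()
    m = re.search(r"ZRC=(0x[0-9A-Fa-f]+)=(-?\d+)=(\w+)", text)
    if m:
        fields["zrc_hex"] = m.group(1)
        fields["zrc_signed"] = to_signed32(int(m.group(1), 16))
        fields["zrc_logged"] = int(m.group(2))
        fields["zrc_name"] = m.group(3)
    m = re.search(r"^CALLED\s*:\s*OS,\s*-,\s*(\w+)", text, re.M)
    if m:
        fields["os_call"] = m.group(1)
    m = re.search(r"^OSERR\s*:\s*(\w+)\s*\((\d+)\)", text, re.M)
    if m:
        fields["errno_name"] = m.group(1)
        fields["errno"] = int(m.group(2))
    return fields


info = parse_record(RECORD)
for key, value in info.items():
    print(f"{key:12s} {value}")
```

Grouping records by `zrc_name`, `os_call`, and timestamp is one way to check whether the errors line up with the periods when HADR fell out of sync.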

Are you running 32 or 64 bit AIX?

Kent
Prardhan N (Author) commented:
getconf KERNEL_BITMODE

64
Kent Olsen (DBA) commented:
This is the first I've heard of the error occurring on a 64 bit system.  The 32-bit systems could have the message queue fill if the size of the queue exceeded the tuning parameter, but I thought that the 64-bit systems weren't subject to the same limitation.

Here's a link to some IBM documentation that describes some critical tuning parameters for DB2.  MSGMAX (the maximum size of a single message; the size of the queue itself is governed by MSGMNB) should be at least 65K, though I suspect that no value is guaranteed large enough if, as you suspect, there is a network issue at the time the error is detected.

  https://www.ibm.com/support/knowledgecenter/en/SSZJPZ_11.5.0/com.ibm.swg.im.iis.productization.iisinfsv.install.doc/topics/wsisinst_kernel_parameters_linux_unix.html


Kent

Prardhan N (Author) commented:
According to the IBM link, the default kernel values are sufficient on AIX. My ulimit values are below:

time(seconds)        unlimited
file(blocks)         unlimited
data(kbytes)         unlimited
stack(kbytes)        unlimited
memory(kbytes)       unlimited
coredump(blocks)     unlimited
nofiles(descriptors) unlimited
threads(per process) unlimited
processes(per user)  unlimited

Can this error message appear when the network fluctuates?
David Favor (Linux/LXD/WordPress/Hosting Savant) commented:
This is fairly common when you try using default IPC settings... which will never support even the slightest production load.

https://www-304.ibm.com/support/docview.wss?uid=swg21438228 provides the starting point of how to fix this.

The above link only gives an overview of the problem, then provides other links which actually take you through the process of tuning IPC for your runtime environment.

The OSERR   : EAGAIN (11) "Resource temporarily unavailable" message means IPC queue memory or the linked list (management memory) has been exhausted. This is why this evil message is intermittent. As soon as memory becomes available, IPC works for a while, then starts failing again.
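The intermittent nature of EAGAIN is easy to reproduce outside Db2. The Python sketch below is only an analogy, not Db2's message queue: it fills an ordinary non-blocking pipe until the kernel buffer has no room left, at which point the write fails with EAGAIN instead of waiting. The function name `fill_until_eagain` is hypothetical, and the buffer size reached depends on the operating system.

```python
import errno
import os


def fill_until_eagain():
    """Write to a non-blocking pipe until the kernel reports it is full.

    Returns the number of bytes accepted and the errno of the failing write.
    """
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)  # fail with EAGAIN instead of waiting
    written = 0
    try:
        while True:
            written += os.write(write_fd, b"x" * 4096)
    except BlockingIOError as exc:
        # Nobody is draining the pipe, so the buffer eventually fills up.
        return written, exc.errno
    finally:
        os.close(read_fd)
        os.close(write_fd)


written, err = fill_until_eagain()
print(f"accepted {written} bytes, then errno {err}: {os.strerror(err)}")
```

If a reader drained the pipe, the same write would succeed again, which matches the "works for a while, then fails again" pattern described above.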

Just go through the IBM IPC Tuning directions + you'll be good.

Prardhan N (Author) commented:
Thanks!!! will check it out.
David Favor (Linux/LXD/WordPress/Hosting Savant) commented:
You're welcome!
Prardhan N (Author) commented:
Thanks for your inputs.