• Status: Solved

SQL30081N A communication error when running batch jobs against a DB2 database

We have a batch job called "C". It is a J2EE application client that runs third in our batch chain. The first two J2EE application client batch jobs are "A" and "B". "A" starts at 3:30 AM, followed by "B". Both jobs complete successfully in just under 3 minutes, after which our next job, "C", kicks off at around 3:33 AM. Normally "C" takes less than 15 minutes to finish, and it ran fine for the past 1.5 years. For the past two months, however, we have been seeing strange behaviour: "C" hangs for almost two hours and nothing is written to the logs. After about 2 hours it errors out, at around 5:33 AM, with the following message in the logs:

COM.ibm.db2.jdbc.DB2Exception: [IBM][CLI Driver] SQL30081N A communication error has been detected. Communication protocol being used: "TCP/IP". Communication API being used: "SOCKETS". Location where the error was detected: "<IP Address>". Communication function detecting the error: "recv". Protocol specific error code(s): "73", "*", "0". SQLSTATE=08001

This takes more than 2 hours away from the jobs that are supposed to run after "C" and delays them. So nowadays we monitor "C" for about an hour; if it has not completed successfully by then, we cancel it and rerun it. The second attempt is always successful.

We have not been able to figure out why it fails the first time. Any help or pointers will be appreciated. Thanks.
Asked by: kulkaraj
3 Solutions
 
ghp7000 Commented:
Simple solution #1: start batch job C later, say at 3:50, and see if that resolves the problem.
Simple solution #2: make sure the database is available to batch job C. For example, are you sure no database backup is being performed when batch job C tries to connect? Are you sure the network is available for batch job C to connect to the database? (Perhaps the network goes offline for maintenance, a security sweep, or some other task; check with your network admin to make sure nothing is changing in the network at that time.)
Simple solution #3: what is the value of the database manager parameter MAXAGENTS? According to the docs, the "0" in the error message MAY mean there are not enough agents available for the process to connect.
Simple solution #4: check db2diag.log for clues as to what is taking place.

Please specify the platform you are using.
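
The network check in simple solution #2 could be scripted roughly as follows and run just before job C starts. This is only a sketch: the class name `DbPortProbe` and its method are hypothetical, and the host and port would be whatever your DB2 server actually listens on. It only shows that a TCP connection can be opened; it does not prove that DB2 will accept a database connection or that the link will stay up while the job runs.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of DB2 or of the poster's application.
final class DbPortProbe {
    private DbPortProbe() {}

    /**
     * Returns true if a TCP connection to host:port can be opened
     * within timeoutMillis, false otherwise.
     */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            // connect() with a timeout fails fast instead of blocking indefinitely.
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host not resolvable:
            // all are treated as "not reachable".
            return false;
        }
    }
}
```

Logging the result of `DbPortProbe.isReachable(host, port, 5000)` at 3:33 AM on the failing nights would at least tell you whether the server port was reachable when job C started.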
 
sachinwadhwa Commented:
Error 73 is the AIX error code; on AIX it is ECONNRESET (connection reset by peer). Possible causes:


The connection might have been closed by the remote gateway or server at the TCP/IP level (e.g. firewall problem, power failure, network failure).

OR

Client-side connection pooling is enabled and is not handling connection failures. Code the application to retry the connection if a failure is received when connecting to the database while connection pooling is enabled.

OTHERWISE

Post the relevant text from db2diag.log (from both the server and the client).
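
The retry advice above might look something like the sketch below. It is not taken from the poster's application: `ConnectRetry`, `withRetry`, and `isConnectionFailure` are hypothetical names, and treating every SQLSTATE that starts with "08" (the connection exception class, which includes the 08001 in the error above) as retryable is an assumption you may want to narrow.

```java
import java.sql.SQLException;
import java.util.concurrent.Callable;

// Hypothetical helper illustrating "retry on connection failure".
final class ConnectRetry {
    private ConnectRetry() {}

    /** SQLSTATE class "08" covers connection exceptions such as 08001. */
    static boolean isConnectionFailure(SQLException e) {
        String state = e.getSQLState();
        return state != null && state.startsWith("08");
    }

    /**
     * Runs the action, retrying up to maxAttempts times when it fails
     * with a connection-class SQLException. Any other exception, or a
     * failure on the last attempt, is rethrown to the caller.
     */
    static <T> T withRetry(Callable<T> action, int maxAttempts, long delayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (SQLException e) {
                if (!isConnectionFailure(e) || attempt == maxAttempts) {
                    throw e;
                }
                // Pause before the next attempt.
                Thread.sleep(delayMillis);
            }
        }
        throw new IllegalStateException("unreachable");
    }
}

// Possible usage (url, user, password are placeholders):
//   Connection con = ConnectRetry.withRetry(
//       () -> DriverManager.getConnection(url, user, password), 3, 5000L);
```

Note that a retry only helps once the failure has actually been reported. In the case described in the question the job sits for about two hours before SQL30081N is raised, so a retry by itself would not remove that wait.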
 
sachinwadhwa Commented:
One more thing, try the following:

Add 'QUERYTIMEOUTINTERVAL=0' to the db2cli.ini file on the client side.
 
theonlyexpert Commented:
Well, you usually get a comm error after one of the following:

1. The instance is down.
2. The db is unavailable (backup/restore/etc.).
3. A force command was issued earlier, or the instance was stopped and started.
4. There are not enough agents for the server to accept connections.

After job B completes, you could try forcing all applications and then running C.
Question has a verified solution.
