
Io exception: Broken pipe

I'm not sure whether this is the right forum for this question, since it involves Oracle, Java, and networking. Please let me know if you think I should post it elsewhere.

Recently, we started receiving the following error in our log file:

java.sql.SQLException: Io exception: Connection reset by peer: Connection reset by peer
Io exception: Broken pipe

We had been running the application successfully for several weeks with no occurrences of this error. We also have several other instances of the same application; the only difference is that each of them connects to a different remote database. Because those instances are running correctly and this one is not, I assumed the problem was on the remote database server side. I phoned the administrator there, and they told me a firewall had been replaced, so I gave them our IP address so that we could get through the new firewall. However, the error appeared again last night (the application in question runs once every night).

The initial part of the application appears to be working correctly. Essentially, what we have at the remote end is a small PL/SQL program that waits for a trigger to tell it that new data has been inserted; once notified, it sends the data to an Oracle pipe. At the local end, a small Java program checks whether any data is available in the Oracle pipe. If so, it retrieves all of it and then enters a loop that checks the pipe for new data on each iteration and retrieves whatever is available. The first part of the Java program appears to retrieve the initial data just fine. However, once the loop starts, the above error is raised, the loop exits, and the Java program terminates.
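For readers unfamiliar with Oracle pipes, a minimal Java sketch of the local polling loop described above might look like the following. It is not the poster's code: the class and method names, the 60-second wait, and the assumption that each message carries a single VARCHAR2 item are all hypothetical. It calls the documented DBMS_PIPE.RECEIVE_MESSAGE and DBMS_PIPE.UNPACK_MESSAGE routines through JDBC, and any SQLException (such as "Io exception: Broken pipe") propagates out of the loop, which matches the behavior described.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Types;

// Hypothetical helper: polls an Oracle pipe the way the question describes.
// Assumes every message contains exactly one VARCHAR2 item.
class PipePoller {
    // Anonymous PL/SQL block: wait up to ? seconds on pipe ?, hand back the
    // status, and unpack one VARCHAR2 item only when a message arrived.
    static final String RECEIVE_SQL =
        "DECLARE s INTEGER; BEGIN "
      + "s := DBMS_PIPE.RECEIVE_MESSAGE(?, ?); ? := s; "
      + "IF s = 0 THEN DBMS_PIPE.UNPACK_MESSAGE(?); END IF; END;";

    // Return codes of DBMS_PIPE.RECEIVE_MESSAGE.
    static String describeStatus(int status) {
        switch (status) {
            case 0: return "message received";
            case 1: return "timed out";
            case 2: return "message too large for buffer";
            case 3: return "interrupted";
            default: return "unknown status " + status;
        }
    }

    // Returns the message text, or null if nothing arrived within timeoutSeconds.
    // Connection failures surface here as SQLException and reach the caller.
    static String receive(Connection conn, String pipeName, int timeoutSeconds)
            throws SQLException {
        try (CallableStatement cs = conn.prepareCall(RECEIVE_SQL)) {
            cs.setString(1, pipeName);
            cs.setInt(2, timeoutSeconds);
            cs.registerOutParameter(3, Types.INTEGER);
            cs.registerOutParameter(4, Types.VARCHAR);
            cs.execute();
            int status = cs.getInt(3);
            if (status == 0) {
                return cs.getString(4);
            }
            if (status == 1) {
                return null; // nothing yet; the caller simply polls again
            }
            throw new SQLException("DBMS_PIPE.RECEIVE_MESSAGE: " + describeStatus(status));
        }
    }

    // The loop itself: keep checking the pipe and print whatever arrives.
    static void pollForever(Connection conn, String pipeName) throws SQLException {
        while (true) {
            String msg = receive(conn, pipeName, 60); // 60 s wait is an assumption
            if (msg != null) {
                System.out.println("received: " + msg);
            }
        }
    }
}
```

Each RECEIVE_MESSAGE call blocks on the server for up to the given timeout, so the network connection is essentially silent while it waits.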

I still think something is wrong at the remote database end, but I cannot pin down the problem. Any help figuring out what is causing the above error message is much appreciated! I searched online for an explanation but was not able to determine the cause.

I am assigning a high point value to this question since this is an urgent matter.

Thanks.
Asked by: Electrokardiogram
 
schwertner commented:
It depends on the Oracle version, the machine brand, the OS, and the JDK version.
 
schwertner commented:
Try to ping the listener to see if it is available (replace alias with your net service name; the trailing 7 is the number of attempts):

C:\>tnsping alias 7
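When tnsping is not available on the client machine (for example, a host that only has the JDBC thin driver and no Oracle client install), a rough Java substitute is a plain TCP connect to the listener port. This is a hypothetical helper and not equivalent to tnsping: it only shows that something accepts TCP connections on the given host and port, and the usual listener port 1521 is an assumption about this setup.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: plain TCP reachability check, NOT an Oracle Net ping.
class ListenerProbe {
    // True if a TCP connection to host:port succeeds within timeoutMillis.
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // refused, timed out, unknown host, etc.
            return false;
        }
    }
}
```

Usage would be something like `ListenerProbe.canConnect("dbhost", 1521, 5000)`, where the host name is a placeholder.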
 
geotiger commented:

Are you connecting to the Oracle database through a connection pool or a dedicated connection? If you are using a pool, you may lose the connection during your loop.

You can read more about it at http://forum.java.sun.com/thread.jspa?threadID=394408&messageID=1722730
 
slightwv (Netminder) commented:
Not sure if this is the answer or not, but since it's a new firewall, see if there is an idle timeout on that port. Although we don't use Java, I have a specific DB that I reach through a firewall with a 2-hour idle connection timeout. If I run any long transactions from SQL*Plus, the firewall sees the connection as idle and closes it.
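If a firewall idle timeout is the cause, one common mitigation is to make sure the connection never stays quiet longer than that limit. The sketch below is a hypothetical helper: its names, the idea of sending a cheap keep-alive query (for example SELECT 1 FROM DUAL), and the choice of half the timeout as the ping interval are all assumptions. It tracks the last database round-trip and reports when a keep-alive is due. It only helps between calls: a single blocking call, such as a long pipe wait, keeps the connection idle for its whole duration, so such a wait would also need a timeout below the firewall limit.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper: decides when a keep-alive query is due so the
// connection is never idle for as long as the firewall's timeout.
class IdleGuard {
    private final Duration pingInterval;
    private Instant lastActivity;

    IdleGuard(Duration firewallIdleTimeout, Instant now) {
        // Assumed safety margin: ping at half the firewall's idle limit.
        this.pingInterval = firewallIdleTimeout.dividedBy(2);
        this.lastActivity = now;
    }

    // Call after every real round-trip to the database (or keep-alive).
    void recordActivity(Instant now) {
        lastActivity = now;
    }

    // True once the connection has been quiet for at least pingInterval.
    boolean pingDue(Instant now) {
        return Duration.between(lastActivity, now).compareTo(pingInterval) >= 0;
    }
}
```

With the 2-hour timeout mentioned above, a keep-alive would become due after one hour without traffic.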
 
schwertner commented:
Also check the profile of the Oracle user you connect as for an idle-time restriction (the IDLE_TIME limit).
 
slightwv (Netminder) commented:
schwertner,
Normally I'd agree with everything you've said, if it weren't for the fact that the app was working fine for weeks and the only known change is the replacement of a firewall. I'm betting the old firewall's configuration wasn't fully documented, and when the new one was set up, some items were missed.
 
Electrokardiogram (Author) commented:
I just contacted the network administrator at the remote end, and he says they are still having connectivity issues at his location. I am not sure what he means by "connectivity issues"; probably some unknown network problems that are causing connection anomalies. He claims that no timeout settings are enabled on their firewall.

Please remember that our application is able to connect through their network and process the available data at startup. After this initial processing, however, we get the errors. I just looked at our log files again, and the time between the start of the loop I mentioned (which follows the initial processing) and the error is always under 10 minutes; around 5 minutes is the most common.
 
slightwv (Netminder) commented:
I would probably wait until they have resolved their "connectivity issues" before you troubleshoot much more.  Has the amount of data processed by this process increased recently?

I'm looking for the 'what changed' parameters; one of them has to be the cause. It might be the timeout settings mentioned by schwertner if the amount of data has crossed some threshold.
 
Electrokardiogram (Author) commented:
No, the amount of data processed by this job has not increased recently. According to my log files, it has been consistently low since I added the job to our crontab schedule. The other instances of the application process much more data.

Unfortunately, I do not have access to the remote network's configuration, so I can only rely on what the network administrator tells me: according to him, no timeout settings are enabled on their firewall. My log files do suggest some kind of timeout, though.

Besides timeout settings, what other settings do you think I should be aware of?

Thanks for all of your feedback so far.
 
slightwv (Netminder) commented:
There are SQL*Net timeout settings, but these would apply server-wide. I'm not all that familiar with profiles, so I'll defer to schwertner or the docs on the idle-time restriction he mentioned.

Can you set up some kind of test program to help troubleshoot? If its connection also gets dropped, you've eliminated all of your application code as the cause.

Possibly a simple PL/SQL loop that displays SYSDATE. Connect to the remote DB from a client-side SQL*Plus session and try something like this (untested; I'm typing it in from here):

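-- Needs EXECUTE privilege on DBMS_LOCK.
-- While the block runs, little or no traffic crosses the network, so an
-- idle-timeout firewall between client and server could cut the session.
-- DBMS_OUTPUT is buffered: SQL*Plus shows the lines only after the block ends.
-- 10000 iterations x 10 seconds is roughly 28 hours; lower the bound for a shorter run.
set serveroutput on size 1000000
begin
  for i in 1..10000 loop
    dbms_output.put_line(to_char(sysdate, 'MM/DD/YYYY HH24:MI:SS'));
    dbms_lock.sleep(10);
  end loop;
end;
/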
 
earth man2 commented:
Of course, you really need your app to be robust enough to recover from lost connections. Don't forget that pipe contents are lost if the database goes down (on a power loss, for example), so you should consider using AQ (Advanced Queuing) messaging instead if this data is valuable.
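Along the lines of this advice, a minimal Java sketch of reconnect-and-retry logic might look like the following. Everything in it is hypothetical (class and interface names, the attempt limit, the doubling back-off): it keeps one connection open across successful calls and, when a call fails with an SQLException, discards that connection, waits, opens a fresh one, and retries. Retrying is only safe when repeating the unit of work is harmless; a message that was already taken off the pipe when the connection broke is not recovered this way, which is part of the argument for AQ.

```java
import java.sql.Connection;
import java.sql.SQLException;

// Hypothetical sketch: reuse one connection until it fails, then reconnect
// and retry the same unit of work with a growing delay.
class ReconnectingRunner {
    // Opens a new connection, e.g. via DriverManager.getConnection(url, user, pw).
    interface ConnectionFactory { Connection open() throws SQLException; }

    // One unit of work against a live connection, e.g. a single pipe check.
    interface Work<T> { T run(Connection c) throws SQLException; }

    private final ConnectionFactory factory;
    private final int maxAttempts;
    private final long initialDelayMillis;
    private Connection conn; // kept across successful calls

    ReconnectingRunner(ConnectionFactory factory, int maxAttempts, long initialDelayMillis) {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        this.factory = factory;
        this.maxAttempts = maxAttempts;
        this.initialDelayMillis = initialDelayMillis;
    }

    <T> T run(Work<T> work) throws SQLException, InterruptedException {
        SQLException last = null;
        long delay = initialDelayMillis;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                if (conn == null) conn = factory.open();
                return work.run(conn);
            } catch (SQLException e) {
                last = e;       // e.g. "Io exception: Broken pipe"
                closeQuietly(); // never reuse a connection that just failed
                if (attempt < maxAttempts) {
                    Thread.sleep(delay);
                    delay *= 2; // simple exponential back-off
                }
            }
        }
        throw last;
    }

    private void closeQuietly() {
        if (conn != null) {
            try { conn.close(); } catch (SQLException ignored) { }
            conn = null;
        }
    }
}
```

In the polling loop, each pipe check would be passed to run(...) as a lambda, so a dropped connection costs a reconnect rather than ending the program.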
 
slightwv (Netminder) commented:
even split?
 
slightwv (Netminder) commented:
Still suggest even split:
schwertner, geotiger, earth man2 and myself