Solved

TNS-12547: TNS:lost contact

Posted on 2003-12-11
Last Modified: 2012-06-27
Oracle 9i Rel 2 Version 9.2.0.4 on Windows 2000 Advanced Server.

We continue to experience service interruptions because the connection between Informatica on our application server and our production database server is being lost. There is a firewall in between. The network people swear there are no drops or lost packets detectable in the network trace. We installed Ethereal on both the app server and the db server, and we believe there is some server failure, based on errors related to mapping shared drives between them. We can't seem to drill down any further in that direction.

As for our Oracle problems, I finally discovered that the errors were being captured in sqlnet.log.

Today, in the app server's sqlnet.ora, I set:
TRACE_LEVEL_CLIENT=admin
TRACE_TIMESTAMP_CLIENT
TRACE_UNIQUE_CLIENT
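
A note on the settings as listed: the last two parameters appear without a value, while sqlnet.ora entries normally take the NAME=value form (for example TRACE_TIMESTAMP_CLIENT=ON). The accepted values should be confirmed against the Oracle Net reference for your release. Below is a minimal Python sketch that flags entries lacking a value. The function name and the sample text are illustrative only and are not part of any Oracle tool, and the check only understands simple single-line entries.

```python
def find_valueless_parameters(sqlnet_text):
    """Return the names of single-line parameters that have no '=value' part.

    Assumption: every entry is a simple one-line NAME=value pair.
    Multi-line parenthesized values are not handled; lines that start
    with a parenthesis are skipped rather than interpreted.
    """
    missing = []
    for raw_line in sqlnet_text.splitlines():
        line = raw_line.strip()
        # Skip blanks, comments, and continuation lines of nested values.
        if not line or line.startswith("#") or line[0] in "()":
            continue
        name, sep, value = line.partition("=")
        if not sep or not value.strip():
            missing.append(name.strip())
    return missing


# The three settings exactly as quoted in the post.
SAMPLE = """\
TRACE_LEVEL_CLIENT=admin
TRACE_TIMESTAMP_CLIENT
TRACE_UNIQUE_CLIENT
"""

if __name__ == "__main__":
    for name in find_valueless_parameters(SAMPLE):
        print("no value set for:", name)
```

Run against the settings above, this reports the two parameters that have no value, which may be one reason no .trc file shows up.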
 


Here are the errors previously recorded.
***********************************************************************
Fatal NI connect error 12535, connecting to:
 (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=10.102.100.212)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=nedssard)(CID=(PROGRAM=D:\Program Files\Informatica\Informatica PowerCenter 6.0 - Server\bin\pmdtm.exe)(HOST=DHNEDSSARAPPROD)(USER=SYSTEM))))

  VERSION INFORMATION:
      TNS for 32-bit Windows: Version 9.2.0.1.0 - Production
      Windows NT TCP/IP NT Protocol Adapter for 32-bit Windows: Version 9.2.0.1.0 - Production
  Time: 18-NOV-2002 11:23:47
  Tracing not turned on.
  Tns error struct:
    nr err code: 0
    ns main err code: 12535
    TNS-12535: TNS:operation timed out
    ns secondary err code: 12560
    nt main err code: 505
    TNS-00505: Operation timed out
    nt secondary err code: 60
    nt OS err code: 0


***********************************************************************
Fatal NI connect error 12547, connecting to:
 (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=164.156.96.227)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=nedssarp)(CID=(PROGRAM=D:\Program Files\Informatica\Informatica PowerCenter 6.0 - Server\bin\pmdtm.exe)(HOST=DHNEDSSARAPPROD)(USER=SYSTEM))))

  VERSION INFORMATION:
      TNS for 32-bit Windows: Version 9.2.0.1.0 - Production
      Windows NT TCP/IP NT Protocol Adapter for 32-bit Windows: Version 9.2.0.1.0 - Production
  Time: 18-NOV-2002 19:09:13
  Tracing not turned on.
  Tns error struct:
    nr err code: 0
    ns main err code: 12547
    TNS-12547: TNS:lost contact
    ns secondary err code: 12560
    nt main err code: 517
    TNS-00517: Lost contact
    nt secondary err code: 54
    nt OS err code: 0
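
To line these entries up against the network traces, it can help to reduce sqlnet.log to one row per failure. The following is a rough Python sketch that assumes the log follows the layout shown above; the function name and dictionary keys are made up for illustration, and the sample is trimmed from the two entries in this post.

```python
import re

ENTRY_START = re.compile(r"Fatal NI connect error (\d+)")
# First HOST= after ADDRESS= is the listener address, not the client host.
HOST = re.compile(r"\(ADDRESS=.*?\(HOST=([^)]+)\)")
TIME = re.compile(r"\s*Time:\s*(.+?)\s*$")
NT_SECONDARY = re.compile(r"nt secondary err code:\s*(\d+)")


def summarize_sqlnet_log(text):
    """Return one dict per 'Fatal NI connect error' entry found in text."""
    entries = []
    current = None
    for line in text.splitlines():
        m = ENTRY_START.search(line)
        if m:
            current = {"error": int(m.group(1)), "host": None,
                       "time": None, "nt_secondary": None}
            entries.append(current)
            continue
        if current is None:
            continue  # ignore anything before the first entry
        m = HOST.search(line)
        if m and current["host"] is None:
            current["host"] = m.group(1)
        m = TIME.match(line)
        if m:
            current["time"] = m.group(1)  # kept as text, not parsed
        m = NT_SECONDARY.search(line)
        if m:
            current["nt_secondary"] = int(m.group(1))
    return entries


# Trimmed from the two log entries quoted in this post.
SAMPLE_LOG = """\
Fatal NI connect error 12535, connecting to:
 (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=10.102.100.212)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=nedssard)(CID=(HOST=DHNEDSSARAPPROD)(USER=SYSTEM))))
  Time: 18-NOV-2002 11:23:47
    nt secondary err code: 60
Fatal NI connect error 12547, connecting to:
 (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=164.156.96.227)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=nedssarp)(CID=(HOST=DHNEDSSARAPPROD)(USER=SYSTEM))))
  Time: 18-NOV-2002 19:09:13
    nt secondary err code: 54
"""

if __name__ == "__main__":
    for entry in summarize_sqlnet_log(SAMPLE_LOG):
        print(entry["time"], entry["error"], entry["host"], entry["nt_secondary"])
```

The timestamps are left as strings so they can be compared by eye with the capture times in the packet traces.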

I found a similar problem described on another website, where the solution is listed as:

"Solution: Add "USE_SHARED_SOCKET"="TRUE" to the Windows NT registry on the
server as follows, and restart the database and listener services on the server:"
                   REGEDIT4
                  [HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\HOME0]
                  "USE_SHARED_SOCKET"="TRUE"
                   This page last modified on December 16, 2001.

This solution looks a little stale to me, and I am wondering whether it is still valid.

What is causing this error?
What do I do to fix it?
If there isn't enough info here, what else can I do to expose it further? I am trying to test my sqlnet.ora changes: I have tried connecting from sqlplus on the app server and making the connection fail so that the .trc file is generated, but I haven't figured it out yet.

It is costing us a lot of money and trouble. Because of it, our automated production batch job is not really automated: we have to sit and watch it every night to make sure it completes successfully.
Question by:DonFreeman
3 Comments
 

Expert Comment

by:roberttran
ID: 9925129
I have bumped into this before.

Check the TCP timeout parameter on your firewall.

Hope this helps.

Author Comment

by:DonFreeman
ID: 10129925
This problem was finally resolved by Microsoft and our network people. An 'emergency' change made on port 1434 to block the Slammer worm was never removed. So whenever the app server connected to Oracle using 'random' (ephemeral) port 1434, the connection broke.

"After looking at the logs and reviewing the comments from the Microsoft Technician, I have found the reason the 1434 traffic was being dropped. I am not sure that it is what has been causing your TNS timeout issues, but correcting this will at least rule it out as a possible cause. As I have said, there have been NO firewall drops during the entire time. But now that we have the detailed information that the NetMon logs provide, I had places to look and specific ports (tcp/1434) to look for. I found an access-list in one of the sets of 6509 switches that was placed there during the Slammer Worm as directed by BCTS security. As it was placed, it was dropping return traffic from the database server that occurred (this one time) on tcp/1434. This would only have dropped traffic when the App server randomly picked tcp/1434 as its ephemeral port. I would be surprised if this caused an application time-out because it was preventing the initiating of a connection, it was not breaking an existing one... but it may have, so I fixed it.

I have come up with a way to remove the Access-list and replace it with a logging rule in our firewall (so the protection mandated by BCTS is still in place). This way return traffic will not be dropped (so you will no longer EVER be affected, no matter WHAT ephemeral port you pick) and we will see ANY drops that occur in the firewall logs (unlike the behavior of the access-list in the switches). I completed all of these changes before noon today, so this will never affect your application again. I would advise you to ask your Oracle support if a break in the TCP 3-way handshake (like this one) could cause a TNS timeout. If it can or is likely to, then maybe you found the issue. That might explain why it was intermittent, but in my opinion it almost doesn't seem random enough. If you need further explanation of this, please feel free to give me a call and I would be happy to elaborate. Sorry for any inconvenience, we needed the information in the traces to track it down."
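
For anyone chasing a similar intermittent failure, one way to check whether a particular ephemeral port is being filtered is to force a connection from that source port instead of waiting for the OS to pick it at random. The following is a minimal Python sketch, not anything Oracle or Microsoft supplied. The function name is hypothetical, and it only tests the TCP handshake to the listener, not an Oracle login.

```python
import socket


def probe_from_source_port(host, port, source_port, timeout=5.0):
    """Attempt a TCP connect to (host, port) from a fixed local source port.

    Returns (True, None) if the handshake completes, otherwise
    (False, reason). A timeout suggests the SYN or the returning
    SYN-ACK was dropped somewhere on the path.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Allow quick re-use of the source port between test runs.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.settimeout(timeout)
        sock.bind(("", source_port))  # pin the "ephemeral" port
        sock.connect((host, port))
        return True, None
    except OSError as exc:  # includes timeouts and refused connections
        return False, "{}: {}".format(type(exc).__name__, exc)
    finally:
        sock.close()
```

For example, calling `probe_from_source_port("164.156.96.227", 1521, 1434)` from the app server and comparing the result with neighbouring source ports such as 1433 and 1435 would show whether only tcp/1434 fails. The bind itself can fail if the chosen port is already in use locally, which shows up as a `(False, reason)` result rather than a network problem.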

Accepted Solution

by:modulo
ID: 10131702
PAQ'd and points refunded.

modulo

Community Support Moderator
Experts Exchange