How to clear CLOSE_WAIT state of a TCP connection?

Posted on 2003-03-30
Medium Priority
1 Endorsement
Last Modified: 2013-12-27
When i perform netstat -a, i saw the connections are in CLOSE_WAIT state. This causes my program using these connections to sleep, truss -p <process_pid>. Only after i terminate and restart my program, the connections turn back to ESTABLISHED state.

Is there a timer to set, say after 120 seconds the CLOSE_WAIT connections will break so my program can reconnect again?? For example the "ndd" command??
Question by:SharonLaw
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 2
  • 2
  • 2
  • +4

Expert Comment

ID: 8236793

default 240000 (according to RFC 1122, 2MSL), recommended 60000, possibly lower
Since 7: obsoleted parameter, use tcp_time_wait_interval instead
Since 8: no more access, use tcp_time_wait_interval


Author Comment

ID: 8237488
The tcp_time_wait_interval value is 240,000 (4 minutes). But the connections stay at CLOSE_WAIT state even after hours.

Please advise. Thanks.

Expert Comment

ID: 8237930
       CLIENT                         SERVER            

  1.    ESTABLISHED                    ESTABLISHED
  2.    (Close)
        FIN-WAIT-1  --> <FIN,ACK>  --> CLOSE-WAIT
  3.    FIN-WAIT-2  <-- <ACK>      <-- CLOSE-WAIT
  4.                                   (Close)
        TIME-WAIT   <-- <FIN,ACK>  <-- LAST-ACK
  5.    TIME-WAIT   --> <ACK>      --> CLOSED
        (2 MSL)

As I understand it, the tcp_time_wait_interval doesn't kick in until after the CLOSE_WAIT. There is no parameter that directly affects the tcp_close_wait interval. In a scenario where the client sends a close, the server acknowledges this and sends whatever data is still in its buffers. Server in CLOSE_WAIT state. The server will only close the connection once it has sent a FIN to the client and received an ACK for that.

CLOSE_WAIT state means the other end of the connection has been closed while the local end is still waiting for the app to close.

Similarly, if the server receives a SYN + FIN from the client, it does not know what to do and leaves connections stuck in the CLOSE_WAIT state.

It is best to "truss" the application and "snoop" the tcp session to narrow down the problem.

# truss -o truss.out -laef -vall -p <the pid of the server process>
# snoop -o snoop.out port <tcp port number>


Optimize your web performance

What's in the eBook?
- Full list of reasons for poor performance
- Ultimate measures to speed things up
- Primary web monitoring types
- KPIs you should be monitoring in order to increase your ROI


Expert Comment

ID: 8239538
Add the following line
to /etc/init.d/inetinit

/usr/sbin/ndd -set /dev/tcp tcp_close_wait_interval 1500
/usr/sbin/ndd -set /dev/tcp tcp_keepalive_interval 1500

and reboot

LVL 22

Expert Comment

ID: 8240055
Do not set tcp_close_wait_interval, tcp_time_wait_interval, or tcp_keepalive_interval. None of them have anything to do with your problem.

The problem is that your application is not closing the socket now that the other host has closed its socket. That's what CLOSE_WAIT means, namely that the OS is waiting for the application to close the socket.

There are numerous reasons that the application isn't closing the socket, almost all of them are because of application bugs. There are a few ways that the application can be informed that the other end has closed the socket. The most common is to try to read the socket and get back the EOF indicator, which is a successful read of zero bytes.

Writing to the socket may tell you, but not always. In TCP it is allowed to write on a socket that has been closed on the other end, because in TCP closing the socket says that you will do no more writes, but says nothing about whether or not you will still read. If the socket was closed abortively and no longer exists at all, then the write will return an EPIPE error.

If the protocol that you are using does not allow for reading any data, then use non-blocking sockets and the poll call to tell when the socket is readable without actually blocking in the read call.

The only real possibility that is not a bug in the application is if there is a bug in the OS that prevented it from informing the application that the EOF was available. This is of course unlikely, but not impossible.

Author Comment

ID: 8242822
truss -p results:
smsmgr@ws01-1a:admin/bin% psc smmgr
  smsmgr  1581  1563  0   Feb 28 pts/7   10:53 smmgr
smsmgr@ws01-1a:admin/bin% truss -p 1581
lwp_sema_wait(0xFEE0BE78)       (sleeping...)
signotifywait()                 (sleeping...)
lwp_sema_wait(0xFEC07E78)       (sleeping...)
lwp_sema_wait(0xFF12DF08)       (sleeping...)
lwp_sema_wait(0xFEB05E78)       (sleeping...)
lwp_sema_wait(0xFEA03E78)       (sleeping...)
lwp_sema_wait(0xFE901E78)       (sleeping...)
lwp_sema_wait(0xFE40FE78)       (sleeping...)
lwp_sema_wait(0xFE30DE78)       (sleeping...)
lwp_sema_wait(0xFE20BE78)       (sleeping...)
lwp_sema_wait(0xFE109E78)       (sleeping...)
semop(5, 0x00032124, 1)         (sleeping...)
semop(5, 0x00032124, 1)         (sleeping...)
door_return(0x00000000, 0, 0x00000000, 0) (sleeping...)

The program has 10 threads connections. The program sleeps and unable to perform the next read or write actions during the CLOSE_WAIT state.

Expert Comment

ID: 8247598
This behavior is well studied and documented by IBM Websphere (Solaris) Performance team. When high connection rates occur, a large backlog of TCP connections build up and can slow the server down.
It is been witnessed the server stallling during certain peak periods. Netstat showed many sockets opened to port 80 were in CLOSE_WAIT or FIN_WAIT_2. Visible delays of up to 4 minutes in which the server does not send any responses has occurred, but CPU utilization stays high, with all of the activity in system processes.

It is recommended to keep
tcp_close_wait_interval, tcp_time_wait_interval, or tcp_keepalive_interval values to less than or equal to a 1 minute (60000).
A socket remains in CLOSE_WAIT till the server does passive close and sending FIN packet back to client. Due to heavy thread activity the server thread might not get enough CPU cycle to do so. tcp_close_wait_interval suggests that solaris kernel to give up on orphaned close-wait sockets.

Accepted Solution

soupdragon earned 150 total points
ID: 8248943
I don't believe this to be a performance issue. As I understand it the system is not suffering performance degredation, one application is simply not releasing it's connections properly, but does so when killed. This points to an application level problem. The tcp_close_wait interval timer no longer exists - it has been renamed tcp_time_wait_interval precisely because that is what it affects, there is no timer that directly affects close_wait because closing the connection is an application responsibility, not the responsibility of the TCP/IP stack. Once the application has closed and sent a FIN, the TCP/IP stack then goes into the TIME_WAIT loop.

Expert Comment

ID: 8265129
If the application seems to behave ok, and the server isn't heavily loaded, (referring to above comments),
look for droped packets/errors on the network.
Study IP counters with netstat.
Make sure you have NIC and switch ports set to 100mbit/s full duplex and no autoneg(!)
Is there a firewall in the path that "forgets" about sessions after a certain amount of idle time?
Can you see a pattern in when/how frequent this happens? (long periods of idle time, or always after say 15 minutes)



Expert Comment

ID: 10060453
I have faced a similar problem while using iPlanet 4.1 sp9. The connections appear to stay on forever in CLOSE_WAIT state. I tried working with the tcp* parameters to no avail.

You may try sending a HUP signal to the Server process that binds to the port.  I wrote a simple script that would count the number of CLOSE_WAITs on a particular port and if it exceeded 4 (my application would hang at 4 CLOSE_WAITs) it would do a "kill -1 PID-OF-PROCESS". This immediately closes all connections and quicky refreshes the application without any downtime. Let us know if this works for you too.

Featured Post

Optimize your web performance

What's in the eBook?
- Full list of reasons for poor performance
- Ultimate measures to speed things up
- Primary web monitoring types
- KPIs you should be monitoring in order to increase your ROI

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

In tuning file systems on the Solaris Operating System, changing some parameters of a file system usually destroys the data on it. For instance, changing the cache segment block size in the volume of a T3 requires that you delete the existing volu…
Introduction Regular patching is part of a system administrator's tasks. However, many patches require that the system be in single-user mode before they can be installed. A cluster patch in particular can take quite a while to apply if the machine…
Learn how to find files with the shell using the find and locate commands. Use locate to find a needle in a haystack.: With locate, check if the file still exists.: Use find to get the actual location of the file.:
Learn how to navigate the file tree with the shell. Use pwd to print the current working directory: Use ls to list a directory's contents: Use cd to change to a new directory: Use wildcards instead of typing out long directory names: Use ../ to move…
Suggested Courses
Course of the Month11 days, 22 hours left to enroll

752 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question