
Network, closing socket... TIME_WAIT.

Ok, I have a tiny problem.

I am trying to interact with an existing server (I have NO CONTROL over the server). It uses a short-lived TCP stream connection, only meant to send and receive a single command (think of it like a web server, but it isn't).

Now, I need to get an update from this server once every second (sometimes even more frequently).

My problem is, this is what is showing up in my netstat...

tcp        0      0 localhost.localdo:34373 localhost.localdo:20000 TIME_WAIT
tcp        0      0 localhost.localdo:34372 localhost.localdo:20000 TIME_WAIT
tcp        0      0 localhost.localdo:34375 localhost.localdo:20000 TIME_WAIT
tcp        0      0 localhost.localdo:34374 localhost.localdo:20000 TIME_WAIT
tcp        0      0 localhost.localdo:34369 localhost.localdo:20000 TIME_WAIT

And this goes on and on and on.

I am closing my end of the connection, and I'm fairly sure the server is closing its end of the connection.  What can I do to prevent this TIME_WAIT issue?

Here is the meat of the section of code that might be causing the problem.

I am sending the request.  I am then waiting for a response (up to a certain period; CheckSocket does a select() to see if there is data to be read), then I am returning the results.

do {
        Invalid = CheckSocket(sock);
        Counter++;
} while ((Counter < 5) && (Invalid == 1));

if (Invalid) {
        sprintf(IBuffer, "Server is Busy, please try again later.\n");
} else if (read(sock, IBuffer, sizeof(unsigned char) * FEEDBACK_BUFFER_LENGTH) < 0) {
        perror("receiving on stream socket");
}

/* Close this connection, since the stream isn't persistent */
close(sock);

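The snippet relies on a CheckSocket() helper that is not shown. The following is a hedged reconstruction based only on the description above ("does a select to see if there is data to be read"); the one-second timeout and the 0/1 return convention are assumptions inferred from how the retry loop uses the result:

```c
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical reconstruction of the poster's CheckSocket() helper.
 * Waits up to one second for sock to become readable.
 * Returns 0 when data is waiting, 1 ("Invalid") on timeout or error,
 * matching how the retry loop above treats its return value. */
int CheckSocket(int sock)
{
    fd_set readfds;
    struct timeval tv;

    FD_ZERO(&readfds);
    FD_SET(sock, &readfds);
    tv.tv_sec = 1;          /* assumed per-attempt timeout */
    tv.tv_usec = 0;

    if (select(sock + 1, &readfds, NULL, NULL, &tv) > 0 &&
        FD_ISSET(sock, &readfds))
        return 0;           /* data ready to read */
    return 1;               /* timed out or select() failed */
}
```

Since select() works on any descriptor, this helper can be exercised against a pipe without needing a live TCP peer.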
Now, my program is operating normally, but those TIME_WAITs are stacking up.  Is there any way to prevent them, or to reduce the length of the wait so they won't stack up so fast?
seva commented:
I believe this can be done on both.
I think there is no real distinction between the server and the client after a connection is established.

Either one can close the connection, so it can send an RST and avoid TIME_WAIT.
navigator010897 (Author) commented:
BTW, I know the TIME_WAIT state is normal; my problem is that they are piling up too fast.  I have NO control over the server I am querying, so there is no way for me to change it from a temporary stream to a long-term stream.

Basically, what I am asking is: is there anything I can do on the client end to reduce or remove the TIME_WAIT?  I've tested it with 5 clients hitting the server at an average 1.5-second interval, and the system resources are dwindling fast.
seva commented:
This is what Rich Stevens says about TIME_WAIT:

"There are two reasons for the TIME_WAIT state:
1. to implement TCP's full-duplex connection termination
2. to allow old duplicate segments to expire in the network"
That means we have to accept going through this state
of a TCP connection; otherwise, it may not work correctly.

The duration of TIME_WAIT state is implementation-dependent (recommended value is 4 minutes, BSD has 1 minute).

2.7.  Please explain the TIME_WAIT state.

  Remember that TCP guarantees all data transmitted will be delivered,
  if at all possible.  When you close a socket, the server goes into a
  TIME_WAIT state, just to be really really sure that all the data has
  gone through.  When a socket is closed, both sides agree by sending
  messages to each other that they will send no more data.  This, it
  seemed to me was good enough, and after the handshaking is done, the
  socket should be closed.  The problem is two-fold.  First, there is no
  way to be sure that the last ack was communicated successfully.
  Second, there may be "wandering duplicates" left on the net that must
  be dealt with if they are delivered.

  Andrew Gierth (andrew@erlenstar.demon.co.uk) helped to explain the
  closing sequence in the following usenet posting:

  Assume that a connection is in ESTABLISHED state, and the client is
  about to do an orderly release. The client's sequence no. is Sc, and
  the server's is Ss. The pipe is empty in both directions.

          Client                                                   Server
          ======                                                   ======
          ESTABLISHED                                              ESTABLISHED
          (client closes)
          ESTABLISHED                                              ESTABLISHED
                       <CTL=FIN+ACK><SEQ=Sc><ACK=Ss> ------->>
                       <<-------- <CTL=ACK><SEQ=Ss><ACK=Sc+1>
          FIN_WAIT_2                                               CLOSE_WAIT
                       <<-------- <CTL=FIN+ACK><SEQ=Ss><ACK=Sc+1>  (server closes)
                       <CTL=ACK>,<SEQ=Sc+1><ACK=Ss+1> ------->>
          TIME_WAIT                                                CLOSED
          (2*msl elapses...)

  Note: the +1 on the sequence numbers is because the FIN counts as one
  byte of data. (The above diagram is equivalent to fig. 13 from RFC 793.)

  Now consider what happens if the last of those packets is dropped in
  the network. The client has done with the connection; it has no more
  data or control info to send, and never will have. But the server does
  not know whether the client received all the data correctly; that's
  what the last ACK segment is for. Now the server may or may not care
  whether the client got the data, but that is not an issue for TCP; TCP
  is a reliable protocol, and must distinguish between an orderly
  connection close where all data is transferred, and a connection abort
  where data may or may not have been lost.

  So, if that last packet is dropped, the server will retransmit it (it
  is, after all, an unacknowledged segment) and will expect to see a
  suitable ACK segment in reply.  If the client went straight to CLOSED,
  the only possible response to that retransmit would be a RST, which
  would indicate to the server that data had been lost, when in fact it
  had not been.

  (Bear in mind that the server's FIN segment may, additionally, contain
  data.)

  DISCLAIMER: This is my interpretation of the RFCs (I have read all the
  TCP-related ones I could find), but I have not attempted to examine
  implementation source code or trace actual connections in order to
  verify it. I am satisfied that the logic is correct, though.

  More commentary from Vic:

  The second issue was addressed by Richard Stevens (rstevens@noao.edu,
  author of "Unix Network Programming", see ``1.5 Where can I get source
  code for the book [book  title]?'').  I have put together quotes from
  some of his postings and email which explain this.  I have brought
  together paragraphs from different postings, and have made as few
  changes as possible.

  From Richard Stevens (rstevens@noao.edu):

  If the duration of the TIME_WAIT state were just to handle TCP's full-
  duplex close, then the time would be much smaller, and it would be
  some function of the current RTO (retransmission timeout), not the MSL
  (the packet lifetime).

  A couple of points about the TIME_WAIT state.

  o  The end that sends the first FIN goes into the TIME_WAIT state,
     because that is the end that sends the final ACK.  If the other
     end's FIN is lost, or if the final ACK is lost, having the end that
     sends the first FIN maintain state about the connection guarantees
     that it has enough information to retransmit the final ACK.

  o  Realize that TCP sequence numbers wrap around after 2**32 bytes
     have been transferred.  Assume a connection between A.1500 (host A,
     port 1500) and B.2000.  During the connection one segment is lost
     and retransmitted.  But the segment is not really lost, it is held
     by some intermediate router and then re-injected into the network.
     (This is called a "wandering duplicate".)  But in the time between
     the packet being lost & retransmitted, and then reappearing, the
     connection is closed (without any problems) and then another
     connection is established between the same host, same port (that
     is, A.1500 and B.2000; this is called another "incarnation" of the
     connection).  But the sequence numbers chosen for the new
     incarnation just happen to overlap with the sequence number of the
     wandering duplicate that is about to reappear.  (This is indeed
     possible, given the way sequence numbers are chosen for TCP
     connections.)  Bingo, you are about to deliver the data from the
     wandering duplicate (the previous incarnation of the connection) to
     the new incarnation of the connection.  To avoid this, you do not
     allow the same incarnation of the connection to be reestablished
     until the TIME_WAIT state terminates.

     Even the TIME_WAIT state doesn't completely solve the second
     problem, given what is called TIME_WAIT assassination.  RFC 1337
     has more details.

  o  The reason that the duration of the TIME_WAIT state is 2*MSL is
     that the maximum amount of time a packet can wander around a
     network is assumed to be MSL seconds.  The factor of 2 is for the
     round-trip.  The recommended value for MSL is 120 seconds, but
     Berkeley-derived implementations normally use 30 seconds instead.
     This means a TIME_WAIT delay between 1 and 4 minutes.  Solaris 2.x
     does indeed use the recommended MSL of 120 seconds.

  A wandering duplicate is a packet that appeared to be lost and was
  retransmitted.  But it wasn't really lost ... some router had
  problems, held on to the packet for a while (order of seconds, could
  be a minute if the TTL is large enough) and then re-injects the packet
  back into the network.  But by the time it reappears, the application
  that sent it originally has already retransmitted the data contained
  in that packet.

  Because of these potential problems with TIME_WAIT assassinations, one
  should not avoid the TIME_WAIT state by setting the SO_LINGER option
  to send an RST instead of the normal TCP connection termination
  (FIN/ACK/FIN/ACK).  The TIME_WAIT state is there for a reason; it's
  your friend and it's there to help you :-)

  I have a long discussion of just this topic in my just-released
  "TCP/IP Illustrated, Volume 3".  The TIME_WAIT state is indeed, one of
  the most misunderstood features of TCP.

  I'm currently rewriting "Unix Network Programming" (see ``1.5 Where
  can I get source code for the book [book  title]?'') and will include
  lots more on this topic, as it is often confusing and misunderstood.

  An additional note from Andrew:

  Closing a socket: if SO_LINGER has not been called on a socket, then
  close() is not supposed to discard data. This is true on SVR4.2 (and,
  apparently, on all non-SVR4 systems) but apparently not on SVR4; the
  use of either shutdown() or SO_LINGER seems to be required to
  guarantee delivery of all data.
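Andrew's last point (use shutdown() or SO_LINGER to guarantee delivery) can be sketched as a half-close-then-drain pattern. This is an illustration of the general technique, not SVR4-specific code, and `orderly_close` is a made-up helper name:

```c
#include <sys/socket.h>
#include <unistd.h>

/* Orderly close that does not rely on close() flushing data:
 * half-close the send direction (our FIN goes out after any queued
 * data), then read until the peer's FIN (read() returning 0) before
 * releasing the descriptor. */
int orderly_close(int sock)
{
    char buf[256];

    if (shutdown(sock, SHUT_WR) < 0)    /* no more sends; reads still OK */
        return -1;
    while (read(sock, buf, sizeof buf) > 0)
        ;                               /* drain until the peer's FIN */
    return close(sock);
}
```

The same pattern works on a Unix-domain socketpair, which makes it easy to test inside a single process.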
navigator010897 (Author) commented:
I had read both of those things on the net; I was just hoping there was a way.  I have found ways to reduce the length of time by recompiling the kernel, but that would only work for me, and isn't a valid option for everyone.

Personally, if it were me, I'd make it a long-term socket.  Right now, you connect, send your request, and get a reply that is no more than about 2,040 bytes, though most responses are in the range of 50 bytes.

I am currently begging the author of the server to convert the program to a long-term stream or UDP (I don't think delivery guarantee is vital for this app, to be honest, but if it is, making the stream long-term would be the better way to go).  Testing with 5 clients hitting the server for data every 1.5 seconds piles up more than 140 TIME_WAITs.  That may not sound like a lot, but this is only 5 clients making 1 request each; by the time it is done, that 1 request could really be 3 to 5 requests in that span of time, each requiring a new connection to the server and piling the TIME_WAITs even higher.  If there were 50 clients making 5 requests, I figure that could pile up to 35,000 TIME_WAITs, and that just seems a waste for what the server is doing.
seva commented:
Well, it seems to me that there is a way (again, according to Stevens).
If you set the SO_LINGER option using struct linger:
  int l_onoff;
  int l_linger;
with l_onoff set to nonzero and l_linger set to zero,
"TCP aborts the connection when it is closed.
 That is, TCP discards any data still remaining in the
 socket send buffer and sends an RST to the peer, not
 the normal four-packet connection termination
 sequence... This avoids TCP TIME_WAIT state...
 Some implementations, notably Solaris 2.x where x <= 5,
 do not implement this feature of SO_LINGER"
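As a concrete sketch of the option Stevens describes (with the caveat from his earlier comment that aborting with an RST defeats the protections TIME_WAIT provides), something like the following could be called on the client socket before close(); `set_abortive_close` is a made-up helper name:

```c
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* With l_onoff nonzero and l_linger zero, close() discards any unsent
 * data and sends an RST instead of the normal FIN/ACK termination, so
 * the socket skips TIME_WAIT entirely.  Use with care: see Stevens'
 * warning above about wandering duplicates. */
int set_abortive_close(int sock)
{
    struct linger lng;

    memset(&lng, 0, sizeof lng);
    lng.l_onoff = 1;    /* enable the linger option */
    lng.l_linger = 0;   /* zero timeout => abortive close (RST) */
    return setsockopt(sock, SOL_SOCKET, SO_LINGER, &lng, sizeof lng);
}
```

The option can be set (and read back with getsockopt) on an unconnected socket, so the call itself is easy to verify before wiring it into the client.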
navigator010897 (Author) commented:
Do you know if this can be done on the client, or is it a server implementation only, or both?
navigator010897 (Author) commented:
Works for me, thanks for the added info.  I'm still trying to get the programmer to re-code his server; I think I'm making progress ;)  I'd rather have the server recoded than disable the TIME_WAIT...